Introduction to Computing
Explorations in Language, Logic, and Machines
David Evans
University of Virginia
For the latest version of this book and supplementary materials, visit:
http://computingbook.org
1 Computing
1.1 Processes, Procedures, and Computers
1.2 Measuring Computing Power
1.2.1 Information
1.2.2 Representing Data
1.2.3 Growth of Computing Power
1.3 Science, Engineering, and the Liberal Arts
1.4 Summary and Roadmap
2 Language
2.1 Surface Forms and Meanings
2.2 Language Construction
2.3 Recursive Transition Networks
2.4 Replacement Grammars
2.5 Summary
3 Programming
3.1 Problems with Natural Languages
3.2 Programming Languages
3.3 Scheme
3.4 Expressions
3.4.1 Primitives
3.4.2 Application Expressions
3.5 Definitions
3.6 Procedures
3.6.1 Making Procedures
3.6.2 Substitution Model of Evaluation
3.7 Decisions
3.8 Evaluation Rules
3.9 Summary
5 Data
5.1 Types
5.2 Pairs
5.2.1 Making Pairs
5.2.2 Triples to Octuples
5.3 Lists
5.4 List Procedures
5.4.1 Procedures that Examine Lists
5.4.2 Generic Accumulators
5.4.3 Procedures that Construct Lists
5.5 Lists of Lists
5.6 Data Abstraction
5.7 Summary of Part I
6 Machines
6.1 History of Computing Machines
6.2 Mechanizing Logic
6.2.1 Implementing Logic
6.2.2 Composing Operations
6.2.3 Arithmetic
6.3 Modeling Computing
6.3.1 Turing Machines
6.4 Summary
7 Cost
7.1 Empirical Measurements
7.2 Orders of Growth
7.2.1 Big O
7.2.2 Omega
7.2.3 Theta
7.3 Analyzing Procedures
7.3.1 Input Size
7.3.2 Running Time
7.3.3 Worst Case Input
7.4 Growth Rates
7.4.1 No Growth: Constant Time
7.4.2 Linear Growth
7.4.3 Quadratic Growth
7.4.4 Exponential Growth
7.4.5 Faster than Exponential Growth
7.4.6 Non-terminating Procedures
7.5 Summary
9 Mutation
9.1 Assignment
9.2 Impact of Mutation
9.2.1 Names, Places, Frames, and Environments
9.2.2 Evaluation Rules with State
9.3 Mutable Pairs and Lists
9.4 Imperative Programming
9.4.1 List Mutators
9.4.2 Imperative Control Structures
9.5 Summary
10 Objects
10.1 Packaging Procedures and State
10.1.1 Encapsulation
10.1.2 Messages
10.1.3 Object Terminology
10.2 Inheritance
10.2.1 Implementing Subclasses
10.2.2 Overriding Methods
10.3 Object-Oriented Programming
10.4 Summary
11 Interpreters
11.1 Python
11.1.1 Python Programs
11.1.2 Data Types
11.1.3 Applications and Invocations
11.1.4 Control Statements
11.2 Parser
11.3 Evaluator
11.3.1 Primitives
11.3.2 If Expressions
11.3.3 Definitions and Names
11.3.4 Procedures
11.3.5 Application
11.3.6 Finishing the Interpreter
11.4 Lazy Evaluation
11.4.1 Lazy Interpreter
11.4.2 Lazy Programming
11.5 Summary
12 Computability
12.1 Mechanizing Reasoning
12.1.1 Gödel's Incompleteness Theorem
12.2 The Halting Problem
12.3 Universality
12.4 Proving Non-Computability
12.5 Summary
Indexes
Index
People
List of Explorations
List of Figures
1.1 Using three bits to distinguish eight possible values
This book started from the premise that Computer Science should be taught as
a liberal art, not an industrial skill. I had the privilege of taking 6.001 from Gerry
Sussman when I was a first year student at MIT, and that course awakened me
to the power and beauty of computing, and inspired me to pursue a career as
a teacher and researcher in Computer Science. When I arrived as a new faculty
member at the University of Virginia in 1999, I was distraught to discover that
the introductory computing courses focused on teaching industrial skills, and
with so much of the course time devoted to explaining the technical complexi-
ties of using bloated industrial languages like C++ and Java, there was very little,
if any, time left to get across the core intellectual ideas that are the essence of
computing and the reason everyone should learn it.
With the help of a University Teaching Fellowship and National Science Foun-
dation grants, I developed a new introductory computer science course, tar-
geted especially to students in the College of Arts & Sciences. This course was
first offered in Spring 2002, with the help of an extraordinary group of Assistant
Coaches. Because of some unreasonable assumptions in the first assignment,
half the students quickly dropped the course, but a small, intrepid group of
pioneering students persisted, and it is thanks to their efforts that this book exists.
That course, and the next several offerings, used Abelson & Sussman's outstanding
Structure and Interpretation of Computer Programs (SICP) textbook along
with Douglas Hofstadter's Gödel, Escher, Bach: An Eternal Golden Braid.
I am not alone in thinking SICP is perhaps the greatest textbook ever written in
any field, so it was with much trepidation that I endeavored to develop a new
textbook. I hope the resulting book captures the spirit and fun of computing
exemplified by SICP, but is better suited to an introductory course for students
with no previous background while covering many topics not included in SICP
such as languages, complexity analysis, objects, and computability. Although
this book is designed around a one semester introductory course, it should also
be suitable for self-study students and for people with substantial programming
experience but without similar computer science knowledge.
I am indebted to many people who helped develop this course and book. West-
ley Weimer was the first person to teach using something resembling this book,
and his thorough and insightful feedback led to improvements throughout. Greg
Humphreys, Paul Reynolds, and Mark Sherriff have also taught versions of this
course, and contributed to its development. I am thankful to all of the Assis-
tant Coaches over the years, especially Sarah Bergkuist (2004), Andrew Connors
(2004), Rachel Dada (2003), Paul DiOrio (2009), Kinga Dobolyi (2007), Jon Erd-
man (2002), Ethan Fast (2009), David Faulkner (2005), Jacques Fournier (2003),
Richard Hsu (2007), Rachel Lathbury (2009), Michael Lew (2009), Stephen Liang
(2002), Dan Marcus (2007), Rachel Rater (2009), Spencer Stockdale (2003), Dan
Upton (2005), Portman Wills (2002), Katie Winstanley (2003 and 2004), and Re-
becca Zapfel (2009). William Aiello, Anna Chefter, Chris Frost, Jonathan Grier,
Thad Hughes, Alan Kay, Tim Koogle, Jerry McGann, Gary McGraw, Radhika Nagpal,
Shawn O'Hargan, Mike Peck, and Judith Shatin also made important
contributions to the class and book.
My deepest thanks are to my wife, Nora, who is a constant source of inspiration,
support, and wonder.
Finally, my thanks to all past, present, and future students who use this book,
without whom it would have no purpose.
Happy Computing!
David Evans
Charlottesville, Virginia
August 2011
The first million years of hominid history produced tools to amplify, and later
mechanize, our physical abilities to enable us to move faster, reach higher, and
hit harder. We have developed tools that amplify physical force by the trillions
and increase the speeds at which we can travel by the thousands.
Tools that amplify intellectual abilities are much rarer. While some animals have
developed tools to amplify their physical abilities, only humans have developed
tools to substantially amplify our intellectual abilities and it is those advances
that have enabled humans to dominate the planet. The first key intellect am-
plifier was language. Language provided the ability to transmit our thoughts to
others, as well as to use our own minds more effectively. The next key intellect
amplifier was writing, which enabled the storage and transmission of thoughts
over time and distance.
Computing is the ultimate mental amplifier: computers can mechanize any intellectual
activity we can imagine. Automatic computing radically changes how
humans solve problems, and even the kinds of problems we can imagine solving.
Computing has changed the world more than any other invention of the
past hundred years, and has come to pervade nearly all human endeavors. Yet,
we are just at the beginning of the computing revolution; today's computing offers
just a glimpse of the potential impact of computing.
"It may be true that you have to be able to read in order to fill out forms at the
DMV, but that's not why we teach children to read. We teach them to read for the
higher purpose of allowing them access to beautiful and meaningful ideas."
Paul Lockhart, Lockhart's Lament
There are two reasons why everyone should study computing:
1. Nearly all of the most exciting and important technologies, arts, and sciences
of today and tomorrow are driven by computing.
2. Understanding computing illuminates deep insights and questions into
the nature of our minds, our culture, and our universe.
Anyone who has submitted a query to Google, watched Toy Story, had LASIK
eye surgery, used a smartphone, seen a Cirque Du Soleil show, shopped with a
credit card, or microwaved a pizza should be convinced of the first reason. None
of these would be possible without the tremendous advances in computing over
the past half century.
Although this book will touch on some exciting applications of computing,
our primary focus is on the second reason, which may seem more surprising.
2 1.1. Processes, Procedures, and Computers
Computing changes how we think about problems and how we understand the
world. The goal of this book is to teach you that new way of thinking.
Describing processes by just listing steps like this has many limitations. First,
natural languages are very imprecise and ambiguous. Following the steps correctly
requires knowing lots of unstated assumptions. For example, step three
assumes the operator understands the difference between coffee grounds and
finished coffee, and can infer that this use of "coffee" refers to coffee grounds
since the end goal of this process is to make drinkable coffee. Other steps assume
the coffeemaker is plugged in and sitting on a flat surface.
One could, of course, add lots more details to our procedure and make the lan-
guage more precise than this. Even when a lot of effort is put into writing pre-
cisely and clearly, however, natural languages such as English are inherently am-
biguous. This is why the United States tax code is 3.4 million words long, but
lawyers can still spend years arguing over what it really means.
Another problem with this way of describing a procedure is that the size of the
Chapter 1. Computing 3
We defer considering the second property until Part II, but consider the first
question here.
1.2.1 Information
Informally, we use information to mean knowledge. But to understand information
quantitatively, as something we can measure, we need a more precise way
to think about information.
4 1.2. Measuring Computing Power
The way computer scientists measure information is based on how what is known
changes as a result of obtaining the information. The primary unit of information
is a bit. One bit of information halves the amount of uncertainty. It is equivalent
to answering a "yes" or "no" question, where either answer is equally likely
beforehand. Before learning the answer, there were two possibilities; after learning
the answer, there is one.
We call a question with two possible answers a binary question. Since a bit can
have two possible values, we often represent the values as 0 and 1.
For example, suppose we perform a fair coin toss but do not reveal the result.
Half of the time, the coin will land "heads", and the other half of the time the
coin will land "tails". Without knowing any more information, our chances of
guessing the correct answer are $\frac{1}{2}$. One bit of information would be enough to
convey either "heads" or "tails"; we can use 0 to represent "heads" and 1 to represent
"tails". So, the amount of information in a coin toss is one bit.
Similarly, one bit can distinguish between the values 0 and 1:
[Figure: binary question tree. Root question: "Is it 1?" No leads to 0; Yes leads to 1.]
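[Figure: question tree for a six-sided die that asks about one value at a time. "6?" Yes: 6. No: "5?" Yes: 5. No: "4?" Yes: 4. No: "3?" Yes: 3. No: "2?" Yes: 2. No: 1.]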
Our goal is to identify questions where the "yes" and "no" answers are equally
likely; that way, each answer provides the most information possible. This is
not the case if we start with, "Is the value 6?", since that answer is expected to be
"yes" only one time in six. Instead, we should start with a question like, "Is the
value at least 4?". Here, we expect the answer to be "yes" one half of the time,
and the "yes" and "no" answers are equally likely. If the answer is "yes", we know
the result is 4, 5, or 6. With two more bits, we can distinguish between these
three values (note that two bits is actually enough to distinguish among four
different values, so some information is wasted here). Similarly, if the answer
to the first question is "no", we know the result is 1, 2, or 3. We need two more
bits to distinguish which of the three values it is. Thus, with three bits, we can
distinguish all six possible outcomes.
[Figure: question tree for a six-sided die. Root: ">= 4?". No: "3?" (Yes: 3; No: "2?", where Yes: 2 and No: 1). Yes: "6?" (Yes: 6; No: "5?", where Yes: 5 and No: 4).]
Three bits can convey more information than just six possible outcomes, however.
In the binary question tree, there are some questions where the answer
is not equally likely to be "yes" and "no" (for example, we expect the answer to
"Is the value 3?" to be "yes" only one out of three times). Hence, we are not
obtaining a full bit of information with each question.
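The cost of unbalanced questions can be checked with a little arithmetic. The following Python sketch is not part of the original text; it assumes a fair die and simply records, by hand, how many questions each strategy asks before a value is identified. The names `sequential`, `balanced`, and `expected_questions` are illustrative.

```python
from fractions import Fraction

# Number of yes/no questions asked before each die value is identified.
# sequential: ask "6?", then "5?", ..., then "2?" (a "no" to "2?" means 1).
sequential = {6: 1, 5: 2, 4: 3, 3: 4, 2: 5, 1: 5}
# balanced: ask ">= 4?" first, then "3?" or "6?", then "2?" or "5?".
balanced = {3: 2, 6: 2, 1: 3, 2: 3, 4: 3, 5: 3}

def expected_questions(questions_per_value):
    """Average number of questions, assuming each value is equally likely."""
    total = sum(questions_per_value.values())
    return Fraction(total, len(questions_per_value))

print(expected_questions(sequential))  # 10/3
print(expected_questions(balanced))    # 8/3
```

Starting with ">= 4?" never needs more than three questions, while asking about one value at a time can need five.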
Each bit doubles the number of possibilities we can distinguish, so with three
bits we can distinguish between $2 \times 2 \times 2 = 8$ possibilities. In general, with $n$ bits,
we can distinguish between $2^n$ possibilities. Conversely, distinguishing among $k$
possible values requires $\log_2 k$ bits. The logarithm is defined such that if $a = b^c$
then $\log_b a = c$. Since each bit has two possibilities, we use the logarithm base
2 to determine the number of bits needed to distinguish among a set of distinct
possibilities. For our six-sided die, $\log_2 6 \approx 2.58$, so we need approximately 2.58
binary questions. But, questions are discrete: we can't ask 0.58 of a question, so
we need to use three binary questions.
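As a rough illustration (not from the original text), the rounding-up step can be written as a small Python function. The name `bits_needed` is hypothetical; it uses integer arithmetic rather than a floating-point logarithm so that exact powers of two are handled without rounding error.

```python
import math

def bits_needed(k):
    """Smallest whole number of binary questions that can distinguish k values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # (k - 1).bit_length() equals the ceiling of log2(k) for every k >= 1.
    return (k - 1).bit_length()

print(math.log2(6))     # about 2.585
print(bits_needed(6))   # 3
print(bits_needed(8))   # 3
print(bits_needed(9))   # 4
```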
possible values using the answers to the questions down the tree. For example,
if the answers are "No", "No", and "No", we reach the leaf 0; if the answers are
"Yes", "No", "Yes", we reach the leaf 5. Since there are no more than two possible
answers for each node, we call this a binary tree.
We can describe any non-negative integer using bits in this way, by just adding
additional levels to the tree. For example, if we wanted to distinguish between
16 possible numbers, we would add a new question, "Is it >= 8?" to the top
of the tree. If the answer is "No", we use the tree in Figure 1.1 to distinguish
numbers between 0 and 7. If the answer is "Yes", we use a tree similar to the one
in Figure 1.1, but add 8 to each of the numbers in the questions and the leaves.
The depth of a tree is the length of the longest path from the root to any leaf. The
example tree has depth three. A binary tree of depth $d$ can distinguish up to $2^d$
different values.
[Figure 1.1: Using three bits to distinguish eight possible values. Root: ">= 4?". Second level: ">= 2?" (No branch) and ">= 6?" (Yes branch). Third level: "1?", "3?", "5?", "7?". Leaves, left to right: 0, 1, 2, 3, 4, 5, 6, 7.]
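One way to picture walking such a tree is the Python sketch below. It is an illustration rather than the book's own code: the function names are hypothetical, and every question is phrased as "is it >= some threshold?", which gives the same answers as the "1?", "3?", "5?", "7?" questions at the bottom level of the figure.

```python
def ask_questions(value, depth=3):
    """Return the yes/no answers that lead to `value` in a tree of the given depth."""
    low, high = 0, 2 ** depth          # candidates are low .. high - 1
    answers = []
    while high - low > 1:
        middle = (low + high) // 2
        if value >= middle:            # question: "is it >= middle?"
            answers.append("Yes")
            low = middle
        else:
            answers.append("No")
            high = middle
    return answers

def follow_answers(answers):
    """Walk down the tree using the answers and return the leaf reached."""
    low, high = 0, 2 ** len(answers)
    for answer in answers:
        middle = (low + high) // 2
        if answer == "Yes":
            low = middle
        else:
            high = middle
    return low

print(ask_questions(5))                     # ['Yes', 'No', 'Yes']
print(follow_answers(["No", "No", "No"]))   # 0
```

Calling `ask_questions(value, depth=4)` adds the extra ">= 8?" level needed to distinguish 16 numbers.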
Units of Information. One byte is defined as eight bits. Hence, one byte of
information corresponds to eight binary questions, and can distinguish among
$2^8$ (256) different values. For larger amounts of information, we use metric prefixes,
but instead of scaling by factors of 1000 they scale by factors of $2^{10}$ (1024).
Hence, one kilobyte is 1024 bytes; one megabyte is $2^{20}$ (approximately one million)
bytes; one gigabyte is $2^{30}$ (approximately one billion) bytes; and one terabyte
is $2^{40}$ (approximately one trillion) bytes.
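To see how close the "approximately one million" style of estimate is, the powers of two can be printed directly. This short Python sketch is only an illustration; the names `BITS_PER_BYTE` and `UNITS` are made up for the example, and the sizes follow the power-of-two definitions used in this section.

```python
BITS_PER_BYTE = 8

# Sizes in bytes, using the powers of two given in the text.
UNITS = {
    "kilobyte": 2 ** 10,
    "megabyte": 2 ** 20,
    "gigabyte": 2 ** 30,
    "terabyte": 2 ** 40,
}

print(2 ** BITS_PER_BYTE)    # 256 values per byte
for name, size in UNITS.items():
    print(name, size, "bytes =", size * BITS_PER_BYTE, "bits")
```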
Exercise 1.1. Draw a binary tree with the minimum possible depth to:
a. Distinguish among the numbers 0, 1, 2, . . . , 15.
b. Distinguish among the 12 months of the year.
Exercise 1.3. The examples all use binary questions for which there are two
possible answers. Suppose instead of basing our decisions on bits, we based it
on trits where one trit can distinguish between three equally likely values. For
each trit, we can ask a ternary question (a question with three possible answers).
a. How many trits are needed to distinguish among eight possible values? (A
convincing answer would show a ternary tree with the questions and answers
for each node, and argue why it is not possible to distinguish all the values
with a tree of lesser depth.)
b. [★] Devise a general formula for converting between bits and trits. How many
trits does it require to describe b bits of information?
The guess-a-number game starts with one player (the chooser) picking a number
between 1 and 100 (inclusive) and secretly writing it down. The other player
(the guesser) attempts to guess the number. After each guess, the chooser responds
with "correct" (the guesser guessed the number and the game is over),
"higher" (the actual number is higher than the guess), or "lower" (the actual
number is lower than the guess).
a. Explain why the guesser can receive slightly more than one bit of information
for each response.
b. Assuming the chooser picks the number randomly (that is, all values between
1 and 100 are equally likely), what are the best first guesses? Explain why
these guesses are better than any other guess. (Hint: there are two equally
good first guesses.)
c. What is the maximum number of guesses the second player should need to
always find the number?
d. What is the average number of guesses needed (assuming the chooser picks
the number randomly as before)?
e. [★] Suppose instead of picking randomly, the chooser picks the number with
the goal of maximizing the number of guesses the second player will need.
What number should she pick?
f. [★★] How should the guesser adjust her strategy if she knows the chooser is
picking adversarially?
g. [★★] What are the best strategies for both players in the adversarial guess-a-number
game where the chooser's goal is to pick a starting number that maximizes
the number of guesses the guesser needs, and the guesser's goal is to
guess the number using as few guesses as possible?
The two-player game twenty questions starts with the first player (the answerer)
thinking of an object, and declaring if the object is an animal, vegetable, or mineral
(meant to include all non-living things). After this, the second player (the
questioner) asks binary questions to try to guess the object the first player
thought of. The first player answers each question "yes" or "no". The website
http://www.20q.net/ offers a web-based twenty questions game where a human
acts as the answerer and the computer as the questioner. The game is also sold
as a $10 stand-alone toy (shown in the picture).
[Image: 20Q Game. Image from ThinkGeek.]
a. How many different objects can be distinguished by a perfect questioner for
the standard twenty questions game?
b. What does it mean for the questioner to play perfectly?
c. Try playing the 20Q game at http://www.20q.net. Did it guess your item?
d. Instead of just "yes" and "no", the 20Q game offers four different answers:
"Yes", "No", "Sometimes", and "Unknown". (The website version of the game
also has "Probably", "Irrelevant", and "Doubtful".) If all four answers were
equally likely (and meaningful), how many items could be distinguished in
20 questions?
e. For an "Animal", the first question 20Q sometimes asks is "Does it jump?" (20Q
randomly selects from a few different first questions). Is this a good first
question?
f. [★] How many items do you think 20Q has data for?
g. [★★] Speculate on how 20Q could build up its database.
As in the decimal number system, the value of each binary digit depends on its
position.
By using more bits, we can represent larger numbers. With enough bits, we can
represent any natural number this way. The more bits we have, the larger the set
of possible numbers we can represent. As we saw with the binary decision trees,
$n$ bits can be used to represent $2^n$ different numbers.
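The positional reading of binary digits can be sketched in a few lines of Python. This is an illustration rather than code from the book, and the function names are hypothetical: each new bit doubles the value accumulated so far, just as each new decimal digit multiplies it by ten.

```python
def bits_to_number(bits):
    """Interpret a string of 0s and 1s as a natural number."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)   # shift one position left, then add the bit
    return value

def number_to_bits(number, width):
    """Write a natural number using exactly `width` bits."""
    if not 0 <= number < 2 ** width:
        raise ValueError("number does not fit in the given width")
    return format(number, "0{}b".format(width))

print(bits_to_number("101"))   # 5
print(number_to_bits(5, 3))    # 101
```

With a width of three, `number_to_bits` produces eight distinct strings, one for each number from 0 to 7.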
Discrete Values. We can use a finite sequence of bits to describe any value that
is selected from a countable set of possible values. A set is countable if there is a
way to assign a unique natural number to each element of the set. All finite sets
are countable. Some, but not all, infinite sets are countable. For example, there
appear to be more integers than there are natural numbers since for each natural
number, $n$, there are two corresponding integers, $n$ and $-n$. But, the integers are
in fact countable. We can enumerate the integers as: 0, 1, -1, 2, -2, 3, -3, 4, -4, . . .
and assign a unique natural number to each integer in turn.
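The enumeration can be made concrete with a pair of functions that convert between positions and integers. This Python sketch is illustrative only; the names `integer_at` and `position_of` are made up, and positions are assumed to start at 0.

```python
def integer_at(position):
    """Integer at the given position (0, 1, 2, ...) in the list 0, 1, -1, 2, -2, ..."""
    if position % 2 == 1:
        return (position + 1) // 2     # odd positions hold the positive integers
    return -(position // 2)            # even positions hold 0 and the negatives

def position_of(integer):
    """Natural number assigned to an integer by the same enumeration."""
    if integer > 0:
        return 2 * integer - 1
    return -2 * integer

print([integer_at(i) for i in range(9)])   # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```

Because each function undoes the other, every integer gets exactly one natural number and no natural number is used twice.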
Other sets, such as the real numbers, are uncountable. Georg Cantor proved
this using a technique known as diagonalization. Suppose the real numbers are
enumerable. This means we could list all the real numbers in order, so we could
assign a unique integer to each number. For example, considering just the real
numbers between 0 and 1, our enumeration might be:
1 .00000000000000 . . .
2 .25000000000000 . . .
3 .33333333333333 . . .
4 .6666666666666 . . .
57236 .141592653589793 . . .
Cantor proved by contradiction that there is no way to enumerate all the real
numbers. The trick is to produce a new real number that is not part of the enu-
meration. We can do this by constructing a number whose first digit is different
from the first digit of the first number, whose second digit is different from the
second digit of the second number, etc. For the example enumeration above, we
might choose .1468 . . ..
The $k$th digit of the constructed number is different from the $k$th digit of
number $k$ in the enumeration. Since the constructed number differs in at least one
digit from every enumerated number, it does not match any of the enumerated
numbers exactly. Thus, there is a real number that is not included in the enumeration
list, and it is impossible to enumerate all the real numbers.[1]
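The construction can be tried on any finite prefix of an enumeration. The Python sketch below is an illustration under an assumed digit rule, not Cantor's own formulation or the book's choice of .1468: it picks 5 unless the listed digit is already 5, in which case it picks 6. Never choosing 0 or 9 is one way to avoid producing a second representation of a listed number.

```python
def diagonal_number(enumeration):
    """Build digits that differ from the kth digit of the kth listed number."""
    digits = []
    for k, listed in enumerate(enumeration):
        # Assumed rule: use 5 unless the listed digit is already 5, then use 6.
        digits.append("6" if listed[k] == "5" else "5")
    return "".join(digits)

# Digits after the decimal point for the first four numbers in the example.
example = ["0000", "2500", "3333", "6666"]
print("." + diagonal_number(example))   # .5655
```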
Digital computers[2] operate on inputs that are discrete values. Continuous values,
such as real numbers, can only be approximated by computers. Next, we
consider how two types of data, text and images, can be represented by computers.
The first type, text, is discrete and can be represented exactly; images
are continuous, and can only be represented approximately.
[1] Alert readers should be worried that this isn't quite correct since the resulting number may be a
different way to represent the same real number (for example, .1999999999999 . . . = .20000000000 . . .
even though they differ in each digit). This technical problem can be fixed by placing some restrictions
on how the modified digits are chosen to avoid infinite repetitions.
[2] This is, indeed, part of the definition of a digital computer. An analog computer operates on
continuous values. In Chapter 6, we explain more of the inner workings of a computer and why
nearly all computers today are digital. We use computer to mean a digital computer in this book.
The property that there are more real numbers than natural numbers has important implications
for what can and cannot be computed, which we return to in Chapter 12.
Text. The set of all possible sequences of characters is countable. One way to
see this is to observe that we could give each possible text fragment a unique
number, and then use that number to identify the item. For example, we could
enumerate all texts by length, and alphabetically within each length (here, we
limit the characters to lowercase letters): a, b, c, . . ., z, aa, ab, . . ., az, ba, . . ., zz, aaa, . . .
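This enumeration is easy to generate mechanically. The following Python sketch is illustrative and not from the book: `all_texts` lists the texts in the order described, and `text_number` (a hypothetical helper) computes a text's position directly, counting from 1 for "a".

```python
from itertools import count, islice, product
from string import ascii_lowercase

def all_texts():
    """Yield every lowercase text: shortest first, alphabetical within a length."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)

def text_number(text):
    """Position of `text` in the enumeration, starting from 1 for 'a'."""
    number = 0
    for letter in text:
        number = number * 26 + (ord(letter) - ord("a") + 1)
    return number

print(list(islice(all_texts(), 28)))   # 'a' through 'z', then 'aa', 'ab'
print(text_number("ab"))               # 28
```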
We have seen that we can represent all the natural numbers with a sequence
of bits, so once we have the mapping between each item in the set and
a unique natural number, we can represent all of the items in the set. For the
representation to be useful, though, we usually need a way to construct the corresponding
number for any item directly.
So, instead of enumerating a mapping between all possible character sequences
and the natural numbers, we need a process for converting any text to a unique
number that represents that text. Suppose we limit our text to characters in
the standard English alphabet. If we include lower-case letters (26), upper-case
letters (26), and punctuation (space, comma, period, newline, semi-colon), we
have 57 different symbols to represent. We can assign a unique number to each
symbol, and encode the corresponding number with six bits (this leaves seven
values unused since six bits can distinguish 64 values). For example, we could
encode using the mapping shown in Table 1.1. The first bit answers the question:
"Is it an uppercase letter after F or a special character?". When the first bit
is 0, the second bit answers the question: "Is it after p?".
Once we have a way of mapping each individual letter to a fixed-length bit sequence,
we could write down any sequence of letters by just concatenating the
bits encoding each letter. So, the text CS is encoded as 011100 101100. With
this encoding, a text of length $n$ written in the 57-symbol alphabet takes
$6n$ bits. To convert the number back into text, just invert the
mapping by replacing each group of six bits with the corresponding letter.
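A fixed-length encoding of this kind can be sketched in Python as follows. Table 1.1 is not reproduced here, so the symbol order is an assumption: lowercase letters, then uppercase letters, then the five punctuation marks in the order listed in the text. That order was chosen because it agrees with the encoding of CS and with the two questions quoted above; the names `SYMBOLS`, `encode`, and `decode` are illustrative.

```python
from string import ascii_lowercase, ascii_uppercase

# Assumed symbol order: a-z, then A-Z, then space, comma, period, newline, semicolon.
SYMBOLS = ascii_lowercase + ascii_uppercase + " ,.\n;"
BITS_PER_SYMBOL = 6

def encode(text):
    """Concatenate the six-bit code of each symbol in the text."""
    return "".join(format(SYMBOLS.index(ch), "06b") for ch in text)

def decode(bits):
    """Invert the mapping by reading the bits six at a time."""
    groups = [bits[i:i + BITS_PER_SYMBOL]
              for i in range(0, len(bits), BITS_PER_SYMBOL)]
    return "".join(SYMBOLS[int(group, 2)] for group in groups)

print(len(SYMBOLS))                      # 57
print(encode("CS"))                      # 011100101100
print(decode(encode("Hello, world.")))   # Hello, world.
```

A symbol outside the 57-symbol alphabet makes `SYMBOLS.index` raise a `ValueError`, which is a reasonable reminder that the encoding only covers the symbols it was designed for.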
Rich Data. We can also use bit sequences to represent complex data like pic-
tures, movies, and audio recordings. First, consider a simple black and white
picture:
Since the picture is divided into discrete squares known as pixels, we could encode
this as a sequence of bits by using one bit to encode the color of each pixel
(for example, using 1 to represent black, and 0 to represent white). This image
is 16x16, so it has 256 pixels in total. We could represent the image using a sequence
of 256 bits (starting from the top left corner):
0000011111100000
0000100000010000
0011000000001100
0010000000000100
0000011111100000
What about complex pictures that are not divided into discrete squares or a fixed
number of colors, like Van Gogh's Starry Night?
whether the space-time of the universe is continuous or discrete. Certainly in our common perception
it seems to be continuous: we can imagine dividing any length into two shorter lengths. In
reality, this may not be the case at extremely tiny scales. It is not known if time can continue to be
subdivided below $10^{-40}$ of a second.
12 1.2. Measuring Computing Power
On the other hand, the human eye and brain have limits. We cannot actu-
ally perceive infinitely many different colors; at some point the wavelengths
are close enough that we cannot distinguish them. Ability to distinguish colors
varies, but most humans can perceive only a few million different colors. The set
of colors that can be distinguished by a typical human is finite; any finite set is
countable, so we can map each distinguishable color to a unique bit sequence.
A common way to represent color is to break it into its three primary compo-
nents (red, green, and blue), and record the intensity of each component. The
more bits available to represent a color, the more different colors that can be
represented.
Thus, we can represent a picture by recording the approximate color at each
point. If space in the universe is continuous, there are infinitely many points.
But, as with color, once the points get smaller than a certain size they are im-
perceptible. We can approximate the picture by dividing the canvas into small
regions and sampling the average color of each region. The smaller the sample
regions, the more bits we will have and the more detail that will be visible in the
image. With enough bits to represent color, and enough sample points, we can
represent any image as a sequence of bits.
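The sampling step can be illustrated with a short sketch that averages each small region of a grid. For simplicity it uses a single intensity per point (a grayscale image) rather than three color components; the 4x4 image and the name downsample are hypothetical.

```python
def downsample(pixels, block):
    """Average each block x block region of a grid of intensities."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            region = [pixels[y][x]
                      for y in range(top, min(top + block, height))
                      for x in range(left, min(left + block, width))]
            row.append(sum(region) / len(region))
        result.append(row)
    return result

# A hypothetical 4x4 grayscale image with intensities from 0 to 255.
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [100, 200, 50,  50],
    [100, 200, 50,  50],
]
print(downsample(image, 2))  # [[0.0, 255.0], [150.0, 50.0]]
```

Larger regions mean fewer samples and fewer bits, at the cost of detail: the 100/200 stripes in the lower left are blurred into a single value of 150.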
Summary. We can use sequences of bits to represent any natural number ex-
actly, and hence, represent any member of a countable set using a sequence of
bits. The more bits we use the more different values that can be represented;
with n bits we can represent 2^n different values.
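The relationship between bits and values can be checked directly. In this sketch the function names are illustrative, and the 24-bit case (eight bits for each of red, green, and blue) is an example chosen here rather than one taken from the text.

```python
def values_representable(n_bits):
    """Number of distinct values that n_bits bits can distinguish."""
    return 2 ** n_bits

def bits_needed(n_values):
    """Smallest number of bits that can distinguish n_values values."""
    return (n_values - 1).bit_length()

print(values_representable(6))   # 64
print(bits_needed(57))           # 6
print(values_representable(24))  # 16777216
```

Six bits suffice for the 57-symbol alphabet, and 24 bits distinguish more colors than the few million a typical human can perceive.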
We can also use sequences of bits to represent rich data like images, audio, and
video. Since the world we are trying to represent is continuous there are in-
finitely many possible values, and we cannot represent these objects exactly
with any finite sequence of bits. However, since human perception is limited,
with enough bits we can represent any of these adequately well. Finding ways
to represent data that are both efficient and easy to manipulate and interpret is
a constant challenge in computing. Manipulating sequences of bits is awkward,
so we need ways of thinking about bit-level representations of data at higher
levels of abstraction. Chapter 5 focuses on ways to manage complex data.
of two values. The AGC used about 4000 integrated circuits, each one being able
to perform a single logical operation and costing $1000. The AGC consumed a
significant fraction of all integrated circuits produced in the mid-1960s, and the
project spurred the growth of the integrated circuit industry.
The AGC had 552 960 bits of memory (of which only 61 440 bits were modifiable;
the rest were fixed). The smallest USB flash memory you can buy today (from
SanDisk in December 2008) is the 1 gigabyte Cruzer for $9.99; 1 gigabyte (GB)
is 2^30 bytes or approximately 8.6 billion bits, about 140 000 times the amount of
modifiable memory in the AGC (and all of the Cruzer memory is modifiable). A
typical low-end laptop today has 2 gigabytes of RAM (fast memory close to the
processor that loses its state when the machine is turned off) and 250 gigabytes
of hard disk memory (slow memory that persists when the machine is turned off);
for under $600 today we get a computer with roughly 4 million times the amount
of memory the AGC had.
"Moore's law is a violation of Murphy's law. Everything gets better and better."
Gordon Moore
Improving by a factor of 4 million corresponds to doubling about 22 times.
The amount of computing power approximately doubled every two years between
the AGC in the early 1960s and a modern laptop today (2009). This property
of exponential improvement in computing power is known as Moore's Law.
Gordon Moore, a co-founder of Intel, observed in 1965 that the number of
components that can be built in integrated circuits for the same cost was
approximately doubling every year (revisions to Moore's observation have put the
doubling rate at approximately 18 months instead of one year). This progress has
been driven by the growth of the computing industry, increasing the resources
available for designing integrated circuits. Another driver is that today's
technology is used to design the next technology generation. Improvement in
computing power has followed this exponential growth remarkably closely over the
past 40 years, although there is no law that this growth must continue forever.
Although our comparison between the AGC and a modern laptop shows an
impressive factor of 4 million improvement, it is much slower than Moore's law
would suggest. Instead of 22 doublings in power since 1963, there should have
been 30 doublings (using the 18 month doubling rate). This would produce an
improvement of one billion times instead of just 4 million. The reason is our
comparison is very unequal relative to cost: the AGC was the world's most
expensive small computer of its time, reflecting many millions of dollars of
government funding. Computing power available for similar funding today is well
over a billion times more powerful than the AGC.
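The arithmetic behind these comparisons can be reproduced as a rough check. The memory sizes and dates are the ones quoted in the text; treating a gigabyte as 2^30 bytes and adding RAM and disk together are assumptions of this sketch.

```python
import math

agc_bits = 552_960          # total AGC memory, from the text
agc_modifiable = 61_440     # modifiable AGC memory, from the text
gigabyte_bits = 2**30 * 8   # bits in one gigabyte

print(gigabyte_bits)                           # 8589934592
print(round(gigabyte_bits / agc_modifiable))   # 139810
laptop_bits = (2 + 250) * gigabyte_bits        # RAM plus disk
print(round(laptop_bits / agc_bits / 1e6, 1))  # 3.9 (million)

print(round(math.log2(4e6), 1))   # 21.9 doublings for a factor of 4 million
years = 2009 - 1963
print(round(years / 1.5, 1))      # 30.7 doublings at one per 18 months
print(2**30)                      # 1073741824
```

So a factor of about 4 million is close to 22 doublings, while 30 doublings give a factor of roughly one billion.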
as yet unachieved, goal of science is to find a universal law that can describe all
physical behavior at scales from the smallest subparticle to the entire universe,
and all the bosons, muons, dark matter, black holes, and galaxies in between.
Science deals with real things (like bowling balls, planets, and electrons) and at-
tempts to make progress toward theories that predict increasingly precisely how
these real things will behave in different situations.
Computer science focuses on artificial things like numbers, graphs, functions,
and lists. Instead of dealing with physical things in the real world, computer sci-
ence concerns abstract things in a virtual world. The numbers we use in compu-
tations often represent properties of physical things in the real world, and with
enough bits we can model real things with arbitrary precision. But, since our fo-
cus is on abstract, artificial things rather than physical things, computer science
is not a traditional natural science but a more abstract field like mathematics.
Like mathematics, computing is an essential tool for modern science, but when
we study computing on artificial things it is not a natural science itself.
In a deeper sense, computing pervades all of nature. A long term goal of com-
puter science is to develop theories that explain how nature computes. One ex-
ample of computing in nature comes from biology. Complex life exists because
nature can perform sophisticated computing. People sometimes describe DNA
as a blueprint, but it is really much better thought of as a program. Whereas
a blueprint describes what a building should be when it is finished, giving the
dimensions of walls and how they fit together, the DNA of an organism encodes
a process for growing that organism. A human genome is not a blueprint that
describes the body plan of a human; it is a program that turns a single cell into
a complex human given the appropriate environment. The process of evolution
(which itself is an information process) produces new programs, and hence new
species, through the process of natural selection on mutated DNA sequences.
Understanding how both these processes work is one of the most interesting
and important open scientific questions, and it involves deep questions in com-
puter science, as well as biology, chemistry, and physics.
The questions we consider in this book focus on the question of what can and
cannot be computed. This is both a theoretical question (what can be computed
by a given theoretical model of a computer) and a pragmatic one (what can be
computed by physical machines we can build today, as well as by anything
possible in our universe).
"Scientists study the world as it is; engineers create the world that never has been."
Theodore von Karman
Engineering. Engineering is about making useful things. Engineering is often
distinguished from crafts in that engineers use scientific principles to create
their designs, and focus on designing under practical constraints. As William
Wulf and George Fisher put it:4
Whereas science is analytic in that it strives to understand nature, or what
is, engineering is synthetic in that it strives to create. Our own favorite
description of what engineers do is "design under constraint." Engineering is
creativity constrained by nature, by cost, by concerns of safety, environmental
impact, ergonomics, reliability, manufacturability, maintainability, the
whole long list of such "ilities." To be sure, the realities of nature is one
of the constraint sets we work under, but it is far from the only one, it is
seldom the hardest one, and almost never the limiting one.
4 William Wulf and George Fisher, A Makeover for Engineering Education, Issues in Science and
Computer scientists do not typically face the natural constraints faced by civil
and mechanical engineers: computer programs are massless and not exposed
to the weather, so programmers do not face the kinds of physical constraints like
gravity that impose limits on bridge designers. As we saw from the Apollo
Guidance Computer comparison, practical constraints on computing power change
rapidly: the one billion times improvement in computing power is unlike any
change in physical materials.5 Although we may need to worry about
manufacturability and maintainability of storage media (such as the disk we use to store
a program), our focus as computer scientists is on the abstract bits themselves,
not how they are stored.
Computer scientists, however, do face many constraints. A primary constraint
is the capacity of the human mind: there is a limit to how much information a
human can keep in mind at one time. As computing systems get more complex,
there is no way for a human to understand the entire system at once. To build
complex systems, we need techniques for managing complexity. The primary
tool computer scientists use to manage complexity is abstraction. Abstraction is
a way of giving a name to something in a way that allows us to hide unnecessary
details. By using carefully designed abstractions, we can construct complex sys-
tems with reliable properties while limiting the amount of information a human
designer needs to keep in mind at any one time.
Liberal Arts. The notion of the liberal arts emerged during the Middle Ages to
distinguish education for the purpose of expanding the intellects of free people
from the illiberal arts such as medicine and carpentry that were pursued for
economic purposes. The liberal arts were intended for people who did not need
to learn an art to make a living, but instead had the luxury to pursue purely
intellectual activities for their own sake. The traditional seven liberal arts started
with the Trivium (three roads), focused on language:6
Grammar: "the art of inventing symbols and combining them to express thought"
Rhetoric: "the art of communicating thought from one mind to another,
the adaptation of language to circumstance"
Logic: "the art of thinking"
The Trivium was followed by the Quadrivium, focused on numbers:
Arithmetic: "theory of number"
Geometry: "theory of space"
Music: "application of the theory of number"
Astronomy: "application of the theory of space"
"I must study politics and war that my sons may have liberty to study
mathematics and philosophy. My sons ought to study mathematics and philosophy,
geography, natural history, naval architecture, navigation, commerce, and
agriculture, in order to give their children a right to study painting, poetry,
music, architecture, statuary, tapestry, and porcelain."
John Adams, 1780
All of these have strong connections to computer science, and we will touch on
each of them to some degree in this book.
Language is essential to computing since we use the tools of language to de-
scribe information processes. The next chapter discusses the structure of lan-
guage and throughout this book we consider how to efficiently use and combine
5 For example, the highest strength density material available today, carbon nanotubes, are per-
haps 300 times stronger than the best material available 50 years ago.
6 The quotes defining each liberal art are from Miriam Joseph (edited by Marguerite McGlinn),
The Trivium: The Liberal Arts of Logic, Grammar, and Rhetoric, Paul Dry Books, 2002.
16 1.4. Summary and Roadmap
ecutes the exact steps the program specifies. And it executes them at a
remarkable rate: billions of simple steps in each second on a typical laptop. This
changes not just the time it takes to solve a problem, but qualitatively changes
the kinds of problems we can solve, and the kinds of solutions worth consid-
ering. Problems like sequencing the human genome, simulating the global cli-
mate, and making a photomosaic not only could not have been solved without
computing, but perhaps could not have even been envisioned. Chapter 3 intro-
duces programming, and Chapter 4 develops some techniques for constructing
programs that solve problems. To represent more interesting problems, we need
ways to manage more complex data. Chapter 5 concludes Part I by exploring
ways to represent data and define procedures that operate on complex data.
Part II: Analyzing Procedures. Part II considers the problem of estimating the
cost required to execute a procedure. This requires understanding how ma-
chines can compute (Chapter 6), and mathematical tools for reasoning about
how cost grows with the size of the inputs to a procedure (Chapter 7). Chapter 8
provides some extended examples that apply these techniques.
Part III: Improving Expressiveness. The techniques from Parts I and II are
sufficient for describing all computations. Our goal, however, is to be able to define
concise, elegant, and efficient procedures for performing desired computations.
Part III presents techniques that enable more expressive procedures.
Part IV: The Limits of Computing. We hope that by the end of Part III, readers
will feel confident that they could program a computer to do just about any-
thing. In Part IV, we consider the question of what can and cannot be done by a
mechanical computer. A large class of interesting problems cannot be solved by
any computer, even with unlimited time and space.
Themes. Much of the book will revolve around three very powerful ideas that
are prevalent throughout computing:
Recursive definitions. A recursive definition defines a thing in terms of smaller
instances of itself. A simple example is defining your ancestors as (1) your par-
ents, and (2) the ancestors of your ancestors. Recursive definitions can define
an infinitely large set with a small description. They also provide a powerful
technique for solving problems by breaking a problem into solving a simple in-
stance of the problem and showing how to solve a larger instance of the problem
by using a solution to a smaller instance. We use recursive definitions to define
infinite languages in Chapter 2, to solve problems in Chapter 4, and to build complex
data structures in Chapter 5. In later chapters, we see how language interpreters
themselves can be defined recursively.
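The ancestors example can be sketched as a recursive procedure. The family tree below is hypothetical, and the code uses the equivalent phrasing "your parents, plus the ancestors of your parents" so that each recursive call works on a strictly earlier generation and the recursion is guaranteed to stop.

```python
# Hypothetical family tree: each person maps to a list of parents.
PARENTS = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave", "Eve"],
    "Carol": [],
    "Dave": [],
    "Eve": [],
}

def ancestors(person):
    """A person's ancestors are their parents plus their parents' ancestors."""
    result = set()
    for parent in PARENTS.get(person, []):
        result.add(parent)
        result |= ancestors(parent)   # smaller instance of the same problem
    return result

print(sorted(ancestors("Alice")))  # ['Bob', 'Carol', 'Dave', 'Eve']
```

A person with no recorded parents has no ancestors; that empty case is what ends the recursion.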
Universality. Computers are distinguished from other machines in that their be-
havior can be changed by a program. Procedures themselves can be described
using just bits, so we can write procedures that process procedures as inputs and
that generate procedures as outputs. Considering procedures as data is both a
powerful problem solving tool, and a useful way of thinking about the power
and fundamental limits of computing. We introduce the use of procedures as
inputs and outputs in Chapter 4, and see how generated procedures can be
packaged with state to model objects in Chapter 10. One of the most fundamental
results in computing is that any machine that can perform a few simple opera-
tions is powerful enough to perform any computation, and in this deep sense,
The most powerful tool we have for communication is language. This is true
whether we are considering communication between two humans, between a
human programmer and a computer, or between a network of computers. In
computing, we use language to describe procedures and use machines to turn
descriptions of procedures into executing processes. This chapter is about what
language is, how language works, and ways to define languages.
pairs can work adequately only for trivial communication systems. The number
of possible meanings that can be expressed is limited by the number of entries in
the table. It is impossible to express any new meaning since all meanings must
already be listed in the table!
Languages and Infinity. A useful language must be able to express infinitely
many different meanings. Hence, there must be a way to generate new sur-
face forms and guess their meanings (see Exercise 2.1). No finite representation,
such as a printed table, can contain all the surface forms and meanings in an
infinite language. One way to generate infinitely large sets is to use repeating
patterns. For example, most humans would interpret the notation "1, 2, 3, . . ."
as the set of all natural numbers. We interpret the ". . ." as meaning keep doing
the same thing forever. In this case, it means keep adding one to the preceding
number. Thus, with only a few numbers and symbols we can describe a set
containing infinitely many numbers. As discussed in Section 1.2.1, the language
of the natural numbers is enough to encode all meanings in any countable set.
But, finding a sensible mapping between most meanings and numbers is nearly
impossible. The surface forms do not correspond closely enough to the ideas we
want to express to be a useful language.
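The "keep adding one to the preceding number" reading of "1, 2, 3, . . ." can be written as a finite description of an infinite set. This sketch uses a Python generator; the name natural_numbers is illustrative.

```python
from itertools import islice

def natural_numbers():
    """Yield 1, 2, 3, ... by repeatedly adding one to the preceding number."""
    n = 1
    while True:
        yield n
        n += 1

print(list(islice(natural_numbers(), 5)))  # [1, 2, 3, 4, 5]
```

The description is only a few lines long, yet it never runs out of numbers to produce; a program can only ever inspect a finite prefix of the set.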
The primitives are the smallest meaningful units (in natural languages these are
known as morphemes). A primitive cannot be broken into smaller parts whose
meanings can be combined to produce the meaning of the unit. The means
of combination are rules for building words from primitives, and for building
phrases and sentences from words.
Since we have rules for producing new words, not all words are primitives. For
example, we can create a new word by adding anti- in front of an existing word.
Chapter 2. Language 21
The meaning of the new word can be inferred as "against" the meaning of the
original word. Rules like this one mean anyone can invent a new word, and use
it in communication in ways that will probably be understood by listeners who
have never heard the word before.
For example, the verb freeze means to pass from a liquid state to a solid state; an-
tifreeze is a substance designed to prevent freezing. English speakers who know
the meaning of freeze and anti- could roughly guess the meaning of antifreeze
even if they have never heard the word before.1
Primitives are the smallest units of meaning, not based on the surface forms.
Both anti and freeze are primitive; they cannot be broken into smaller parts
with meaning. We can break anti- into two syllables, or four letters, but those
sub-components do not have meanings that could be combined to produce the
meaning of the primitive.
Means of Abstraction. In addition to primitives and means of combination,
powerful languages have an additional type of component that enables eco-
nomic communication: means of abstraction.
Means of abstraction allow us to give a simple name to a complex entity. In
English, the means of abstraction are pronouns like "she", "it", and "they". The
meaning of a pronoun depends on the context in which it is used. It abstracts
a complex meaning with a simple word. For example, the "it" in the previous
sentence abstracts "the meaning of a pronoun", but the "it" in the sentence before
that one abstracts "a pronoun".
In natural languages, there are a limited number of means of abstraction.
English, in particular, has a very limited set of pronouns for abstracting people.
It has "she" and "he" for abstracting a female or male person, respectively, but no
gender-neutral pronouns for abstracting a person of either sex. The
interpretation of what a pronoun abstracts in natural languages is often confusing. For
example, it is unclear what the "it" in this sentence refers to. Languages for
programming computers need means of abstraction that are both powerful and
unambiguous.
Exercise 2.1. According to the Guinness Book of World Records, the longest word
in the English language is "floccinaucinihilipilification", meaning "The act or
habit of describing or regarding something as worthless". This word was
reputedly invented by a non-hippopotomonstrosesquipedaliophobic student at Eton
who combined four words in his Latin textbook. Prove Guinness wrong by iden-
tifying a longer English word. An English speaker (familiar with floccinaucinihil-
ipilification and the morphemes you use) should be able to deduce the meaning
of your word.
Exercise 2.2. Merriam-Webster's word of the year for 2006 was "truthiness", a
word invented and popularized by Stephen Colbert. Its definition is "truth that
comes from the gut, not books". Identify the morphemes that are used to build
truthiness, and explain, based on its composition, what truthiness should mean.
1 Guessing that it is a verb meaning to pass from the solid to liquid state would also be reasonable.
This shows how imprecise and ambiguous natural languages are; for programming computers, we
need the meanings of constructs to be clearly determined.
22 2.3. Recursive Transition Networks
[Diagram: nodes Noun, Verb, and S; the edge labels that survive include Bob and runs.]
Figure 2.1. Simple recursive transition network.
Verb with label Colleen adds two new strings to the language.
The expressive power of recursive transition networks increases dramatically
once we add edges that form cycles in the graph. This is where the "recursive"
in the name comes from. Once a graph has a cycle, there are infinitely many
possible paths through the graph!
Consider what happens when we add the single "and" edge to the previous
network to produce the network shown in Figure 2.2 below.
[Diagram: nodes Noun, Verb, and S; edges labeled Alice, Bob, jumps, runs, and "and".]
Figure 2.2. RTN with a cycle.
Now, we can produce infinitely many different strings! We can follow the "and"
edge back to the Noun node to produce strings like "Alice runs and Bob jumps
and Alice jumps" with as many conjuncts as we want.
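To see how one cycle makes the language unbounded, the network can be written as a table of edges and its strings enumerated up to a length limit. The edge table is a reconstruction of Figure 2.2 from the description in the text (an assumption, since the diagram itself is not reproduced here), and strings_up_to is an illustrative name.

```python
# Figure 2.2 as reconstructed from the text: node -> list of (label, destination).
RTN = {
    "Noun": [("Alice", "Verb"), ("Bob", "Verb")],
    "Verb": [("jumps", "S"), ("runs", "S")],
    "S": [("and", "Noun")],
}
FINAL = "S"

def strings_up_to(max_words, node="Noun", words=()):
    """Yield every string of at most max_words words on a path from node to FINAL."""
    if node == FINAL and words:
        yield " ".join(words)
    if len(words) < max_words:
        for label, destination in RTN[node]:
            yield from strings_up_to(max_words, destination, words + (label,))

print(len(list(strings_up_to(2))))   # 4
print(len(list(strings_up_to(5))))   # 20
print("Alice runs and Bob jumps" in strings_up_to(5))  # True
```

Without a length limit the enumeration would never finish: each pass around the cycle multiplies the number of strings by four.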
Exercise 2.5. Draw a recursive transition network that defines the language of
the whole numbers: 0, 1, 2, . . .
Exercise 2.6. How many different strings can be produced by the RTN below:
[Diagram of the RTN: the edge labels that survive include Alice, jumps, eats, and quickly.]
Subnetworks. In the RTNs we have seen so far, the labels on the output edges
are direct outputs known as terminals: following an edge just produces the sym-
bol on that edge. We can make more expressive RTNs by allowing edge labels to
also name subnetworks. A subnetwork is identified by the name of its starting
node. When an edge labeled with a subnetwork is followed, the network traver-
sal jumps to the subnetwork node. Then, it can follow any path from that node
to a final node. Upon reaching a final node, the network traversal jumps back to
complete the edge.
For example, consider the network shown in Figure 2.3. It describes the same
language as the RTN in Figure 2.1, but uses subnetworks for Noun and Verb. To
produce a string, we start in the Sentence node. The only edge out from Sen-
tence is labeled Noun. To follow the edge, we jump to the Noun node, which is
a separate subnetwork. Now, we can follow any path from Noun to a final node
(in this case, outputting either Alice or Bob on the path toward EndNoun).
[Figure 2.3 diagram: nodes Sentence, S1, EndSentence, Noun, EndNoun, Verb, and EndVerb;
edge labels Noun, Verb, Alice, Bob, jumps, and runs.]
Suppose we replace the Noun subnetwork with the more interesting version
shown in Figure 2.4. This subnetwork includes an edge from Noun to N1 labeled
Noun. Following this edge involves following a path through the Noun subnet-
work. Starting from Noun, we can generate complex phrases like Alice and Bob
or Alice and Bob and Alice (find the two different paths through the network
that generate this phrase).
[Diagram: nodes Noun, N1, N2, and EndNoun; edge labels Noun, and, Noun, Alice, and Bob.]
Figure 2.4. Alternate Noun subnetwork.
To keep track of paths through RTNs without subnetworks, a single marker suf-
fices. We can start with the marker on the start node, and move it along the path
through each node to the final node. Keeping track of paths on an RTN with
subnetworks is more complicated. We need to keep track of where we are in the
current network, and also where to continue to when a final node of the current
subnetwork is reached. Since we can enter subnetworks within subnetworks,
we need a way to keep track of arbitrarily many jump points.
A stack is a useful way to keep track of the subnetworks. We can think of a stack
like a stack of trays in a cafeteria. At any point in time, only the top tray on the
stack can be reached. We can pop the top tray off the stack, after which the next
tray is now on top. We can push a new tray on top of the stack, which makes the
old top of the stack now one below the top.
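The tray analogy maps directly onto a Python list used as a stack, with the end of the list playing the role of the top. This is a minimal sketch; the tray names are made up.

```python
stack = []                 # an empty stack; the end of the list is the top
stack.append("tray 1")     # push
stack.append("tray 2")     # push: "tray 1" is now one below the top
print(stack[-1])           # tray 2 (only the top tray can be reached)
top = stack.pop()          # pop the top tray
print(top, stack[-1])      # tray 2 tray 1
```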
We use a stack of nodes to keep track of the subnetworks as they are entered.
The top of the stack represents the next node to process. At each step, we pop
the node off the stack and follow a transition from that node.
[Figure 2.5 table: the Stack contents and Output at each step; the outputs are Alice, then runs.]
Consider generating the string Alice runs using the RTN in Figure 2.3. We start
following step 1 by pushing Sentence on the stack. In step 2, we pop the stack,
so the current node, N, is Sentence. Since Sentence is not a final node, we do
nothing for step 3. In step 4, we follow an edge starting from Sentence. There is
only one edge to choose and it leads to the node labeled S1. In step 5, we push
S1 on the stack. The edge we followed is labeled with the node Noun, so we
push Noun on the stack. The stack now contains two items: [Noun, S1]. Since
Noun is on top, this means we will first traverse the Noun subnetwork, and then
continue from S1.
As directed by step 7, we go back to step 2 and continue by popping the top
node, Noun, off the stack. It is not a final node, so we continue to step 4, and
select the edge labeled Alice from Noun to EndNoun. We push EndNoun on
the stack, which now contains: [EndNoun, S1]. The label on the edge is the ter-
minal, Alice, so we output Alice following step 6. We continue in the same
manner, following the steps in the procedure as we keep track of a path through
the network. The full processing steps are shown in Figure 2.5.
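The walkthrough can be followed mechanically with a short sketch. The step numbers in the comments are inferred from the walkthrough above, the edge table is a reconstruction of Figure 2.3 from the text, and the way an edge is selected (prefer a label in wanted) is an assumption made so that the example is deterministic.

```python
# Figure 2.3 as described in the text: node -> list of (label, destination).
EDGES = {
    "Sentence": [("Noun", "S1")],
    "S1": [("Verb", "EndSentence")],
    "Noun": [("Alice", "EndNoun"), ("Bob", "EndNoun")],
    "Verb": [("jumps", "EndVerb"), ("runs", "EndVerb")],
}
FINAL = {"EndSentence", "EndNoun", "EndVerb"}

def generate(start, wanted):
    """Traverse the RTN with a stack; return the output and the stacks seen."""
    stack = [start]                       # step 1: push the starting node
    output, trace = [], []
    while stack:                          # step 2: stop when the stack is empty
        trace.append(stack[::-1])         # record the stack, top first
        node = stack.pop()                # step 2: pop the current node
        if node in FINAL:                 # step 3: nothing to follow from a final node
            continue
        edges = EDGES[node]               # step 4: select an edge out of the node
        label, dest = next((e for e in edges if e[0] in wanted), edges[0])
        stack.append(dest)                # step 5: push the destination
        if label in EDGES:                # step 6: the label names a subnetwork
            stack.append(label)
        else:                             # step 6: the label is a terminal
            output.append(label)
    return " ".join(output), trace        # step 7 is the loop itself

result, trace = generate("Sentence", {"Alice", "runs"})
print(result)     # Alice runs
print(trace[1])   # ['Noun', 'S1']
print(trace[2])   # ['EndNoun', 'S1']
```

As in the footnoted simplification, this sketch always stops at a final node, so it cannot follow edges that leave a final node.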
Exercise 2.8. Show the sequence of stacks used in generating the string Alice
and Bob and Alice runs using the network in Figure 2.3 with the alternate Noun
subnetwork from Figure 2.4.
2 For simplicity, this procedure assumes we always stop when a final node is reached. RTNs can
have edges out of final nodes (as in Figure 2.2) where it is possible to either stop or continue from a
final node.
26 2.4. Replacement Grammars
Exercise 2.9. Identify a string that cannot be produced using the RTN from
Figure 2.3 with the alternate Noun subnetwork from Figure 2.4 without the stack
growing to contain five elements.
Exercise 2.10. The procedure given for traversing RTNs assumes that a subnet-
work path always stops when a final node is reached. Hence, it cannot follow all
possible paths for an RTN where there are edges out of a final node. Describe a
procedure that can follow all possible paths, even for RTNs that include edges
from final nodes.
nonterminal :: replacement
The left side of a rule is always a single symbol, known as a nonterminal since it
can never appear in the final generated string. The right side of a rule contains
one or more symbols. These symbols may include nonterminals, which will be
replaced using replacement rules before generating the final string. They may
also be terminals, which are output symbols that never appear as the left side
of a rule. When we describe grammars, we use italics to represent nonterminal
symbols, and bold to represent terminal symbols. The terminals are the
primitives in the language; the grammar rules are its means of combination.
"I flunked out every year. I never studied. I hated studying. I was just goofing
around. It had the delightful consequence that every year I went to summer school
in New Hampshire where I spent the summer sailing and having a nice time."
John Backus
We can generate a string in the language described by a replacement grammar
by starting from a designated start symbol (e.g., sentence), and at each step
selecting a nonterminal in the working string, and replacing it with the right side of
a replacement rule whose left side matches the nonterminal. Wherever we find
a nonterminal on the left side of a rule, we can replace it with what appears on
the right side of any rule where that nonterminal matches the left side. A string
is generated once there are no nonterminals remaining.
Here is an example BNF grammar (that describes the same language as the RTN
in Figure 2.1):
1. Sentence :: Noun Verb
2. Noun :: Alice
3. Noun :: Bob
4. Verb :: jumps
5. Verb :: runs
Starting from Sentence, the grammar can generate four sentences: Alice jumps,
Alice runs, Bob jumps, and Bob runs.
A derivation shows how a grammar generates a given string. Here is the
derivation of Alice runs:
Sentence :: Noun Verb     using Rule 1
         :: Alice Verb    replacing Noun using Rule 2
         :: Alice runs    replacing Verb using Rule 5
We can represent a grammar derivation as a tree, where the root of the tree is
the starting nonterminal (Sentence in this case), and the leaves of the tree are
the terminals that form the derived sentence. Such a tree is known as a parse
tree. Here is the parse tree for the derivation of Alice runs:
Sentence
Noun Verb
Alice runs
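One simple way to hold such a tree in a program is as nested tuples, with the nonterminal first and its children after it. The representation and the name leaves are choices made for this sketch, not notation from the text.

```python
# A parse tree as nested tuples: (nonterminal, child, child, ...); leaves are strings.
tree = ("Sentence", ("Noun", "Alice"), ("Verb", "runs"))

def leaves(node):
    """Collect the terminals at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(leaves(child))
    return result

print(" ".join(leaves(tree)))  # Alice runs
```

Reading the leaves from left to right recovers the derived sentence, while the inner nodes record which nonterminal produced each part.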
BNF grammars can be more compact than just listing strings in the language
since a grammar can have many replacements for each nonterminal. For exam-
ple, adding the rule, Noun :: Colleen, to the grammar adds two new strings
(Colleen runs and Colleen jumps) to the language.
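The effect of adding one rule can be checked by enumerating the language. The dictionary below encodes the Sentence, Noun, and Verb rules implied by the derivation and the four sentences listed above; the encoding and the function name language are assumptions of this sketch, which only works for grammars without recursive rules.

```python
from itertools import product

# Rules keyed by nonterminal; each replacement is a list of symbols.
grammar = {
    "Sentence": [["Noun", "Verb"]],
    "Noun": [["Alice"], ["Bob"]],
    "Verb": [["jumps"], ["runs"]],
}

def language(grammar, symbol):
    """All strings derivable from symbol (non-recursive grammars only)."""
    if symbol not in grammar:          # a terminal produces itself
        return [symbol]
    strings = []
    for replacement in grammar[symbol]:
        parts = [language(grammar, s) for s in replacement]
        strings.extend(" ".join(choice) for choice in product(*parts))
    return strings

print(language(grammar, "Sentence"))
# ['Alice jumps', 'Alice runs', 'Bob jumps', 'Bob runs']
grammar["Noun"].append(["Colleen"])
print(len(language(grammar, "Sentence")))  # 6
```

One extra rule adds two strings because the new noun combines with every verb; with more verbs the gain per rule would be larger.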
Recursive Grammars. The real power of BNF as a compact notation for describ-
ing languages, though, comes once we start adding recursive rules to our gram-
mar. A grammar is recursive if the grammar contains a nonterminal that can
produce a production that contains itself.
Suppose we add the rule,
Number
  Digit: 3
  MoreDigits
    Number
      Digit: 7
      MoreDigits: ε
Circular vs. Recursive Definitions. The second rule means we can replace
MoreDigits with nothing. This is sometimes written as ε to make it clear that the
replacement is empty: MoreDigits :: ε.
This is a very important rule in the grammar: without it no strings could be
generated; with it infinitely many strings can be generated. The key is that we can
only produce a string when all nonterminals in the string have been replaced
with terminals. Without the MoreDigits :: ε rule, the only rule we would have
with MoreDigits on the left side is the third rule: MoreDigits :: Number.
The only rule we have with Number on the left side is the first rule, which
replaces Number with Digit MoreDigits. Every time we follow these two rules in
turn, we replace MoreDigits with Digit MoreDigits. We can produce as many Digits
as we want, but without the MoreDigits :: ε rule we can never stop.
This is the difference between a circular definition and a recursive definition.
Without the stopping rule, MoreDigits would be defined in a circular way. There
is no way to start with MoreDigits and generate a production that does not
contain MoreDigits (or a nonterminal that eventually must produce MoreDigits).
With the MoreDigits :: ε rule, however, we have a way to produce something
terminal from MoreDigits. This is known as a base case: a rule that turns an
otherwise circular definition into a meaningful, recursive definition.
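The role of the base case shows up clearly if the Number and MoreDigits rules are turned into a pair of mutually recursive checks. This is a sketch of a recognizer for the grammar as described above; the function names are illustrative.

```python
DIGITS = "0123456789"

def is_number(s):
    """Number :: Digit MoreDigits"""
    return len(s) > 0 and s[0] in DIGITS and is_more_digits(s[1:])

def is_more_digits(s):
    """MoreDigits :: (empty) | Number"""
    if s == "":              # base case: the empty replacement stops the recursion
        return True
    return is_number(s)      # recursive case

print(is_number("37"))     # True
print(is_number("00005"))  # True
print(is_number(""))       # False
print(is_number("3x"))     # False
```

If the empty case were removed from is_more_digits, no string would ever be accepted, which mirrors the grammar being unable to generate any string without its stopping rule.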
Condensed Notation. It is common to have many grammar rules with the same
left side nonterminal. For example, the whole numbers grammar has ten rules
with Digit on the left side to produce the ten terminal digits. Each of these is an
alternative rule that can be used when the production string contains the non-
terminal Digit. A compact notation for these types of rules is to use the vertical
bar (|) to separate alternative replacements. For example, we could write the ten
Digit rules compactly as:
Digit :: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Exercise 2.11. Suppose we replaced the first rule (Number :: Digit MoreDigits)
in the whole numbers grammar with: Number :: MoreDigits Digit.
a. How does this change the parse tree for the derivation of 37? Draw the parse
tree that results from the new grammar.
b. Does this change the language? Either show some string that is in the lan-
guage defined by the modified grammar but not in the original language (or
vice versa), or argue that both grammars generate the same strings.
Exercise 2.12. The grammar for whole numbers we defined allows strings with
non-standard leading zeros such as 000 and 00005. Devise a grammar that
produces all whole numbers (including 0), but no strings with unnecessary
leading zeros.
Exercise 2.13. Define a BNF grammar that describes the language of decimal
numbers (the language should include 3.14159, 0.423, and 1120 but not 1.2.3).
Exercise 2.14. The BNF grammar below (extracted from Paul Mockapetris, Do-
main Names - Implementation and Specification, IETF RFC 1035) describes the
language of domain names on the Internet.
Domain :: SubDomainList
SubDomainList :: Label | SubDomainList . Label
Label :: Letter MoreLetters
MoreLetters :: LetterHyphens LetterDigit | ε
LetterHyphens :: LDHyphen | LDHyphen LetterHyphens | ε
LDHyphen :: LetterDigit | -
LetterDigit :: Letter | Digit
Letter :: A|B| ... |Z|a|b| ... |z
Digit :: 0|1|2|3|4|5|6|7|8|9
Section 2.4 claimed that recursive transition networks and BNF grammars are
equally powerful. What does it mean to say two systems are equally powerful?
A language description mechanism is used to define a set of strings comprising a
language. Hence, the power of a language description mechanism is determined
by the set of languages it can define.
One approach to measuring the power of a language description mechanism would
be to count the number of languages that it can define. Even the simplest
mechanisms can define infinitely many languages, however, so just counting the
number of languages does not distinguish well between the different language
description mechanisms. Both RTNs and BNFs can describe infinitely many
different languages. We can always add a new edge to an RTN to increase the number
of strings in the language, or add a new replacement rule to a BNF that replaces
a nonterminal with a new terminal symbol.
Instead, we need to consider the set of languages that each mechanism can
define. A system A is more powerful than another system B if we can use A to
define every language that can be defined by B, and there is some language L
that can be defined using A that cannot be defined using B. This matches our
intuitive interpretation of more powerful: A is more powerful than B if it can
do everything B can do and more.
The diagrams in Figure 2.6 show three possible scenarios. In the leftmost pic-
ture, the set of languages that can be defined by B is a proper subset of the set
of languages that can be defined by A. Hence, A is more powerful than B. In
the center picture, the sets are equal. This means every language that can be de-
fined by A can also be defined by B, and every language that can be defined by B
can also be defined by A, and the systems are equally powerful. In the rightmost
picture, there are some elements of A that are not elements of B, but there are
also some elements of B that are not elements of A. This means we cannot say
either one is more powerful; A can do some things B cannot do, and B can do
some things A cannot do.
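[Figure 2.6: three diagrams of the sets of languages definable by A and B. Left: B is a proper subset of A. Center: A and B are equal. Right: A and B overlap, with neither containing the other.]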
The grammar has three nonterminals: Number, Digit, and MoreDigits. For each
nonterminal, we construct a subnetwork by first creating two nodes correspond-
ing to the start and end of the subnetwork for the nonterminal. We make Start-
Number the start node for the RTN since Number is the starting nonterminal for
the grammar.
Next, we need to add edges to the RTN corresponding to the production rules
in the grammar. The first rule indicates that Number can be replaced by Digit
MoreDigits. To make the corresponding RTN, we need to introduce an interme-
diate node since each RTN edge can only contain one label. We need to traverse
two edges, with labels StartDigit and StartMoreDigits between the StartNumber
and EndNumber nodes. The resulting partial RTN is shown in Figure 2.7.
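[Figure 2.7: partial RTN for Number. An edge labeled StartDigit leads from StartNumber to the intermediate node X0, and an edge labeled StartMoreDigits leads from X0 to EndNumber.]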
For the MoreDigits nonterminal there are two productions. The first means
MoreDigits can be replaced with nothing. In an RTN, we cannot have edges
with unlabeled outputs. So, the equivalent of outputting nothing is to turn Start-
MoreDigits into a final node. The second production replaces MoreDigits with
Number. We do this in the RTN by adding an edge between StartMoreDigits and
EndMoreDigits labeled with Number, as shown in Figure 2.8.
[Figure 2.8: RTN for MoreDigits. An edge labeled Number leads from StartMoreDigits to EndMoreDigits.]
Finally, we convert the ten Digit productions. For each rule, we add an edge
between StartDigit and EndDigit labeled with the digit terminal, as shown in
Figure 2.9.
This example illustrates that it is possible to convert a particular grammar to an
RTN. For a general proof, we present a general algorithm that can be used to
do the same conversion for any BNF:
1. For each nonterminal X in the grammar, construct two nodes, StartX and
EndX.
[Figure 2.9: RTN for Digit, with ten edges from StartDigit to EndDigit labeled 0 through 9.]
2. For each rule in the grammar, add a corresponding path through the RTN.
All BNF rules have the form X :: replacement where X is a nonterminal
in the grammar and replacement is a sequence of zero or more terminals
and nonterminals: [ R0 , R1 , . . . , Rn ].
Following this procedure, we can convert any BNF grammar into an RTN that
defines the same language. Hence, we have proved that RTNs are at least as
powerful as BNF grammars.
To complete the proof that BNF grammars and RTNs are equally powerful ways
of defining languages, we also need to show that a BNF can define every lan-
guage that can be defined using an RTN. This part of the proof can be done
using a similar strategy in reverse: by showing a procedure that can be used to
construct a BNF equivalent to any input RTN. We leave the details as an exercise
for especially ambitious readers.
Exercise 2.15. Produce an RTN that defines the same language as the BNF
grammar from Exercise 2.14.
Exercise 2.16. [★] Prove that BNF grammars are as powerful as RTNs by devising
a procedure that can construct a BNF grammar that defines the same language
as any input RTN.
2.5 Summary
Languages define a set of surface forms and associated meanings. Since a useful
language must be able to express infinitely many things, we need tools for
defining infinite sets of surface forms using compact and precise notations. The
tool we will use for the remainder of this book is the BNF replacement
grammar, which precisely defines a language using replacement rules. This system
can describe infinite languages with small representations because of the power
of recursive rules. In the next chapter, we introduce the Scheme programming
language that we will use to describe procedures.
3 Programming
The Analytical Engine has no pretensions whatever to originate any thing. It can
do whatever we know how to order it to perform. It can follow analysis; but it
has no power of anticipating any analytical relations or truths. Its province is to
assist us in making available what we are already acquainted with.
Augusta Ada Countess of Lovelace, in Notes on the Analytical Engine, 1843
Complexity. Although English may seem simple to you now, it took many years
of intense effort (most of it subconscious) for you to learn it. Despite using it for
most of their waking hours for many years, native English speakers know a small
fraction of the entire language. The Oxford English Dictionary contains 615,000
words, of which a typical native English speaker knows about 40,000.
Ambiguity. Not only do natural languages have huge numbers of words, most
words have many different meanings. Understanding the intended meaning of
an utterance requires knowing the context, and sometimes pure guesswork.
For example, what does it mean to be paid biweekly? According to the American
Heritage Dictionary, biweekly has two definitions:
word means more than one of the original word's meaning. This rule works for
most words: word → words, language → languages, person → persons.
It does not work for all words, however. The plural of goose is geese (and gooses
is not an English word), the plural of deer is deer (and deers is not an English
word), and the plural of beer is controversial (and may depend on whether you
speak American English or Canadian English).
These irregularities can be charming for a natural language, but they are a con-
stant source of difficulty for non-native speakers attempting to learn a language.
There is no sure way to predict when the rule can be applied, and it is necessary
to memorize each of the irregular forms.
Uneconomic. It requires a lot of space to express a complex idea in a natural
language. Many superfluous words are needed for grammatical correctness, even
though they do not contribute to the desired meaning. Since natural languages
evolved for everyday communication, they are not well suited to describing the
precise steps and decisions needed in a computer program.
I have made this letter longer than usual, only because I have not had the time to make it shorter.
Blaise Pascal, 1657
As an example, consider a procedure for finding the maximum of two numbers.
In English, we could describe it like this:
To find the maximum of two numbers, compare them. If the first num-
ber is greater than the second number, the maximum is the first number.
Otherwise, the maximum is the second number.
Perhaps shorter descriptions are possible, but any much shorter description
probably assumes the reader already knows a lot. By contrast, we can express
the same steps in the Scheme programming language in a very concise way (don't
worry if this doesn't make sense yet; it should by the end of this chapter):
(define (bigger a b) (if (> a b) a b))
Another reason there are many different programming languages is that they
are at different levels of abstraction. Some languages provide programmers with
detailed control over machine resources, such as selecting a particular location
in memory where a value is stored. Other languages hide most of the details of
the machine operation from the programmer, allowing them to focus on higher-
level actions.
Ultimately, we want a program the computer can execute. This means at the
lowest level we need languages the computer can understand directly. At this
level, the program is just a sequence of bits encoding machine instructions.
Code at this level is not easy for humans to understand or write, but it is easy
for a processor to execute quickly. The machine code encodes instructions that
direct the processor to take simple actions like moving data from one place to
another, performing simple arithmetic, and jumping around to find the next in-
struction to execute.
For example, the bit sequence 1110101111111110 encodes an instruction in the
Intel x86 instruction set (used on most PCs) that instructs the processor to jump
backwards two locations. Since the instruction itself requires two locations of
space, jumping back two locations actually jumps back to the beginning of this
instruction. Hence, the processor gets stuck running forever without making
any progress.
[Photo: Grace Hopper. Image courtesy Computer History Museum (1952).]
The computer's processor is designed to execute very simple instructions like
jumping, adding two small numbers, or comparing two values. This means each
instruction can be executed very quickly. A typical modern processor can exe-
cute billions of instructions in a second.5
Until the early 1950s, all programming was done at the level of simple instruc-
tions. The problem with instructions at this level is that they are not easy for
humans to write and understand, and you need many simple instructions be-
fore you have a useful program.
A compiler is a computer program that generates other programs. It translates
an input program written in a high-level language that is easier for humans to
create into a program in a machine-level language that can be executed by the
computer. Admiral Grace Hopper developed the first compilers in the 1950s.
An alternative to a compiler is an interpreter. An interpreter is a tool that
translates between a higher-level language and a lower-level language, but where a
compiler translates an entire program at once and produces a machine language
program that can be executed directly, an interpreter interprets the program a
small piece at a time while it is running. This has the advantage that we do not
have to run a separate tool to compile a program before running it; we can
simply enter our program into the interpreter and run it right away. This makes it
easy to make small changes to a program and try it again, and to observe the
state of our program as it is running.
Nobody believed that I had a running compiler and nobody would touch it. They told
me computers could only do arithmetic.
Grace Hopper
One disadvantage of using an interpreter instead of a compiler is that because
the translation is happening while the program is running, the program
executes more slowly than a compiled program. Another advantage of compilers over
interpreters is that since the compiler translates the entire program it can also
analyze the program for consistency and detect certain types of programming
mistakes automatically instead of encountering them when the program is
running (or worse, not detecting them at all and producing unintended results).
This is especially important when writing critical programs such as flight
control software: we want to detect as many problems as possible in the flight
control software before the plane is flying!
5 A 2GHz processor executes 2 billion cycles per second. This does not map directly to the
number of instructions it can execute in a second, though, since some instructions take
several cycles to execute.
Since we are more concerned with interactive exploration than with performance
and detecting errors early, we use an interpreter instead of a compiler.
3.3 Scheme
The programming system we use for the first part of this book is depicted in
Figure 3.1. The input to our programming system is a program written in a pro-
gramming language named Scheme. A Scheme interpreter interprets a Scheme
program and executes it on the machine processor.
Scheme was developed at MIT in the 1970s by Guy Steele and Gerald Sussman,
based on the LISP programming language that was developed by John McCarthy
in the 1950s. Although many large systems have been built using Scheme, it is
not widely used in industry. It is, however, a great language for learning about
computing and programming. The primary advantage of using Scheme to learn
about computing is its simplicity and elegance. The language is simple enough
that this chapter covers nearly the entire language (we defer describing a few
aspects until Chapter 9), and by the end of this book you will know enough to
implement your own Scheme interpreter. By contrast, some programming lan-
guages that are widely used in industrial programming such as C++ and Java
require thousands of pages to describe, and even the world's experts in those
languages do not agree on exactly what all programs mean.
[Figure 3.1: a Scheme program, such as (define (bigger a b) (if (> a b) a b)) followed by (bigger 3 4), is the input to the interpreter (DrRacket), which executes it on the processor.]
3.4 Expressions
A Scheme program is composed of expressions and definitions (we cover
definitions in Section 3.5). An expression is a syntactic element that has a value.
The act of determining the value associated with an expression is called
evaluation. A Scheme interpreter, such as the one provided in DrRacket, is a machine
for evaluating Scheme expressions. If you enter an expression into a Scheme
interpreter, the interpreter evaluates the expression and displays its value.
Expressions may be primitives. Scheme also provides means of combination
for producing complex expressions from simple expressions. The next subsec-
tions describe primitive expressions and application expressions. Section 3.6
describes expressions for making procedures and Section 3.7 describes expres-
sions that can be used to make decisions.
3.4.1 Primitives
An expression can be replaced with a primitive:
Expression :: PrimitiveExpression
As with natural languages, primitives are the smallest units of meaning. Hence,
the value of a primitive is its pre-defined meaning.
Scheme provides many different primitives. Three useful types of primitives are
described next: numbers, Booleans, and primitive procedures.
Numbers. Numbers represent numerical values. Scheme provides all the kinds
of numbers you are familiar with including whole numbers, negative numbers,
decimals, and rational numbers.
Example numbers include:
150 0 12
3.14159 3/4 999999999999999999999
Numbers evaluate to their value. For example, the value of the primitive expres-
sion 1120 is 1120.
Booleans. Booleans represent truth values. There are two primitives for repre-
senting true and false:
PrimitiveExpression :: true | false
The meaning of true is true, and the meaning of false is false. In the DrRacket
interpreter, #t and #f are used to represent the primitive truth values. So, the
value true appears as #t in the interactions window.
Expression :: ApplicationExpression
ApplicationExpression :: (Expression MoreExpressions)
MoreExpressions :: ε | Expression MoreExpressions
[Figures: parse trees for application expressions. In the last tree, for (+ (* 10 10) (+ 25 25)), Expression is replaced by ApplicationExpression, which is replaced by ( Expression MoreExpressions ); the nested application expressions (* 10 10) and (+ 25 25) are drawn as triangles.]
This tree is similar to the previous tree, except instead of the subexpressions
of the first application expression being simple primitive expressions, they are
now application expressions. (Instead of showing the complete parse tree for
the nested application expressions, we use triangles.)
To evaluate the outer application, we need to evaluate all the subexpressions.
The first subexpression, +, evaluates to the primitive addition procedure. The second
subexpression, (* 10 10), evaluates to 100, and the third subexpression, (+ 25 25),
evaluates to 50. Now, we can evaluate the original expression using the values
for its three component subexpressions: (+ 100 50) evaluates to 150.
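The same steps can be tried directly in the interpreter. The sketch below enters each operand subexpression on its own and then the full nested application; the comments give the values described in the text.

```scheme
;; The operand subexpressions, evaluated on their own:
(* 10 10)                  ; evaluates to 100
(+ 25 25)                  ; evaluates to 50

;; The application with the operand values already substituted:
(+ 100 50)                 ; evaluates to 150

;; The original nested application expression:
(+ (* 10 10) (+ 25 25))    ; evaluates to 150
```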
Exercise 3.1. Draw a parse tree for the Scheme expression (+ 100 (* 5 (+ 5 5)))
and show how it is evaluated.
Exercise 3.2. Predict how each of the following Scheme expressions is evalu-
ated. After making your prediction, try evaluating the expression in DrRacket. If
the result is different from your prediction, explain why the Scheme interpreter
evaluates the expression as it does.
a. 1120
b. (+ 1120)
c. (+ (+ 10 20) (* 2 0))
d. (= (+ 10 20) (* 15 (+ 5 5)))
e. +
f. (+ + <)
Exercise 3.3. For each question, construct a Scheme expression and evaluate it
in DrRacket.
a. How many seconds are there in a year?
b. For how many seconds have you been alive?
c. For what fraction of your life have you been in school?
3.5 Definitions
Scheme provides a simple, yet powerful, mechanism for abstraction. A defini-
tion introduces a new name and gives it a value:
Definition :: (define Name Expression)
After a definition, the Name in the definition is now associated with the value of
the expression in the definition. A definition is not an expression since it does
not evaluate to a value.
A name can be any sequence of letters, digits, and special characters (such as
-, >, ?, and !) that starts with a letter or special character. Examples of valid
names include a, Ada, Augusta-Ada, gold49, !yuck, and yikes!%@#. We don't
recommend using some of these names in your programs, however! A good
programmer will pick names that are easy to read, pronounce, and remember, and
that are not easily confused with other names.
After a name has been bound to a value by a definition, that name may be used
in an expression:
Expression :: NameExpression
NameExpression :: Name
The value of a NameExpression is the value associated with the Name. (Alert
readers should be worried that we need a more precise definition of the meaning
of definitions to know what it means for a value to be associated with a name.
This informal notion will serve us well for now, but we will need a more precise
explanation of the meaning of a definition in Chapter 9.)
Below we define speed-of-light to be the speed of light in meters per second,
define seconds-per-hour to be the number of seconds in an hour, and use them
to calculate the speed of light in kilometers per hour:
> (define speed-of-light 299792458)
> speed-of-light
299792458
> (define seconds-per-hour (* 60 60))
> (/ (* speed-of-light seconds-per-hour) 1000)
1079252848 4/5
3.6 Procedures
In Chapter 1 we defined a procedure as a description of a process. Scheme
provides a way to define procedures that take inputs, carry out a sequence of
actions, and produce an output. Section 3.4.1 introduced some of Scheme's
primitive procedures. To construct complex programs, however, we need to be able
to create our own procedures.
Procedures are similar to mathematical functions in that they provide a map-
ping between inputs and outputs, but they differ from mathematical functions
in two important ways:
State. In addition to producing an output, a procedure may access and mod-
ify state. This means that even when the same procedure is applied to the
same inputs, the output produced may vary. Because mathematical func-
tions do not have external state, when the same function is applied to the
same inputs it always produces the same result. State makes procedures
much harder to reason about. We will ignore this issue until Chapter 9, and
focus until then only on procedures that do not involve any state.
Resources. Unlike an ideal mathematical function, which provides an instan-
taneous and free mapping between inputs and outputs, a procedure re-
quires resources to execute before the output is produced. The most impor-
tant resources are space (memory) and time. A procedure may need space
to keep track of intermediate results while it is executing. Each step of a
procedure requires some time to execute. Predicting how long a procedure
will take to execute and finding the fastest procedure possible for solving
some problem are core problems in computer science. We consider this
throughout this book, and in particular in Chapter 7.
For the rest of this chapter, we view procedures as idealized mathematical func-
tions: we consider only procedures that involve no state and do not worry about
the resources required to execute our procedures.
as its output.
(lambda (a b) (+ a b))
Procedure that takes two inputs, and produces the sum of the input values
as its output.
(lambda () 0)
Procedure that takes no inputs, and produces 0 as its output. The result of
applying this procedure is always 0.
(lambda (a) (lambda (b) (+ a b)))
Procedure that takes one input (a), and produces as its output a procedure
that takes one input and produces the sum of a and that input as its output.
This is an example of a higher-order procedure. Higher-order procedures
produce procedures as their output or take procedures as their arguments.
This can be confusing, but is also very powerful.
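To see what these procedure expressions do, it can help to apply them. The following is a small sketch; make-adder and add-three are hypothetical names introduced only for this illustration.

```scheme
;; Applying a procedure that takes two inputs:
((lambda (a b) (+ a b)) 3 4)     ; evaluates to 7

;; Applying a procedure that takes no inputs:
((lambda () 0))                  ; evaluates to 0

;; make-adder and add-three are hypothetical names for this example.
;; make-adder is a higher-order procedure: its output is a procedure.
(define make-adder (lambda (a) (lambda (b) (+ a b))))
(define add-three (make-adder 3))

(add-three 4)                    ; evaluates to 7
((make-adder 10) 5)              ; evaluates to 15
```

Each application of make-adder produces a new procedure that remembers the value bound to a.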
> (square 2)
4
> (square 1/4)
1/16
> (square (square 2))
16
need the begin expression until we start dealing with procedures that have side effects. We describe
the begin special form in Chapter 9.
This incorporates the lambda invisibly into the definition, but means exactly
the same thing. For example,
(define square (lambda (x) (* x x)))
can be written equivalently as:
(define (square x) (* x x))
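As a quick check that the two forms mean the same thing, the sketch below defines the procedure both ways under two different names (square-lambda and square-short are hypothetical names, used so both definitions can coexist) and applies each to the same inputs.

```scheme
;; Two equivalent ways of defining the same procedure.
;; square-lambda and square-short are hypothetical names for this sketch.
(define square-lambda (lambda (x) (* x x)))   ; explicit lambda
(define (square-short x) (* x x))             ; condensed notation

(square-lambda 5)      ; evaluates to 25
(square-short 5)       ; evaluates to 25
(square-short 1/4)     ; evaluates to 1/16
```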
Exercise 3.5. Define a procedure, cube, that takes one number as input and
produces as output the cube of that number.
Exercise 3.6. Define a procedure, compute-cost, that takes as input two
numbers: the first represents the price of an item, and the second represents the
sales tax rate. The output should be the total cost, which is computed as the
price of the item plus the sales tax on the item, which is its price times the sales
tax rate. For example, (compute-cost 13 0.05) should evaluate to 13.65.
3.7 Decisions
To make more useful procedures, we need the actions taken to depend on the
input values. For example, we may want a procedure that takes two numbers as
inputs and evaluates to the greater of the two inputs. To define such a procedure
we need a way of making a decision. The IfExpression expression provides a way
of using the result of one expression to select which of two possible expressions
to evaluate:
Expression :: IfExpression
IfExpression :: (if ExpressionPredicate
ExpressionConsequent
ExpressionAlternate )
The IfExpression replacement has three Expression terms. For clarity, we give
each of them names as denoted by the Predicate, Consequent, and Alternate
subscripts. To evaluate an IfExpression, first evaluate the predicate expression,
ExpressionPredicate . If it evaluates to any non-false value, the value of the IfEx-
pression is the value of ExpressionConsequent , the consequent expression, and the
alternate expression is not evaluated at all. If the predicate expression evaluates
to false, the value of the IfExpression is the value of ExpressionAlternate , the alter-
nate expression, and the consequent expression is not evaluated at all.
The predicate expression determines which of the two following expressions is
evaluated to produce the value of the IfExpression. If the value of the predicate
is anything other than false, the consequent expression is used. For example, if
the predicate evaluates to true, to a number, or to a procedure the consequent
expression is evaluated.
The if expression is a special form. This means that although it looks
syntactically identical to an application (that is, it could be an application of a procedure
named if), it is not evaluated as a normal application would be. Instead, we have
a special evaluation rule for if expressions. A special evaluation rule
is needed because we do not want all the subexpressions to be evaluated. With
the normal application rule, all the subexpressions are evaluated first, and then
the procedure resulting from the first subexpression is applied to the values re-
sulting from the others. With the if special form evaluation rule, the predicate
expression is always evaluated first and only one of the following subexpressions
is evaluated depending on the result of evaluating the predicate expression.
This means an if expression can evaluate to a value even if evaluating one of its
subexpressions would produce an error. For example,
(if (> 3 4) (* + +) 7)
evaluates to 7 even though evaluating the subexpression (* + +) would produce
an error. Because of the special evaluation rule for if expressions, the
consequent expression is never evaluated.
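One way to see why if needs its own evaluation rule is to compare it with an ordinary procedure that tries to do the same job. In the sketch below, my-if is a hypothetical procedure defined only for this comparison; it is not part of Scheme.

```scheme
;; With the special form, only the selected subexpression is evaluated.
(if (> 3 4) (* + +) 7)     ; evaluates to 7; (* + +) is never evaluated

;; my-if is a hypothetical ordinary procedure that wraps if.
(define (my-if predicate consequent alternate)
  (if predicate consequent alternate))

;; When all the inputs are harmless, my-if behaves like if:
(my-if (> 3 4) 1 7)        ; evaluates to 7

;; But my-if is applied using the normal application rule, so all of its
;; operand subexpressions are evaluated before the procedure is applied.
;; Evaluating the following line would therefore produce an error:
;; (my-if (> 3 4) (* + +) 7)
```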
Now that we have procedures, decisions, and definitions, we can understand the
bigger procedure from the beginning of the chapter. The definition,
(define (bigger a b) (if (> a b) a b))
is a condensed procedure definition. It is equivalent to:
(define bigger (lambda (a b) (if (> a b) a b)))
This defines the name bigger as the value of evaluating the procedure expression
(lambda (a b) (if (> a b) a b)). This is a procedure that takes two inputs, named
a and b. Its body is an if expression with predicate expression (> a b). The
predicate expression compares the value that is bound to the first parameter, a,
with the value that is bound to the second parameter, b, and evaluates to true if
the value of the first parameter is greater, and false otherwise. According to the
evaluation rule for an if expression, when the predicate evaluates to any non-
false value (in this case, true), the value of the if expression is the value of the
consequent expression, a. When the predicate evaluates to false, the value of
the if expression is the value of the alternate expression, b. Hence, our bigger
procedure takes two numbers as inputs and produces as output the greater of
the two inputs.
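A few applications of bigger illustrate both outcomes of the predicate. This is a short sketch; the inputs are arbitrary values chosen for illustration.

```scheme
(define (bigger a b) (if (> a b) a b))

(bigger 7 2)    ; (> 7 2) is true, so the value is the consequent, a: 7
(bigger 2 7)    ; (> 2 7) is false, so the value is the alternate, b: 7
(bigger 5 5)    ; (> 5 5) is false, so the value is the alternate, b: 5
```

When the inputs are equal the predicate is false, so the alternate is selected; since both inputs have the same value, the output is still correct.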
Exercise 3.7. Follow the evaluation rules to evaluate the Scheme expression:
(bigger 3 4)
where bigger is the procedure defined above. (It is very tedious to follow all of
the steps (that's why we normally rely on computers to do it!), but worth doing
once to make sure you understand the evaluation rules.)
Exercise 3.8. Define a procedure, xor, that implements the logical exclusive-or
operation. The xor function takes two inputs, and outputs true if exactly one of
those inputs has a true value. Otherwise, it outputs false. For example, (xor true
true) should evaluate to false and (xor (< 3 5) (= 8 9)) should evaluate to true.
Exercise 3.9. Define a procedure, absvalue, that takes a number as input and
produces the absolute value of that number as its output. For example,
(absvalue 3) should evaluate to 3 and (absvalue 150) should evaluate to 150.
Exercise 3.10. Define a procedure, bigger-magnitude, that takes two inputs, and
outputs the value of the input with the greater magnitude (that is, absolute dis-
tance from zero). For example, (bigger-magnitude 5 7) should evaluate to 7,
and (bigger-magnitude 9 3) should evaluate to 9.
Exercise 3.11. Define a procedure, biggest, that takes three inputs, and produces
as output the maximum value of the three inputs. For example, (biggest 5 7 3)
should evaluate to 7. Find at least two different ways to define biggest, one using
bigger, and one without using it.
Abbreviation for
(define Name (lambda (Parameters) Expression))
Expression :: PrimitiveExpression | NameExpression
| ApplicationExpression
| ProcedureExpression | IfExpression
The value of the expression is the value of the replacement
expression.
PrimitiveExpression :: Number | true | false | primitive procedure
NameExpression :: Name
The evaluation rule for an application (Rule 3b) uses apply to perform the ap-
plication. Apply is defined by the two application rules:
Application Rule 1: Primitives.
To apply a primitive procedure, just do it.
Application Rule 2: Constructed Procedures.
To apply a constructed procedure, evaluate the body of the procedure with
each parameter name bound to the corresponding input expression value.
Application Rule 2 uses the evaluation rules to evaluate the expression. Thus,
the evaluation rules are defined using the application rules, which are defined
using the evaluation rules! This appears to be a circular definition, but as with
the grammar examples, it has a base case. Some expressions evaluate without
using the application rules (e.g., primitive expressions, name expressions), and
some applications can be performed without using the evaluation rules (when
the procedure to apply is a primitive). Hence, the process of evaluating an ex-
pression will sometimes finish and when it does we end with the value of the
expression.7
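The mutual dependence between the evaluation rules and the application rules can be sketched as two procedures that call each other. The code below is a minimal illustration, not the way DrRacket is implemented: it assumes expressions are given as quoted lists, represents the name-to-value associations as a list of pairs, and handles only numbers, Booleans, names, if, lambda, and applications. The names mini-eval, mini-apply, and global-env, and the 'constructed tag, are hypothetical; the sketch also uses list procedures not yet introduced at this point in the book.

```scheme
;; A minimal sketch of the evaluation and application rules.
;; mini-eval, mini-apply, global-env, and the 'constructed tag are
;; hypothetical names. An environment is a list of (name . value) pairs.

(define (mini-eval expr env)
  (cond
    ((number? expr) expr)                      ; primitive: number
    ((boolean? expr) expr)                     ; primitive: #t or #f
    ((symbol? expr) (cdr (assq expr env)))     ; name: look up its value
    ((eq? (car expr) 'if)                      ; special form: if
     (if (mini-eval (cadr expr) env)
         (mini-eval (caddr expr) env)          ; only the consequent
         (mini-eval (cadddr expr) env)))       ; only the alternate
    ((eq? (car expr) 'lambda)                  ; procedure expression
     (list 'constructed (cadr expr) (caddr expr) env))
    (else                                      ; application
     (mini-apply (mini-eval (car expr) env)
                 (map (lambda (e) (mini-eval e env)) (cdr expr))))))

(define (mini-apply proc inputs)
  (if (procedure? proc)
      ;; Application Rule 1: a primitive procedure, just do it.
      (apply proc inputs)
      ;; Application Rule 2: evaluate the body with each parameter
      ;; name bound to the corresponding input value.
      (mini-eval (caddr proc)
                 (append (map cons (cadr proc) inputs)
                         (cadddr proc)))))

;; Names for a few primitive procedures.
(define global-env
  (list (cons '+ +) (cons '* *) (cons '> >)))
```

For example, (mini-eval '(+ (* 10 10) (+ 25 25)) global-env) evaluates to 150, and (mini-eval '((lambda (a b) (if (> a b) a b)) 3 4) global-env) evaluates to 4. The base cases are visible in the code: numbers, Booleans, and names are evaluated without calling mini-apply, and primitive procedures are applied without calling mini-eval.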
7 This does not guarantee that evaluation always finishes, however! The next chapter includes
3.9 Summary
At this point, we have covered enough of Scheme to write useful programs (even
if the programs we have seen so far seem rather dull). In fact (as we show in
Chapter 12), we have covered enough to express every possible computation!
We just need to combine these constructs in more complex ways to perform
more interesting computations. The next chapter (and much of the rest of this
book) focuses on ways to combine the constructs for making procedures, making
decisions, and applying procedures in more powerful ways.
4 Problems and Procedures
A great discovery solves a great problem, but there is a grain of discovery in the solution of
any problem. Your problem may be modest, but if it challenges your curiosity and brings into
play your inventive faculties, and if you solve it by your own means, you may experience the
tension and enjoy the triumph of discovery.
George Polya, How to Solve It
takes zero or more inputs, and produces one output or no outputs2 , as shown in
Figure 4.1.
[Figure 4.1: Inputs → Procedure → Output]
Our goal in solving a problem is to devise a procedure that takes inputs that
define a problem instance, and produces as output the solution to that problem
instance. The procedure should be an algorithm: every application
of the procedure must eventually finish evaluating and produce an output value.
There is no magic wand for solving problems. But, most problem solving in-
volves breaking problems you do not yet know how to solve into simpler and
simpler problems until you find problems simple enough that you already know
how to solve them. The creative challenge is to find the simpler subproblems
that can be combined to solve the original problem. This approach of solving
problems by breaking them into simpler parts is known as divide-and-conquer.
The following sections describe two key forms of divide-and-conquer problem
solving: composition and recursive problem solving. We will use these same
problem-solving techniques in different forms throughout this book.
[Figure 4.2: Input → f → g → Output]
We can express this composition with the Scheme expression (g (f x)) where x
is the input. The written order appears to be reversed from the picture in Fig-
ure 4.2. This is because we apply a procedure to the values of its subexpressions:
2 Although procedures can produce more than one output, we limit our discussion here to proce-
dures that produce no more than one output. In the next chapter, we introduce ways to construct
complex data, so any number of output values can be packaged into a single output.
Chapter 4. Problems and Procedures 55
the values of the inner subexpressions must be computed first, and then used as
the inputs to the outer applications. So, the inner subexpression (f x) is evalu-
ated first since the evaluation rule for the outer application expression is to first
evaluate all the subexpressions.
To define a procedure that implements the composed procedure we make x a
parameter:
(define fog (lambda (x) (g (f x))))
This defines fog as a procedure that takes one input and produces as output the
composition of f and g applied to the input parameter. This works for any two
procedures that both take a single input parameter.
We can compose the square and cube procedures from Chapter 3:
(define sixth-power (lambda (x) (cube (square x))))
Then, (sixth-power 2) evaluates to 64.
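The composition above can be tried directly. The sketch below restates square and cube so that it runs on its own (their definitions are assumed to match Chapter 3), and adds one plausible definition of a general composition procedure named fcompose. That definition is an assumption for illustration: it applies its first input procedure first, which is the order the exercises in this section rely on.

```racket
#lang racket
;; Assumed to match the Chapter 3 definitions.
(define (square x) (* x x))
(define (cube x) (* x x x))

;; Composition written out by hand: square first, then cube.
(define sixth-power (lambda (x) (cube (square x))))
(sixth-power 2)                  ; evaluates to 64

;; A sketch of a general composer (assumed definition):
;; takes two one-input procedures and produces their composition.
(define (fcompose f g)
  (lambda (x) (g (f x))))

((fcompose square cube) 2)       ; evaluates to 64, same as (sixth-power 2)
```

Because fcompose produces a procedure as its output, its result can be applied immediately, as in the last line, or given a name with define.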
3 We name our composition procedure fcompose to avoid collision with the built-in compose pro-
Exercise 4.1. For each expression, give the value to which the expression evalu-
ates. Assume fcompose and inc are defined as above.
a. ((fcompose square square) 3)
b. (fcompose (lambda (x) (* x 2)) (lambda (x) (/ x 2)))
c. ((fcompose (lambda (x) (* x 2)) (lambda (x) (/ x 2))) 1120)
d. ((fcompose (fcompose inc inc) inc) 2)
Exercise 4.3. Define a procedure fcompose3 that takes three procedures as in-
put, and produces as output a procedure that is the composition of the three
input procedures. For example, ((fcompose3 abs inc square) 5) should evalu-
ate to 36. Define fcompose3 two different ways: once without using fcompose,
and once using fcompose.
Exercise 4.4. The fcompose procedure only works when both input procedures
take one input. Define a f2compose procedure that composes two procedures
where the first procedure takes two inputs, and the second procedure takes one
input. For example, ((f2compose + abs) 3 -5) should evaluate to 2.
[Figure: the circular procedure f and its Input]
Of course, this doesn't work very well!4 Every application of f results in another
application of f to evaluate. This never stops: no output is ever produced, and
the interpreter will keep evaluating applications of f until it is stopped or runs
out of memory.
We need a way to make progress and eventually stop, instead of going around
in circles. To make progress, each subsequent application should have a smaller
input. Then, the applications stop when the input to the procedure is simple
enough that the output is already known. The stopping condition is called the
base case, similarly to the grammar rules in Section 2.4. In our grammar
examples, the base case involved replacing the nonterminal with nothing (e.g.,
MoreDigits ::⇒ ε) or with a terminal (e.g., Noun ::⇒ Alice). In recursive
procedures, the base case will provide a solution for some input for which the
problem is so simple we already know the answer. When the input is a number, this
is often (but not necessarily) when the input is 0 or 1.
To define a recursive procedure, we use an if expression to test if the input matches
the base case input. If it does, the consequent expression is the known answer
for the base case. Otherwise, the recursive case applies the procedure again but
with a smaller input. That application needs to make progress towards reaching
the base case. This means the input has to change in a way that gets closer to
the base case input. If the base case is for 0, and the original input is a positive
number, one way to get closer to the base case input is to subtract 1 from the
input value with each recursive application.
This evaluation spiral is depicted in Figure 4.4. With each subsequent recursive
call, the input gets smaller, eventually reaching the base case. For the base case
application, a result is returned to the previous application. This is passed back
up the spiral to produce the final output. Keeping track of where we are in a
recursive evaluation is similar to keeping track of the subnetworks in an RTN
traversal. The evaluator needs to keep track of where to return after each recur-
sive evaluation completes, similarly to how we needed to keep track of the stack
of subnetworks to know how to proceed in an RTN traversal.
Here is the corresponding procedure:
(define g
  (lambda (n)
    (if (= n 0) 1 (g (- n 1)))))
Unlike the earlier circular f procedure, if we apply g to any non-negative integer
it will eventually produce an output. For example, consider evaluating (g 2).
4 Curious readers should try entering this definition into a Scheme interpreter and evaluating (f
0). If you get tired of waiting for an output, in DrRacket you can click the Stop button in the upper
right corner to interrupt the evaluation.
58 4.3. Recursive Problem Solving
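[Figure 4.4: Evaluation spiral for (g 2), with inputs 2, 1, 0 and the output 1 passed back from each application]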
When we evaluate the first application, the value of the parameter n is 2, so the
predicate expression (= n 0) evaluates to false and the value of the procedure
body is the value of the alternate expression, (g (- n 1)). The subexpression (- n
1) evaluates to 1, so the result is the result of applying g to 1. As with the previous
application, this leads to the application, (g (- n 1)), but this time the value of
n is 1, so (- n 1) evaluates to 0. The next application leads to the application, (g
0). This time, the predicate expression evaluates to true and we have reached the
base case. The consequent expression is just 1, so no further applications of g
are performed and this is the result of the application (g 0). This is returned as
the result of the (g 1) application in the previous recursive call, and then as the
output of the original (g 2) application.
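The walk-through above can be reproduced in an interpreter. This is a minimal sketch of the same procedure g, with comments marking the winding and unwinding described in the text.

```racket
#lang racket
(define g
  (lambda (n)
    (if (= n 0)
        1                  ; base case: the answer is already known
        (g (- n 1)))))     ; recursive case: same problem, smaller input

;; Winding:   (g 2) leads to (g 1), which leads to (g 0)
;; Unwinding: (g 0) is 1, so (g 1) is 1, so (g 2) is 1
(g 2)                      ; evaluates to 1
(g 0)                      ; evaluates to 1 (the base case itself)
```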
We can think of the recursive evaluation as winding until the base case is reached,
and then unwinding the outputs back to the original application. For this pro-
cedure, the output is not very interesting: no matter what positive number we
apply g to, the eventual result is 1. To solve interesting problems with recursive
procedures, we need to accumulate results as the recursive applications wind
or unwind. Examples 4.1 and 4.2 illustrate recursive procedures that accumu-
late the result during the unwinding process. Example 4.3 illustrates a recursive
procedure that accumulates the result during the winding process.
$0! = 1$
$n! = n \times (n - 1)!$ for all $n > 0$
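The two lines of this mathematical definition map directly onto the consequent and alternate expressions of an if expression. A sketch, using the name factorial that the rest of this chapter uses:

```racket
#lang racket
(define (factorial n)
  (if (= n 0)
      1                              ; 0! = 1
      (* n (factorial (- n 1)))))    ; n! = n * (n - 1)! for n > 0

(factorial 0)                        ; evaluates to 1
(factorial 4)                        ; evaluates to 24
```

Unlike g, this procedure accumulates its result during unwinding: each application multiplies its own n by the value returned from the smaller application.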
Exercise 4.5. How many different ways are there of choosing an unordered 5-
card hand from a 52-card deck?
This is an instance of the n choose k problem (also known as the binomial
coefficient): how many different ways are there to choose a set of k items from
n items. There are n ways to choose the first item, n - 1 ways to choose the
second, . . ., and n - k + 1 ways to choose the kth item. But, since the order does
not matter, some of these ways are equivalent. The number of possible ways to
order the k items is k!, so we can compute the number of ways to choose k items
from a set of n items as:
$$\frac{n (n-1) \cdots (n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}$$
a. Define a procedure choose that takes two inputs, n (the size of the item set)
and k (the number of items to choose), and outputs the number of possible
ways to choose k items from n.
b. Compute the number of possible 5-card hands that can be dealt from a 52-
card deck.
c. [★] Compute the likelihood of being dealt a flush (5 cards all of the same suit).
In a standard 52-card deck, there are 13 cards of each of the four suits. Hint:
divide the number of possible flush hands by the number of possible hands.
Exercise 4.6. Reputedly, when Karl Gauss was in elementary school his teacher
assigned the class the task of summing the integers from 1 to 100 (i.e., 1 + 2 +
3 + ... + 100) to keep them busy. Being the (future) Prince of Mathematics,
Gauss developed the formula for calculating this sum, that is now known as the
Gauss sum. Had he been a computer scientist, however, and had access to a
Scheme interpreter in the late 1700s, he might have instead defined a recursive
procedure to solve the problem. Define a recursive procedure, gauss-sum, that
takes a number n as its input parameter, and evaluates to the sum of the integers
from 1 to n as its output. For example, (gauss-sum 100) should evaluate to 5050.
Exercise 4.7. [★] Define a higher-order procedure, accumulate, that can be used
to make both gauss-sum (from Exercise 4.6) and factorial. The accumulate
procedure should take as its input the function used for accumulation (e.g., * for
factorial, + for gauss-sum). With your accumulate procedure, ((accumulate +)
100) should evaluate to 5050 and ((accumulate *) 3) should evaluate to 6. We
assume the result of the base case is 1 (although a more general procedure could
take that as a parameter).
Hint: since your procedure should produce a procedure as its output, it could
start like this:
(define (accumulate f )
(lambda (n)
(if (= n 1) 1
...
To define the procedure, think about how to combine results from simpler prob-
lems to find the result. For the base case, we need a case so simple we already
know the answer. Consider the case when low and high are equal. Then, there
is only one value to use, and we know the value of the maximum is (f low). So,
the base case is (if (= low high) (f low) . . . ).
How do we make progress towards the base case? Suppose the value of high is
equal to the value of low plus 1. Then, the maximum value is either the value of
(f low) or the value of (f (+ low 1)). We could select it using the bigger procedure
(from Example 3.3): (bigger (f low) (f (+ low 1))). We can extend this to the case
where high is equal to low plus 2:
(bigger (f low) (bigger (f (+ low 1)) (f (+ low 2))))
The second operand for the outer bigger evaluation is the maximum value of the
input procedure between the low value plus one and the high value input. If we
name the procedure we are defining find-maximum, then this second operand
is the result of (find-maximum f (+ low 1) high). This works whether high is
equal to (+ low 1), or (+ low 2), or any other value greater than low.
Putting things together, we have our recursive definition of find-maximum:
(define (find-maximum f low high)
(if (= low high)
(f low)
(bigger (f low) (find-maximum f (+ low 1) high))))
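To try find-maximum, the sketch below repeats it together with bigger (assumed to match the Chapter 3 definition) and applies it to two illustrative input functions. The particular functions and ranges are examples chosen here, not part of the original text.

```racket
#lang racket
;; Assumed to match the definition from Chapter 3.
(define (bigger a b) (if (> a b) a b))

(define (find-maximum f low high)
  (if (= low high)
      (f low)                                            ; base case: one value
      (bigger (f low)                                    ; compare first value with
              (find-maximum f (+ low 1) high))))         ; the maximum of the rest

;; f(x) = x * (5.5 - x) over the integers 1..10: the values are
;; 4.5, 7.0, 7.5, 6.0, 2.5, ... so the maximum is at x = 3.
(find-maximum (lambda (x) (* x (- 5.5 x))) 1 10)         ; evaluates to 7.5

;; f(x) = x * x over the integers -3..2: the maximum is at x = -3.
(find-maximum (lambda (x) (* x x)) -3 2)                 ; evaluates to 9
```

Note that this only examines integer inputs between low and high, and that it never reaches the base case if low is greater than high.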
Exercise 4.8. To find the maximum of a function that takes a real number as
its input, we need to evaluate at all numbers in the range, not just the integers.
There are infinitely many numbers between any two numbers, however, so this
is impossible. We can approximate this, however, by evaluating the function at
many numbers in the range.
Define a procedure find-maximum-epsilon that takes as input a function f , a
low range value low, a high range value high, and an increment epsilon, and
produces as output the maximum value of f in the range between low and high
at interval epsilon. As the value of epsilon decreases, find-maximum-epsilon
should evaluate to a value that approaches the actual maximum value.
For example,
(find-maximum-epsilon (lambda (x) (* x (- 5.5 x))) 1 10 1)
evaluates to 7.5. And,
(find-maximum-epsilon (lambda (x) (* x (- 5.5 x))) 1 10 0.01)
evaluates to 7.5625.
Exercise 4.10. [★] Define a find-area procedure that takes as input a function
f , a low range value low, a high range value high, and an increment epsilon,
and produces as output an estimate for the area under the curve produced by
the function f between low and high using the epsilon value to determine how
many regions to evaluate.
Exercise 4.12. [★] Provide a convincing argument why the evaluation of
(gcd-euclid a b) will always finish when the inputs are both positive integers.
One of the earliest known algorithms is a method for computing square roots. It
is known as Heron's method after the Greek mathematician Heron of Alexandria,
who lived in the first century AD and described the method, although it was also
known to the Babylonians many centuries earlier. Isaac Newton developed a
5 DrRacket provides a built-in procedure gcd that computes the greatest common divisor. We
name our procedure gcd-euclid to avoid a clash with the built-in procedure.
more general method for estimating functions based on their derivatives known
as Newton's method, of which Heron's method is a specialization.
Square root is a mathematical function that takes a number, a, as input and
outputs a value x such that x^2 = a. For many numbers (including 2), the square
root is irrational, so the best we can hope for is a good approximation. We
define a procedure find-sqrt that takes the target number as input and outputs
an approximation for its square root.
Heron's method works by starting with an arbitrary guess, $g_0$. Then, with each
iteration, compute a new guess ($g_n$ is the nth guess) that is a function of the
previous guess ($g_{n-1}$) and the target number (a):
$$g_n = \frac{g_{n-1} + \frac{a}{g_{n-1}}}{2}$$
As n increases, $g_n$ gets closer and closer to the square root of a.
The definition is recursive since we compute $g_n$ as a function of $g_{n-1}$, so we can
define a recursive procedure that computes Heron's method. First, we define a
procedure for computing the next guess from the previous guess and the target:
(define (heron-next-guess a g) (/ (+ g (/ a g)) 2))
Next, we define a recursive procedure to compute the nth guess using Heron's
method. It takes three inputs: the target number, a, the number of guesses to
make, n, and the value of the first guess, g.
(define (heron-method a n g)
  (if (= n 0)
      g
      (heron-method a (- n 1) (heron-next-guess a g))))
To start, we need a value for the first guess. The choice doesn't really matter:
the method works with any starting guess (but will reach a closer estimate
quicker if the starting guess is good). We will use 1 as our starting guess. So, we
can define a find-sqrt procedure that takes two inputs, the target number and
the number of guesses to make, and outputs an approximation of the square
root of the target number.
(define (find-sqrt a guesses)
(heron-method a guesses 1))
Heron's method converges to a good estimate very quickly:
> (square (find-sqrt 2 0))
1
> (square (find-sqrt 2 1))
2 1/4
> (square (find-sqrt 2 2))
2 1/144
> (square (find-sqrt 2 4))
2 1/221682772224
> (exact->inexact (find-sqrt 2 5))
1.4142135623730951
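For readers who want to reproduce the interactions above, the pieces of this section can be collected into one runnable unit. This is a sketch: square is assumed to match the Chapter 3 definition, and the hand-worked guesses in the comments follow the formula for $g_n$ with a = 2 and $g_0$ = 1.

```racket
#lang racket
(define (square x) (* x x))          ; assumed to match Chapter 3

;; One step of Heron's method: average the guess g with a/g.
(define (heron-next-guess a g) (/ (+ g (/ a g)) 2))

;; Apply n steps starting from guess g; accumulates while winding.
(define (heron-method a n g)
  (if (= n 0)
      g
      (heron-method a (- n 1) (heron-next-guess a g))))

(define (find-sqrt a guesses)
  (heron-method a guesses 1))

;; Worked by hand for a = 2:
;;   g0 = 1
;;   g1 = (1 + 2/1) / 2       = 3/2
;;   g2 = (3/2 + 2/(3/2)) / 2 = 17/12
(find-sqrt 2 1)                      ; the exact rational 3/2
(find-sqrt 2 2)                      ; the exact rational 17/12
(square (find-sqrt 2 2))             ; 289/144, shown above as 2 1/144
(exact->inexact (find-sqrt 2 5))     ; a decimal approximation
```

Because the inputs are exact integers, every guess is an exact rational number, which is why exact->inexact is needed to see a decimal value.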
64 4.4. Evaluating Recursive Applications
in full gory detail, and later review a subset showing the most revealing steps.
Stepping through even a fairly simple evaluation using the evaluation rules is
quite tedious, and not something humans should do very often (that's why we
have computers!), but it is instructive to do once to understand exactly how an
expression is evaluated.
The evaluation rule for an application expression does not specify the order in
which the subexpressions are evaluated. A Scheme interpreter is free to evaluate
them in any order. Here, we choose to evaluate the subexpressions in the order
that is most readable. The value produced by an evaluation does not depend on
the order in which the subexpressions are evaluated.6
In the evaluation steps, we use typewriter font for uninterpreted Scheme ex-
pressions and sans-serif font to show values. So, 2 represents the Scheme expres-
sion that evaluates to the number 2.
6 This is only true for the subset of Scheme we have defined so far. Once we introduce side effects
and mutation, it is no longer the case, and expressions can produce different results depending on
the order in which they are evaluated.
Step 1 starts evaluating (factorial 2). The result is found in Step 42. To eval-
uate (factorial 2), we follow the evaluation rules, eventually reaching the body
expression of the if expression in the factorial definition in Step 17. Evaluating
this expression requires evaluating the (factorial 1) subexpression. At Step 17,
the first evaluation is in progress, but to complete it we need the value resulting
from the second recursive application.
Evaluating the second application results in the body expression, (* 1 (factorial
0)), shown for Step 31. At this point, the evaluation of (factorial 2) is stuck in
Evaluation Rule 3, waiting for the value of the (factorial 1) subexpression. The
evaluation of the (factorial 1) application leads to the (factorial 0) subexpression,
which must be evaluated before the (factorial 1) evaluation can complete.
In Step 40, the (factorial 0) subexpression evaluation has completed and pro-
duced the value 1. Now, the (factorial 1) evaluation can complete, producing 1
as shown in Step 41. Once the (factorial 1) evaluation completes, all the subex-
pressions needed to evaluate the expression in Step 17 are now evaluated, and
the evaluation completes in Step 42.
Each recursive application can be tracked using a stack, similarly to processing
RTN subnetworks (Section 2.3). A stack has the property that the first item
pushed on the stack will be the last item removed: all the items pushed on
top of this one must be removed before this item can be removed. For
application evaluations, the elements on the stack are expressions to evaluate. To
finish evaluating the first expression, all of its component subexpressions must
be evaluated. Hence, the first application evaluation started is the last one to
finish.
Exercise 4.16. This exercise tests your understanding of the (factorial 2) evalua-
tion.
a. In step 5, the second part of the application evaluation rule, Rule 3(b), is used.
In which step does this evaluation rule complete?
b. In step 11, the first part of the application evaluation rule, Rule 3(a), is used.
In which step is the following use of Rule 3(b) started?
c. In step 25, the first part of the application evaluation rule, Rule 3(a), is used.
In which step is the following use of Rule 3(b) started?
d. To evaluate (factorial 3), how many times would Evaluation Rule 2 be used to
evaluate the name factorial?
e. [★] To evaluate (factorial n) for any positive integer n, how many times would
Evaluation Rule 2 be used to evaluate the name factorial?
Exercise 4.17. For which input values n will an evaluation of (factorial n) even-
tually reach a value? For values where the evaluation is guaranteed to finish,
make a convincing argument why it must finish. For values where the evalua-
tion would not finish, explain why.
4.5.1 Printing
One useful procedure built into DrRacket is the display procedure. It takes one
input, and produces no output. Instead of producing an output, it prints out the
value of the input (it will appear in purple in the Interactions window). We can
use display to observe what a procedure is doing as it is evaluated.
For example, if we add a (display n) expression at the beginning of our factorial
procedure we can see all the intermediate calls. To make each printed value
appear on a separate line, we use the newline procedure. The newline procedure
prints a new line; it takes no inputs and produces no output.
(define (factorial n)
  (display "Enter factorial: ") (display n) (newline)
  (if (= n 0) 1 (* n (factorial (- n 1)))))
Evaluating (factorial 2) produces:
Enter factorial: 2
Enter factorial: 1
Enter factorial: 0
2
The built-in printf procedure makes it easier to print out many values at once.
It takes one or more inputs. The first input is a string (a sequence of characters
enclosed in double quotes). The string can include special ~a markers that print
out values of objects inside the string. Each ~a marker is matched with a corre-
sponding input, and the value of that input is printed in place of the ~a in the
string. Another special marker, ~n, prints out a new line inside the string.
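A short sketch of how the markers are matched with inputs. The strings and values are illustrative choices; only printf itself and the ~a and ~n markers come from the description above.

```racket
#lang racket
;; Three ~a markers are matched, in order, with the three inputs
;; that follow the string; ~n ends the line.
(printf "~a + ~a = ~a~n" 1 2 (+ 1 2))     ; prints: 1 + 2 = 3

;; A ~a marker can print any value, including a Boolean.
(printf "Is 3 bigger than 7? ~a~n" (> 3 7))
```

Each input is evaluated before printing, so the third input in the first example prints as 3 rather than as the expression (+ 1 2).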
Using printf , we can define our factorial procedure with printing as:
(define (factorial n)
  (printf "Enter factorial: ~a~n" n)
  (if (= n 0) 1 (* n (factorial (- n 1)))))
The display, printf, and newline procedures do not produce output values.
Instead, they are applied to produce side effects. A side effect is something that
changes the state of a computation. In this case, the side effect is printing in
the Interactions window. Side effects make reasoning about what programs do
much more complicated since the order in which events happen now matters.
We will mostly avoid using procedures with side effects until Chapter 9, but
printing procedures are so useful that we introduce them here.
4.5.2 Tracing
DrRacket provides a more automated way to observe applications of procedures.
We can use tracing to observe the start of a procedure evaluation (including the
procedure inputs) and the completion of the evaluation (including the output).
To use tracing, it is necessary to first load the tracing library by evaluating this
expression:
(require racket/trace)
This defines the trace procedure that takes one input, a constructed procedure
(trace does not work for primitive procedures). After evaluating (trace proc), the
interpreter will print out the procedure name and its inputs at the beginning
of every application of proc and the value of the output at the end of the ap-
plication evaluation. If there are other applications before the first application
finishes evaluating, these will be printed indented so it is possible to match up
the beginning and end of each application evaluation. For example (the trace
outputs are shown in typewriter font),
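A minimal sketch of a traced session, assuming the factorial definition used earlier in this chapter. The exact layout of the printed trace depends on the DrRacket version, so it is not reproduced here.

```racket
#lang racket
(require racket/trace)               ; loads the tracing library

(define (factorial n)
  (if (= n 0) 1 (* n (factorial (- n 1)))))

(trace factorial)                    ; start tracing applications of factorial

;; Prints one entry line for each of (factorial 2), (factorial 1),
;; and (factorial 0), followed by their outputs; the value is 2.
(factorial 2)
```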
The trace shows that (factorial 2) is evaluated first; within its evaluation,
(factorial 1) and then (factorial 0) are evaluated. The outputs of each of these
applications are lined up vertically below the application entry trace.
The value π is defined as the ratio between the circumference of a circle and
its diameter. One way to calculate the approximate value of π is the Gregory-
Leibniz series (which was actually discovered by the Indian mathematician Mad-
hava in the 14th century):
$$\pi = \frac{4}{1} - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \frac{4}{9} - \cdots$$
70 4.5. Developing Complex Programs
This summation converges to π. The more terms that are included, the closer
the computed value will be to the actual value of π.
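To see how the partial sums behave, here are the first four worked by hand, writing $S_n$ for the sum of the first n terms (a symbol introduced here only for illustration):

```latex
\begin{align*}
S_1 &= \frac{4}{1} = 4 \\
S_2 &= 4 - \frac{4}{3} = \frac{8}{3} \approx 2.667 \\
S_3 &= \frac{8}{3} + \frac{4}{5} = \frac{52}{15} \approx 3.467 \\
S_4 &= \frac{52}{15} - \frac{4}{7} = \frac{304}{105} \approx 2.895
\end{align*}
```

The sums alternate between values above and below π, and each step moves by less than the one before.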
a. [★] Define a procedure compute-pi that takes as input n, the number of terms
to include, and outputs an approximation of π computed using the first n
terms of the Gregory-Leibniz series. (compute-pi 1) should evaluate to 4 and
(compute-pi 2) should evaluate to 2 2/3. For higher terms, use the built-in
procedure exact->inexact to see the decimal value. For example,
(exact->inexact (compute-pi 10000))
evaluates (after a long wait!) to 3.1414926535900434.
The Gregory-Leibniz series is fairly simple, but it takes an awfully long time to
converge to a good approximation for π: only one digit is correct after 10 terms,
and after summing 10000 terms only the first four digits are correct.
Madhava discovered another series for computing the value of π that converges
much more quickly:
$$\pi = \sqrt{12} \left(1 - \frac{1}{3 \cdot 3} + \frac{1}{5 \cdot 3^2} - \frac{1}{7 \cdot 3^3} + \frac{1}{9 \cdot 3^4} - \cdots\right)$$
Madhava computed the first 21 terms of this series, finding an approximation of
π that is correct for the first 12 digits: 3.14159265359.
b. [★★] Define a procedure faster-pi that takes as input n, the number of terms
to include, and outputs an approximation of π computed using the first n
terms of the Madhava series. (Continue reading for hints.)
To define faster-pi, first define two helper functions: faster-pi-helper, that takes
one input, n, and computes the sum of the first n terms in the series without the
√12 factor, and faster-pi-term that takes one input n and computes the value
of the nth term in the series (without alternating the adding and subtracting).
(faster-pi-term 1) should evaluate to 1 and (faster-pi-term 2) should evaluate to
1/9. Then, define faster-pi as:
A standard Nim game starts with three heaps. At each turn, a player removes any
number of stones from any one heap (but may not remove stones from more
than one heap). We can describe the state of a 3-heap game of Nim using three
numbers, representing the number of stones in each heap. For example, the
Thai 21 game starts with the state (21 0 0) (one heap with 21 stones, and two
empty heaps).8
c. What should the first player do to win if the starting state is (2 1 0)?
d. Which player should win if the starting state is (2 2 2)?
e. [★] Which player should win if the starting state is (5 6 7)?
f. [★★] Describe a strategy for always winning a winnable game of Nim starting
from any position.9
8 With the standard Nim rules, this would not be an interesting game since the first player can
The final game we consider is the Corner the Queen game invented by Rufus
Isaacs.10 The game is played using a single Queen on an arbitrarily large
chessboard as shown in Figure 4.5.
[Figure 4.5: Corner the Queen chessboard, with numbered rows and columns]
On each turn, a player moves the Queen one or more squares in either the left,
down, or diagonally down-left direction (unlike a standard chess Queen, in this
game the queen may not move right, up or up-right). As with the other games,
the last player to make a legal move wins. For this game, once the Queen reaches
the bottom left square marked with the ★, there are no moves possible. Hence,
the player who moves the Queen onto the ★ wins the game. We name the squares
using the numbers on the sides of the chessboard with the column number first.
So, the Queen in the picture is on square (4 7).
g. Identify all the starting squares for which the first player to move can win
right away. (Your answer should generalize to any size square chessboard.)
h. Suppose the Queen is on square (2 1) and it is your move. Explain why there
is no way you can avoid losing the game.
i. Given the shown starting position (with the Queen at (4 7)), would you rather
be the first or second player?
j. [★] Describe a strategy for winning the game (when possible). Explain from
which starting positions it is not possible to win (assuming the other player
always makes the right move).
k. [★] Define a variant of Nim that is essentially the same as the Corner the
Queen game. (This game is known as Wythoff's Nim.)
4.6 Summary
By breaking problems down into simpler problems we can develop solutions
to complex problems. Many problems can be solved by combining instances
of the same problem on simpler inputs. When we define a procedure to solve a
problem this way, it needs to have a predicate expression to determine when the
base case has been reached, a consequent expression that provides the value for
the base case, and an alternate expression that defines the solution to the given
input as an expression using a solution to a smaller input.
I'd rather be an optimist and a fool than a pessimist and right.
Albert Einstein
Our general recursive problem solving strategy is:
1. Be optimistic! Assume you can solve it.
2. Think of the simplest version of the problem, something you can already
For all the programs so far, we have been limited to simple data such as numbers
and Booleans. We call this scalar data since it has no structure. As we saw in
Chapter 1, we can represent all discrete data using just (enormously large) whole
numbers. For example, we could represent the text of a book using only one
(very large!) number, and manipulate the characters in the book by changing the
value of that number. But, it would be very difficult to design and understand
computations that use numbers to represent complex data.
We need more complex data structures to better model structured data. We want
to represent data in ways that allow us to think about the problem we are trying
to solve, rather than the details of how data is represented and manipulated.
This chapter covers techniques for building data structures and for defining pro-
cedures that manipulate structured data, and introduces data abstraction as a
tool for managing program complexity.
5.1 Types
All data in a program has an associated type. Internally, all data is stored just
as a sequence of bits, so the type of the data is important to understand what it
means. We have seen several different types of data already: Numbers, Booleans,
and Procedures (we use initial capital letters to signify a datatype).
A datatype defines a set (often infinite) of possible values. The Boolean datatype
contains the two Boolean values, true and false. The Number type includes the
infinite set of all whole numbers (it also includes negative numbers and rational
numbers). We think of the set of possible Numbers as infinite, even though on
any particular computer there is some limit to the amount of memory available,
and hence, some largest number that can be represented. On any real com-
puter, the number of possible values of any data type is always finite. But, we
can imagine a computer large enough to represent any given number.
The type of a value determines what can be done with it. For example, a Number
can be used as one of the inputs to the primitive procedures +, *, and =. A
Boolean can be used as the first subexpression of an if expression and as the
76 5.1. Types
input to the not procedure (not can also take a Number as its input, but for
all Number value inputs the output is false), but cannot be used as the input to
+, *, or =.1
A Procedure can be the first subexpression in an application expression. There
are infinitely many different types of Procedures, since the type of a Procedure
depends on its input and output types. For example, recall the bigger procedure
from Chapter 3:
(define (bigger a b) (if (> a b) a b))
It takes two Numbers as input and produces a Number as output. We denote
this type as:
Number × Number → Number
The inputs to the procedure are shown on the left side of the arrow. The type of
each input is shown in order, separated by the × symbol.2 The output type is
given on the right side of the arrow.
From its definition, it is clear that the bigger procedure takes two inputs from its
parameter list. How do we know the inputs must be Numbers and the output is
a Number?
The body of the bigger procedure is an if expression with the predicate expres-
sion (> a b). This applies the > primitive procedure to the two inputs. The
type of the > procedure is Number × Number → Boolean. So, for the predicate
expression to be valid, its inputs must both be Numbers. This means the
input values to bigger must both be Numbers. We know the output of the bigger
procedure will be a Number by analyzing the consequent and alternate subex-
pressions: each evaluates to one of the input values, which must be a Number.
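The reasoning above can be checked by experiment. The following is a minimal sketch, assuming the Racket language used by DrRacket; with-handlers is a Racket facility not introduced in the text, used here only so that the error can be observed without stopping the program, and try-bigger is a hypothetical helper name.

```racket
#lang racket

; bigger: Number × Number → Number (definition from Chapter 3)
(define (bigger a b) (if (> a b) a b))

; Hypothetical helper (not from the text): applies bigger, but evaluates to
; the symbol 'type-error if the application signals an error.
(define (try-bigger a b)
  (with-handlers ([exn:fail? (lambda (e) 'type-error)])
    (bigger a b)))

(try-bigger 3 7)        ; both inputs are Numbers, so the output is the Number 7
(try-bigger true false) ; > cannot take Booleans, so the result is 'type-error
```

The second application fails inside the predicate expression (> a b), which is exactly where the analysis says the inputs are required to be Numbers.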
Starting with the primitive Boolean, Number, and Procedure types, we can build
arbitrarily complex datatypes. This chapter introduces mechanisms for building
complex datatypes by combining the primitive datatypes.
1 The primitive procedure equal? is a more general comparison procedure that can take as in-
puts any two values, so could be used to compare Boolean values. For example, (equal? false false)
evaluates to true and (equal? true 3) is a valid expression that evaluates to false.
2 The notation using × to separate input types makes sense if you think about the number of
different inputs to a procedure. For example, consider a procedure that takes two Boolean values as
inputs, so its type is Boolean × Boolean → Value. Each Boolean input can be one of two possible
values. If we combined both inputs into one input, there would be 2 × 2 different values needed to
represent all possible inputs.
Exercise 5.2. Define or identify a procedure that has the given type.
a. Number × Number → Boolean
b. Number → Number
c. (Number → Number) × (Number → Number) → (Number → Number)
d. Number → (Number → (Number → Number))
5.2 Pairs
The simplest structured data construct is a Pair. We draw a Pair as two boxes,
each containing a value. We call each box of a Pair a cell. Here is a Pair where the
first cell has the value 37 and the second cell has the value 42:
[Figure: a Pair drawn as two boxes, the first containing 37 and the second containing 42.]
Scheme provides built-in procedures for constructing a Pair, and for extracting
each cell from a Pair:
These rather unfortunate names come from the original LISP implementation
on the IBM 704. The name cons is short for construct. The name car is short for
Contents of the Address part of the Register and the name cdr (pronounced
could-er) is short for Contents of the Decrement part of the Register. The de-
signers of the original LISP implementation picked the names because of how
pairs could be implemented on the IBM 704 using a single register to store both
parts of a pair, but it is a mistake to name things after details of their implemen-
tation (see Section 5.6). Unfortunately, the names stuck.
We can construct the Pair shown above by evaluating (cons 37 42). DrRacket
displays a Pair by printing the value of each cell separated by a dot: (37 . 42). The
interactions below show example uses of cons, car, and cdr.
> (define mypair (cons 37 42))
> (car mypair)
37
> (cdr mypair)
42
The values in the cells of a Pair can be any type, including other Pairs. For exam-
ple, this definition defines a Pair where each cell of the Pair is itself a Pair:
(define doublepair (cons (cons 1 2) (cons 3 4)))
We can use the car and cdr procedures to access components of the doublepair
structure: (car doublepair) evaluates to the Pair (1 . 2), and (cdr doublepair)
evaluates to the Pair (3 . 4).
We can compose multiple car and cdr applications to extract components from
nested pairs:
The last expression produces an error when it is evaluated since car is applied
to the scalar value 1. The car and cdr procedures can only be applied to an
input that is a Pair. Hence, an error results when we attempt to apply car to
a scalar value. This is an important property of data: the type of data (e.g., a
Pair) defines how it can be used (e.g., passed as the input to car and cdr). Every
procedure expects a certain type of inputs, and typically produces an error when
it is applied to values of the wrong type.
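As an illustration of composing car and cdr, the sketch below extracts each of the four values from doublepair. It assumes the Racket language used by DrRacket; the failing expression is left as a comment so that the rest of the sketch runs.

```racket
#lang racket

(define doublepair (cons (cons 1 2) (cons 3 4)))

(car (car doublepair)) ; first cell of the first inner Pair: 1
(cdr (car doublepair)) ; second cell of the first inner Pair: 2
(car (cdr doublepair)) ; first cell of the second inner Pair: 3
(cdr (cdr doublepair)) ; second cell of the second inner Pair: 4

; (car (car (car doublepair))) would signal an error: (car (car doublepair))
; is the scalar value 1, and car can only be applied to a Pair.
```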
We can draw the value of doublepair by nesting Pairs within cells:
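[Figure: doublepair drawn as a Pair whose two cells each contain a nested Pair, holding 1 and 2, and 3 and 4.]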
Drawing Pairs within Pairs within Pairs can get quite difficult, however. For in-
stance, try drawing (cons 1 (cons 2 (cons 3 (cons 4 5)))) this way.
Instead, we use arrows to point to the contents of cells that are not simple values.
This is the structure of doublepair shown using arrows:
[Figure: box-and-arrow diagram of nested Pairs.]
Draw the structure defined by tpair, and give the value of each of the following
expressions.
a. (cdr tpair)
b. (car (car (car tpair)))
c. (cdr (cdr (car tpair)))
d. (car (cdr (cdr tpair)))
Exercise 5.4. Write expressions that extract each of the four elements from
fstruct defined by (define fstruct (cons 1 (cons 2 (cons 3 4)))).
Exercise 5.5. Give an expression that produces the structure shown below.
Exercise 5.6. Convince yourself the definitions of scons, scar, and scdr above
work as expected by following the evaluation rules to evaluate
(scar (scons 1 2))
Exercise 5.7. Show the corresponding definitions of tcar and tcdr that provide
the pair selection behavior for a pair created using tcons defined as:
(define (tcons a b) (lambda (w) (if w b a)))
Exercise 5.9. Another way of thinking of a triple is as a Pair where the first cell is
a Pair and the second cell is a scalar. Provide definitions of make-triple, triple-
first, triple-second, and triple-third for this construct.
5.3 Lists
In the previous section, we saw how to construct arbitrarily large tuples from
Pairs. This way of managing data is not very satisfying since it requires defining
different procedures for constructing and accessing elements of every length tu-
ple. For many applications, we want to be able to manage data of any length
such as all the items in a web store, or all the bids on a given item. Since the
number of components in these objects can change, it would be very painful to
need to define a new tuple type every time an item is added. We need a data
type that can hold any number of items.
This definition almost provides what we need:
An any-uple is a Pair whose second cell is an any-uple.
This seems to allow an any-uple to contain any number of elements. The prob-
lem is we have no stopping point. With only the definition above, there is no
way to construct an any-uple without already having one.
The situation is similar to defining MoreDigits as zero or more digits in Chap-
ter 2, defining MoreExpressions in the Scheme grammar in Chapter 3 as zero or
more Expressions, and recursive composition in Chapter 4.
Recall the grammar rules for MoreExpressions:
MoreExpressions ::⇒ Expression MoreExpressions
MoreExpressions ::⇒ ε
DrRacket will print out the value of null as (). It is also known as the empty list,
since it represents the List containing no elements. The built-in procedure null?
takes one input parameter and evaluates to true if and only if the value of that
parameter is null.
Using null, we can now define a List:
A List is either (1) null or (2) a Pair whose second cell is a List.
Symbolically, we define a List as:
List ::⇒ null
List ::⇒ (cons Value List)
These two rules define a List as a data structure that can contain any number of
elements. Starting from null, we can create Lists of any length:
null evaluates to a List containing no elements.
(cons 1 null) evaluates to a List containing one element.
(cons 1 (cons 2 null)) evaluates to a List containing two elements.
(cons 1 (cons 2 (cons 3 null))) evaluates to a 3-element List.
Scheme provides a convenient procedure, list, for constructing a List. The list
procedure takes zero or more inputs, and evaluates to a List containing those
inputs in order. The following expressions are equivalent to the corresponding
expressions above: (list), (list 1), (list 1 2), and (list 1 2 3).
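The equivalence between list and nested cons applications can be checked directly. This is a small sketch assuming the Racket language used by DrRacket; equal? is the general comparison procedure mentioned in Section 5.1.

```racket
#lang racket

; Each comparison evaluates to true: list builds the same structure as
; the corresponding chain of cons applications ending in null.
(equal? (list) null)
(equal? (list 1) (cons 1 null))
(equal? (list 1 2) (cons 1 (cons 2 null)))
(equal? (list 1 2 3) (cons 1 (cons 2 (cons 3 null))))

; The second cell of a non-empty List is itself a List.
(cdr (list 1 2 3))       ; the List containing 2 and 3
(null? (cdr (list 1)))   ; true: the rest of a one-element List is null
```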
Lists are just a collection of Pairs, so we can draw a List using the same box and
arrow notation we used to draw structures created with Pairs. Here is the struc-
ture resulting from (list 1 2 3):
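[Figure: box-and-arrow diagram of (list 1 2 3): three Pairs holding 1, 2, and 3, each pointing to the next, with a slash through the final cell.]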
There are three Pairs in the List; the second cell of each Pair is a List. For the
third Pair, the second cell is the List null, which we draw as a slash through the
final cell in the diagram.
Table 5.1 summarizes some of the built-in procedures for manipulating Pairs
and Lists.
Exercise 5.10. For each of the following expressions, explain whether or not the
expression evaluates to a List. Check your answers with a Scheme interpreter by
using the list? procedure.
a. null
b. (cons 1 2)
c. (cons null null)
d. (cons (cons (cons 1 2) 3) null)
e. (cdr (cons 1 (cons 2 (cons null null))))
f. (cons (list 1 2 3) 4)
Procedure   Type                          Output
cons        Value × Value → Pair          a Pair consisting of the two inputs
car         Pair → Value                  the first cell of the input Pair
cdr         Pair → Value                  the second cell of the input Pair
list        zero or more Values → List    a List containing the inputs
null?       Value → Boolean               true if the input is null, otherwise false
pair?       Value → Boolean               true if the input is a Pair, otherwise false
list?       Value → Boolean               true if the input is a List, otherwise false
Table 5.1. Selected Built-In Scheme Procedures for Lists and Pairs.
2. Think of the simplest version of the problem, something you can already
solve. This is the base case. For lists, this is usually the empty list.
3. Consider how you would solve a big version of the problem by using the
result for a slightly smaller version of the problem. This is the recursive
case. For lists, the smaller version of the problem is usually the rest (cdr)
of the List.
4. Combine the base case and the recursive case to solve the problem.
Next we consider procedures that examine lists by walking through their ele-
ments and producing a scalar value. Section 5.4.2 generalizes these procedures.
In Section 5.4.3, we explore procedures that output lists.
For the recursive case, we need to consider the structure of all lists other than
null. Recall from our definition that a List is either null or (cons Value List ). The
base case handles the null list; the recursive case must handle a List that is a Pair
of an element and a List. The length of this List is one more than the length of
the List that is the cdr of the Pair.
3 Scheme provides a built-in procedure length that takes a List as its input and outputs the num-
ber of elements in the List. Here, we will define our own list-length procedure that does this (without
using the built-in length procedure). As with many other examples and exercises in this chapter, it
is instructive to define our own versions of some of the built-in list procedures.
84 5.4. List Procedures
(define (list-length p)
  (if (null? p)
      0
      (+ 1 (list-length (cdr p)))))
Here are a few example applications of our list-length procedure:
> (list-length null)
0
> (list-length (cons 0 null))
1
> (list-length (list 1 2 3 4))
4
Example 5.2: List Sums and Products
First, we define a procedure that takes a List of numbers as input and produces
as output the sum of the numbers in the input List. As usual, the base case is
when the input is null: the sum of an empty list is 0. For the recursive case, we
need to add the value of the first number in the List to the sum of the rest of the
numbers in the List.
(define (list-sum p)
(if (null? p) 0 (+ (car p) (list-sum (cdr p)))))
We can define list-product similarly, using * in place of +. The base case result
cannot be 0, though: any number multiplied by 0 is 0, so the final result would
always be 0. We follow the mathematical convention that the
product of the empty list is 1.
(define (list-product p)
  (if (null? p) 1 (* (car p) (list-product (cdr p)))))
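Here are a few example applications of these two procedures, written as a self-contained sketch (assuming the Racket language used by DrRacket) so the base cases can be seen alongside the recursive cases.

```racket
#lang racket

(define (list-sum p)
  (if (null? p) 0 (+ (car p) (list-sum (cdr p)))))

(define (list-product p)
  (if (null? p) 1 (* (car p) (list-product (cdr p)))))

(list-sum null)               ; base case: 0
(list-sum (list 1 2 3 4))     ; 1 + 2 + 3 + 4 = 10
(list-product null)           ; base case: 1
(list-product (list 1 2 3 4)) ; 1 * 2 * 3 * 4 = 24
```

Because the base case of list-product is 1, the product of a one-element List is just that element, as we would expect.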
Exercise 5.11. Define a procedure is-list? that takes one input and outputs true if
the input is a List, and false otherwise. Your procedure should behave identically
to the built-in list? procedure, but you should not use list? in your definition.
Exercise 5.14. [⋆] Use list-accumulate to define is-list? (from Exercise 5.11).
The first recursive call is (list-get-element (list 2) 2). The second recursive call is
(list-get-element (list) 1). At this point, n is 1, so the base case is reached and (car
p) is evaluated. But, p is the empty list (which is not a Pair), so an error results.
A better version of list-get-element would provide a meaningful error message
when the requested element is out of range. We do this by adding an if expres-
sion that tests if the input List is null:
(define (list-get-element p n)
  (if (null? p)
      (error "Index out of range")
      (if (= n 1) (car p) (list-get-element (cdr p) (- n 1)))))
The built-in procedure error takes a String as input. The String datatype is a
sequence of characters; we can create a String by surrounding characters with
double quotes, as in the example. The error procedure terminates program ex-
ecution with a message that displays the input value.
Checking explicitly for invalid inputs is known as defensive programming.
Programming defensively helps avoid tricky-to-debug errors and makes it easier to
understand what went wrong if there is an error.
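The sketch below shows the defensive version of list-get-element on an in-range and an out-of-range request. It assumes the Racket language used by DrRacket; try-get-element is a hypothetical helper, and with-handlers and exn-message are Racket facilities not introduced in the text, used here only to capture the error message instead of stopping the program.

```racket
#lang racket

(define (list-get-element p n)
  (if (null? p)
      (error "Index out of range")
      (if (= n 1) (car p) (list-get-element (cdr p) (- n 1)))))

; Hypothetical helper (not from the text): evaluates to the selected element,
; or to the error message (a String) if list-get-element signals an error.
(define (try-get-element p n)
  (with-handlers ([exn:fail? (lambda (e) (exn-message e))])
    (list-get-element p n)))

(try-get-element (list 10 20 30) 2) ; elements are numbered from 1, so this is 20
(try-get-element (list 10 20 30) 4) ; past the end: the out-of-range message
```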
Exercise 5.16. Define a procedure list-ordered? that takes two inputs, a test
procedure and a List. It outputs true if all the elements of the List are ordered
according to the test procedure. For example, (list-ordered? < (list 1 2 3)) evalu-
ates to true, and (list-ordered? < (list 1 2 3 2)) evaluates to false. Hint: think about
what the output should be for the empty list.
(define (list-square p)
  (if (null? p) null
      (cons (square (car p))
            (list-square (cdr p)))))
We generalize this by making the procedure which is applied to each element an
input. The procedure list-map takes a procedure as its first input and a List as
its second input. It outputs a List whose elements are the results of applying the
input procedure to each element of the input List.4
(define (list-map f p)
  (if (null? p) null
      (cons (f (car p))
            (list-map f (cdr p)))))
We can use list-map to define square-all:
(define (square-all p) (list-map square p))
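Since list-map takes the procedure as an input, it can be reused with any procedure that takes one input. The following sketch assumes the Racket language used by DrRacket and an assumed definition of square (a procedure that multiplies its input by itself), which is not shown in this section.

```racket
#lang racket

(define (list-map f p)
  (if (null? p) null
      (cons (f (car p))
            (list-map f (cdr p)))))

; Assumed definition: square is not defined in this section.
(define (square x) (* x x))

(define (square-all p) (list-map square p))

(square-all (list 1 2 3 4))                  ; the List (1 4 9 16)
(list-map (lambda (x) (+ x 1)) (list 1 2 3)) ; the List (2 3 4)
(list-map (lambda (x) (> x 2)) (list 1 2 3)) ; a List of Booleans: false, false, true
```

The last application shows that the output List need not contain the same type of elements as the input List.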
4 Scheme provides a built-in map procedure. It behaves like this one when passed a procedure
and a single List as inputs, but can also work on more than one List input at a time.
Exercise 5.20. Define a procedure list-remove that takes two inputs: a test pro-
cedure and a List. As output, it produces a List that is a copy of the input List
with all of the elements for which the test procedure evaluates to true removed.
For example, (list-remove (lambda (x) (= x 0)) (list 0 1 2 3)) should evaluate to
the List (1 2 3).
(define (intsto n)
  (if (= n 0) null
      (list-append (intsto (- n 1)) (list n))))
Although all of these procedures are functionally equivalent (for all valid inputs,
each function produces exactly the same output), the amount of computing
work (and hence the time they take to execute) varies across the implemen-
tations. We consider the problem of estimating the running-times of different
procedures in Part II.
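To try the list-append version of intsto, we need a definition of list-append. The sketch below assumes the Racket language used by DrRacket and uses an assumed recursive definition of list-append (it takes two Lists and outputs a List with the elements of the first followed by the elements of the second); that definition is not shown in this section.

```racket
#lang racket

; Assumed definition: outputs the elements of p followed by the elements of q.
(define (list-append p q)
  (if (null? p) q
      (cons (car p) (list-append (cdr p) q))))

(define (intsto n)
  (if (= n 0) null
      (list-append (intsto (- n 1)) (list n))))

(intsto 0) ; the empty List
(intsto 5) ; the List (1 2 3 4 5), in increasing order
```

Appending (list n) after the result for n - 1 is what puts the elements in increasing order; using cons in its place would put n first.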
(define (list-flatten p)
  (if (null? p) null
      (list-append (car p) (list-flatten (cdr p)))))
This flattens a List of Lists into a single List. To completely flatten a deeply
nested List, we use multiple recursive calls as we did with deep-list-sum:
(define (deep-list-flatten p)
  (if (null? p) null
      (list-append (if (list? (car p))
                       (deep-list-flatten (car p))
                       (list (car p)))
                   (deep-list-flatten (cdr p)))))
Now we can define deep-list-sum as:
(define deep-list-sum (fcompose deep-list-flatten list-sum))
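The difference between the two flattening procedures is easiest to see on a nested input. This sketch assumes the Racket language used by DrRacket. The definitions of list-append and fcompose are assumptions, since neither is shown in this section; fcompose is taken to apply its first input procedure and then its second, which is the order the definition of deep-list-sum relies on.

```racket
#lang racket

; Assumed definitions (not shown in this section).
(define (list-append p q)
  (if (null? p) q (cons (car p) (list-append (cdr p) q))))
(define (fcompose f g) (lambda (x) (g (f x)))) ; apply f first, then g

(define (list-sum p)
  (if (null? p) 0 (+ (car p) (list-sum (cdr p)))))

(define (list-flatten p)
  (if (null? p) null
      (list-append (car p) (list-flatten (cdr p)))))

(define (deep-list-flatten p)
  (if (null? p) null
      (list-append (if (list? (car p))
                       (deep-list-flatten (car p))
                       (list (car p)))
                   (deep-list-flatten (cdr p)))))

(define deep-list-sum (fcompose deep-list-flatten list-sum))

(list-flatten (list (list 1 2) (list 3) (list 4 5))) ; one level: (1 2 3 4 5)
(deep-list-flatten (list 1 (list 2 (list 3 4)) 5))   ; all levels: (1 2 3 4 5)
(deep-list-sum (list 1 (list 2 (list 3 4)) 5))       ; 15
```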
Pascal's Triangle (named for Blaise Pascal, although known to many others before
him) is shown below:
          1
        1   1
      1   2   1
    1   3   3   1
  1   4   6   4   1
1   5  10  10   5   1
Each number in the triangle is the sum of the two numbers immediately above
and to the left and right of it. The numbers in Pascal's Triangle are the coefficients
in a binomial expansion. The numbers of the nth row (where the rows
are numbered starting from 0) are the coefficients of the binomial expansion of
$(x + y)^n$. For example, $(x + y)^2 = x^2 + 2xy + y^2$, so the coefficients are 1 2 1,
matching the third row in the triangle; from the fifth row, $(x + y)^4 = x^4 + 4x^3 y +
6x^2 y^2 + 4xy^3 + y^4$. The values in the triangle also match the number of ways to
choose k elements from a set of size n (see Exercise 4.5): the kth number on the
nth row of the triangle gives the number of ways to choose k elements from a set
of size n. For example, the third number on the fifth (n = 4) row is 6, so there are
92 5.6. Data Abstraction
ate data abstractions are designed and implemented, the solution to the prob-
lem often follows readily. This example also uses many of the list procedures
defined earlier in this chapter.
For this exploration, we develop a program to solve the infamous pegboard puz-
zle, often found tormenting unsuspecting diners at pancake restaurants. The
standard puzzle is a one-player game played on a triangular board with fifteen
holes with pegs in all of the holes except one.
The goal is to remove all but one of the pegs by jumping pegs over one another.
A peg may jump over an adjacent peg only when there is a free hole on the other
side of the peg. The jumped peg is removed. The game ends when there are no
possible moves. If there is only one peg remaining, the player wins (according
to the Cracker Barrel version of the game, "Leave only one -- you're genius"). If
more than one peg remains, the player loses ("Leave four or more'n you're just
plain 'eg-no-ra-moose'").
Our goal is to develop a program that finds a winning solution to the pegboard
game from any winnable starting position. We use a brute force approach: try
all possible moves until we find one that works. Brute force solutions only work
on small-size problems. Because they have to try all possibilities, they are often
too slow for solving large problems, even on the most powerful computers
imaginable.8
The first thing to think about to solve a complex problem is what datatypes we
need. We want datatypes that represent the things we need to model in our
problem solution. For the pegboard game, we need to model the board with its
pegs. We also need to model actions in the game like a move (jumping over a
peg). The important thing about a datatype is what you can do with it. To design
our board datatype we need to think about what we want to do with a board. In
the physical pegboard game, the board holds the pegs. The important property
we need to observe about the board is which holes on the board contain pegs.
For this, we need a way of identifying board positions. We define a datatype
8 The generalized pegboard puzzle is an example of a class of problems known as NP-Complete.
This means it is not known whether or not any solution exists that is substantially better than the
brute force solution, but it would be extraordinarily surprising (and of momentous significance!) to
find one.
for representing positions first, then a datatype for representing moves, and a
datatype for representing the board. Finally, we use these datatypes to define a
procedure that finds a winning solution.
Position. We identify the board positions using row and column numbers:
(1 1)
(2 1) (2 2)
(3 1) (3 2) (3 3)
(4 1) (4 2) (4 3) (4 4)
(5 1) (5 2) (5 3) (5 4) (5 5)
A position has a row and a column, so we could just use a Pair to represent a
position. This would work, but we prefer to have a more abstract datatype so we
can think about a position's row and column, rather than thinking that a position
is a Pair and using the car and cdr procedures to extract the row and column
from the position.
Our Position datatype should provide at least these operations:
make-position: Number × Number → Position
Creates a Position representing the row and column given by the input
numbers.
position-get-row: Position → Number
Outputs the row number of the input Position.
position-get-column: Position → Number
Outputs the column number of the input Position.
Since the Position needs to keep track of two numbers, a natural way to implement
the Position datatype is to use a Pair. A more defensive implementation
of the Position datatype uses a tagged list. With a tagged list, the first element
of the list is a tag denoting the datatype it represents. All operations check the
tag is correct before proceeding. We can use any type to encode the list tag, but
it is most convenient to use the built-in Symbol type. Symbols are a quote (')
followed by a sequence of characters. The important operation we can do with
a Symbol is test whether it is an exact match for another symbol using the eq?
procedure.
We define the tagged list datatype, tlist, using the list-get-element procedure
from Example 5.3:
(define (make-tlist tag p) (cons tag p))
(define (tlist-get-tag p) (car p))
(define (tlist-get-element tag p n)
  (if (eq? (tlist-get-tag p) tag)
      (list-get-element (cdr p) n)
      (error (format "Bad tag: ~a (expected ~a)"
                     (tlist-get-tag p) tag))))
The format procedure is a built-in procedure similar to the printf procedure
described in Section 4.5.1. Instead of printing as a side effect, format produces
a String. For example, (format "list: ~a number: ~a." (list 1 2 3) 123) evaluates to the
String "list: (1 2 3) number: 123.".
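As a sketch of how the Position operations listed earlier could be built on the tagged list, the following assumes the Racket language used by DrRacket. The three position procedures are written by analogy with the tlist procedures and are an assumption about the implementation, not a definition taken from this section; with-handlers and exn-message are Racket facilities used only to display the error message.

```racket
#lang racket

(define (list-get-element p n)
  (if (null? p)
      (error "Index out of range")
      (if (= n 1) (car p) (list-get-element (cdr p) (- n 1)))))

(define (make-tlist tag p) (cons tag p))
(define (tlist-get-tag p) (car p))
(define (tlist-get-element tag p n)
  (if (eq? (tlist-get-tag p) tag)
      (list-get-element (cdr p) n)
      (error (format "Bad tag: ~a (expected ~a)"
                     (tlist-get-tag p) tag))))

; Assumed implementation of Position as a tagged list of row and column.
(define (make-position row col) (make-tlist 'Position (list row col)))
(define (position-get-row pos) (tlist-get-element 'Position pos 1))
(define (position-get-column pos) (tlist-get-element 'Position pos 2))

(define pos (make-position 3 2))
(position-get-row pos)    ; 3
(position-get-column pos) ; 2

; Asking for an element with the wrong tag is caught by the tag check.
(with-handlers ([exn:fail? (lambda (e) (exn-message e))])
  (tlist-get-element 'Direction pos 1))
```

The last expression evaluates to a message reporting the tag that was found (Position) and the tag that was expected (Direction), which is much easier to diagnose than an error from car or cdr.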
Move. A move involves three positions: where the jumping peg starts, the po-
sition of the peg that is jumped and removed, and the landing position. One
possibility would be to represent a move as a list of the three positions. A better
option is to observe that once any two of the positions are known, the third po-
sition is determined. For example, if we know the starting position and the land-
ing position, we know the jumped peg is at the position between them. Hence,
we could represent a jump using just the starting and landing positions.
Another possibility is to represent a jump by storing the starting Position and the
direction. This is also enough to determine the jumped and landing positions.
This approach avoids the difficulty of calculating jumped positions. To do it, we
first design a Direction datatype for representing the possible move directions.
Directions have two components: the change in the column (we use 1 for right
and -1 for left), and the change in the row (1 for down and -1 for up).
We implement the Direction datatype using a tagged list similarly to how we
defined Position:
(define (make-direction right down)
  (make-tlist 'Direction (list right down)))
(define (direction-get-horizontal dir) (tlist-get-element 'Direction dir 1))
(define (direction-get-vertical dir) (tlist-get-element 'Direction dir 2))
The Move datatype is defined using the starting position and the jump direction:
(define (make-move start direction)
  (make-tlist 'Move (list start direction)))
(define (move-get-start move) (tlist-get-element 'Move move 1))
(define (move-get-direction move) (tlist-get-element 'Move move 2))
We also define procedures for getting the jumped and landing positions of a
move. The jumped position is the result of moving one step in the move di-
rection from the starting position. So, it will be useful to define a procedure that
takes a Position and a Direction as input, and outputs a Position that is one step
in the input Direction from the input Position.
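One way such a stepping procedure could look is sketched below, assuming the Racket language used by DrRacket. The name direction-step, the Position implementation, and the definitions of move-get-jumped and move-get-landing are assumptions made for illustration; only their intended behavior is described in this section.

```racket
#lang racket

(define (list-get-element p n)
  (if (null? p)
      (error "Index out of range")
      (if (= n 1) (car p) (list-get-element (cdr p) (- n 1)))))
(define (make-tlist tag p) (cons tag p))
(define (tlist-get-tag p) (car p))
(define (tlist-get-element tag p n)
  (if (eq? (tlist-get-tag p) tag)
      (list-get-element (cdr p) n)
      (error (format "Bad tag: ~a (expected ~a)" (tlist-get-tag p) tag))))

; Assumed Position implementation (tagged list of row and column).
(define (make-position row col) (make-tlist 'Position (list row col)))
(define (position-get-row pos) (tlist-get-element 'Position pos 1))
(define (position-get-column pos) (tlist-get-element 'Position pos 2))

(define (make-direction right down) (make-tlist 'Direction (list right down)))
(define (direction-get-horizontal dir) (tlist-get-element 'Direction dir 1))
(define (direction-get-vertical dir) (tlist-get-element 'Direction dir 2))

(define (make-move start direction) (make-tlist 'Move (list start direction)))
(define (move-get-start move) (tlist-get-element 'Move move 1))
(define (move-get-direction move) (tlist-get-element 'Move move 2))

; Hypothetical name: outputs the Position one step from pos in direction dir.
(define (direction-step pos dir)
  (make-position (+ (position-get-row pos) (direction-get-vertical dir))
                 (+ (position-get-column pos) (direction-get-horizontal dir))))

; The jumped position is one step from the start; the landing is two steps.
(define (move-get-jumped move)
  (direction-step (move-get-start move) (move-get-direction move)))
(define (move-get-landing move)
  (direction-step (move-get-jumped move) (move-get-direction move)))

; A jump to the right starting at row 3, column 1.
(define m (make-move (make-position 3 1) (make-direction 1 0)))
(position-get-column (move-get-jumped m))  ; 2
(position-get-column (move-get-landing m)) ; 3
```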
keep a List of the Positions of the empty holes on the board. Yet another pos-
sibility is to keep a List of Lists, where each List corresponds to one row on the
board. The elements in each of the Lists are Booleans representing whether or
not there is a peg at that position. The good thing about data abstraction is we
could pick any of these representations and change it to a different representa-
tion later (for example, if we needed a more efficient board implementation). As
long as the procedures for implementing the Board are updated to work with
the new representation, all the code that uses the board abstraction should con-
tinue to work correctly without any changes.
We choose the third option and represent a Board using a List of Lists where
each element of the inner lists is a Boolean indicating whether or not the corresponding
position contains a peg. So, make-board evaluates to a List of Lists,
where the inner List for each row contains as many elements as its row number
and all the inner elements are true (the initial board is completely full of pegs).
First, we define a procedure make-list-of-constants that takes two inputs, a Number, n, and
a Value, val. The output is a List of length n where each element has the value
val.
(define (make-list-of-constants n val)
  (if (= n 0) null (cons val (make-list-of-constants (- n 1) val))))
To make the initial board, we use make-list-of-constants to make each row of the
board. As usual, a recursive problem solving strategy works well: the simplest
board is a board with zero rows (represented as the empty list); for each larger
board, we add a row with the right number of elements.
The tricky part is putting the rows in order. This is similar to the problem we
faced with intsto, and a similar solution using list-append works here:
(define (make-board rows)
  (if (= rows 0) null
      (list-append (make-board (- rows 1))
                   (list (make-list-of-constants rows true)))))
Evaluating (make-board 3) produces ((true) (true true) (true true true)).
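The board construction can be tried with the following self-contained sketch, assuming the Racket language used by DrRacket and an assumed recursive definition of list-append (not shown in this section). Note that DrRacket may display true as #t.

```racket
#lang racket

; Assumed definition: outputs the elements of p followed by the elements of q.
(define (list-append p q)
  (if (null? p) q (cons (car p) (list-append (cdr p) q))))

(define (make-list-of-constants n val)
  (if (= n 0) null (cons val (make-list-of-constants (- n 1) val))))

(define (make-board rows)
  (if (= rows 0) null
      (list-append (make-board (- rows 1))
                   (list (make-list-of-constants rows true)))))

(make-list-of-constants 3 0) ; the List (0 0 0)
(make-board 3)               ; rows of length 1, 2, and 3, all true
(length (make-board 5))      ; the standard board has 5 rows
```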
The board-rows procedure takes a Board as input and outputs the number of
rows on the board.
(define (board-rows board) (length board))
The board-valid-position? procedure indicates if a Position is on the board. A position is
valid if its row number is between 1 and the number of rows on the board, and
its column number is between 1 and the row number.
(define (board-valid-position? board pos)
  (and (>= (position-get-row pos) 1) (>= (position-get-column pos) 1)
       (<= (position-get-row pos) (board-rows board))
       (<= (position-get-column pos) (position-get-row pos))))
We need a way to check if a Board represents a winning solution (that is, contains
only one peg). We implement a more general procedure to count the number of
pegs on a board first. Our board representation used true to represent a peg.
To count the pegs, we first map the Boolean values used to represent pegs to
1 if there is a peg and 0 if there is no peg. Then, we use list-sum to count the
number of pegs. Since the Board is a List of Lists, we first use list-flatten to put
all the pegs in a single List.
(define (board-number-of-pegs board)
  (list-sum
   (list-map (lambda (peg) (if peg 1 0)) (list-flatten board))))
A board is a winning board if it contains exactly one peg:
(define (board-is-winning? board)
(= (board-number-of-pegs board) 1))
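To see the peg-counting procedures at work, the sketch below applies them to two small boards written directly as Lists of Lists. It assumes the Racket language used by DrRacket and an assumed definition of list-append; the board values full and nearly-done are made up for the example.

```racket
#lang racket

; Assumed definition (not shown in this section).
(define (list-append p q)
  (if (null? p) q (cons (car p) (list-append (cdr p) q))))

(define (list-sum p)
  (if (null? p) 0 (+ (car p) (list-sum (cdr p)))))
(define (list-map f p)
  (if (null? p) null (cons (f (car p)) (list-map f (cdr p)))))
(define (list-flatten p)
  (if (null? p) null (list-append (car p) (list-flatten (cdr p)))))

(define (board-number-of-pegs board)
  (list-sum
   (list-map (lambda (peg) (if peg 1 0)) (list-flatten board))))

(define (board-is-winning? board)
  (= (board-number-of-pegs board) 1))

; Example boards (made up for illustration): true marks a hole with a peg.
(define full (list (list true) (list true true)))
(define nearly-done (list (list false) (list true false)))

(board-number-of-pegs full)      ; 3
(board-is-winning? full)         ; false
(board-is-winning? nearly-done)  ; true: exactly one peg remains
```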
The board-contains-peg? procedure takes a Board and a Position as input, and
outputs a Boolean indicating whether or not that Position contains a peg. To im-
plement board-contains-peg? we need to find the appropriate row in our board
representation, and then find the element in its list corresponding to the position's
column. The list-get-element procedure (from Example 5.3) does exactly
what we need. Since our board is represented as a List of Lists, we need to use it
twice: first to get the row, and then to select the column within that row:
(define (board-contains-peg? board pos)
  (list-get-element (list-get-element board (position-get-row pos))
                    (position-get-column pos)))
Defining procedures for adding and removing pegs from the board is more com-
plicated. Both of these procedures need to make a board with every row identi-
cal to the input board, except the row where the peg is added or removed. For
that row, we need to replace the corresponding value. Hence, instead of defining
separate procedures for adding and removing, we first implement a more general
board-replace-peg procedure that takes an extra parameter indicating whether
a peg should be added or removed at the selected position.
First we consider the subproblem of replacing a peg in a row. The procedure
row-replace-peg takes as input a List representing a row on the board and a
Number indicating the column where the peg should be replaced. We can de-
fine row-replace-peg recursively: the base case is when the modified peg is at the
beginning of the row (the column number is 1); in the recursive case, we copy
the first element in the List, and replace the peg in the rest of the list. The third
parameter indicates if we are adding or removing a peg. Since true values rep-
resent holes with pegs, a true value indicates that we are adding a peg and false
means we are removing a peg.
(define (row-replace-peg pegs col val)
  (if (= col 1)
      (cons val (cdr pegs))
      (cons (car pegs) (row-replace-peg (cdr pegs) (- col 1) val))))
To replace the peg on the board, we use row-replace-peg to replace the peg on
the appropriate row, and keep all the other rows the same.
(define (board-replace-peg board row col val)
  (if (= row 1)
      (cons (row-replace-peg (car board) col val) (cdr board))
      (cons (car board) (board-replace-peg (cdr board) (- row 1) col val))))
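The following sketch, assuming the Racket language used by DrRacket, applies both procedures to a small example board (the board value is made up for illustration). It also shows that the input board is not changed: the procedures output a new List.

```racket
#lang racket

(define (row-replace-peg pegs col val)
  (if (= col 1)
      (cons val (cdr pegs))
      (cons (car pegs) (row-replace-peg (cdr pegs) (- col 1) val))))

(define (board-replace-peg board row col val)
  (if (= row 1)
      (cons (row-replace-peg (car board) col val) (cdr board))
      (cons (car board) (board-replace-peg (cdr board) (- row 1) col val))))

; Example three-row board, completely full of pegs.
(define board (list (list true) (list true true) (list true true true)))

(row-replace-peg (list true true true) 3 false) ; third column becomes false
(board-replace-peg board 2 1 false)             ; only row 2, column 1 changes
board                                           ; the original board is unchanged
```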
Both board-add-peg and board-remove-peg can be defined simply using board-replace-peg.
They first check if the operation is valid (adding is valid only if the
selected position does not contain a peg, removing is valid only if the selected
position contains a peg), and then use board-replace-peg to produce the modi-
fied board:
(define (board-add-peg board pos)
  (if (board-contains-peg? board pos)
      (error (format "Board already contains peg at position: ~a" pos))
      (board-replace-peg board (position-get-row pos)
                         (position-get-column pos) true)))
(define (board-remove-peg board pos)
  (if (not (board-contains-peg? board pos))
      (error (format "Board does not contain peg at position: ~a" pos))
      (board-replace-peg board (position-get-row pos)
                         (position-get-column pos) false)))
We can now define a procedure that models making a move on a board. Making
a move involves removing the jumped peg and moving the peg from the start-
ing position to the landing position. Moving the peg is equivalent to removing
the peg from the starting position and adding a peg to the landing position, so
the procedures we defined for adding and removing pegs can be composed to
model making a move. We add a peg at the landing position to the board that results
from removing the pegs in the starting and jumped positions:
(define (board-execute-move board move)
  (board-add-peg
   (board-remove-peg
    (board-remove-peg board (move-get-start move))
    (move-get-jumped move))
   (move-get-landing move)))
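Putting the pieces together, the sketch below executes one jump on a small board. It assumes the Racket language used by DrRacket. The Position implementation, direction-step, move-get-jumped, and move-get-landing are assumed definitions written for this illustration, and the example board and move are made up.

```racket
#lang racket

(define (list-get-element p n)
  (if (null? p)
      (error "Index out of range")
      (if (= n 1) (car p) (list-get-element (cdr p) (- n 1)))))
(define (make-tlist tag p) (cons tag p))
(define (tlist-get-tag p) (car p))
(define (tlist-get-element tag p n)
  (if (eq? (tlist-get-tag p) tag)
      (list-get-element (cdr p) n)
      (error (format "Bad tag: ~a (expected ~a)" (tlist-get-tag p) tag))))

(define (make-direction right down) (make-tlist 'Direction (list right down)))
(define (direction-get-horizontal dir) (tlist-get-element 'Direction dir 1))
(define (direction-get-vertical dir) (tlist-get-element 'Direction dir 2))
(define (make-move start direction) (make-tlist 'Move (list start direction)))
(define (move-get-start move) (tlist-get-element 'Move move 1))
(define (move-get-direction move) (tlist-get-element 'Move move 2))

; Assumed definitions (not shown in this section).
(define (make-position row col) (make-tlist 'Position (list row col)))
(define (position-get-row pos) (tlist-get-element 'Position pos 1))
(define (position-get-column pos) (tlist-get-element 'Position pos 2))
(define (direction-step pos dir)
  (make-position (+ (position-get-row pos) (direction-get-vertical dir))
                 (+ (position-get-column pos) (direction-get-horizontal dir))))
(define (move-get-jumped move)
  (direction-step (move-get-start move) (move-get-direction move)))
(define (move-get-landing move)
  (direction-step (move-get-jumped move) (move-get-direction move)))

; Board procedures from the text.
(define (board-contains-peg? board pos)
  (list-get-element (list-get-element board (position-get-row pos))
                    (position-get-column pos)))
(define (row-replace-peg pegs col val)
  (if (= col 1)
      (cons val (cdr pegs))
      (cons (car pegs) (row-replace-peg (cdr pegs) (- col 1) val))))
(define (board-replace-peg board row col val)
  (if (= row 1)
      (cons (row-replace-peg (car board) col val) (cdr board))
      (cons (car board) (board-replace-peg (cdr board) (- row 1) col val))))
(define (board-add-peg board pos)
  (if (board-contains-peg? board pos)
      (error (format "Board already contains peg at position: ~a" pos))
      (board-replace-peg board (position-get-row pos)
                         (position-get-column pos) true)))
(define (board-remove-peg board pos)
  (if (not (board-contains-peg? board pos))
      (error (format "Board does not contain peg at position: ~a" pos))
      (board-replace-peg board (position-get-row pos)
                         (position-get-column pos) false)))
(define (board-execute-move board move)
  (board-add-peg
   (board-remove-peg
    (board-remove-peg board (move-get-start move))
    (move-get-jumped move))
   (move-get-landing move)))

; Example: three rows, with the only empty hole at row 3, column 1.
(define board (list (list true) (list true true) (list false true true)))
; The peg at (3 3) jumps left over (3 2) and lands at (3 1).
(define jump-left (make-move (make-position 3 3) (make-direction -1 0)))

(board-execute-move board jump-left) ; row 3 becomes true, false, false
```

Executing the same move a second time on the resulting board would signal an error from board-remove-peg, since the starting position no longer contains a peg.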
Finding Valid Moves. Now that we can model the board and simulate making
jumps, we are ready to develop the solution. At each step, we try all valid moves
on the board to see if any move leads to a winning position (that is, a position
with only one peg remaining). So, we need a procedure that takes a Board as
its input and outputs a List of all valid moves on the board. We break this down
into the problem of producing a list of all conceivable moves (all moves in all
directions from all starting positions on the board), filtering that list for moves
that stay on the board, and then filtering the resulting list for moves that are legal
(start at a position containing a peg, jump over a position containing a peg, and
land in a position that is an empty hole).
First, we generate all conceivable moves by creating moves starting from each
position on the board and moving in all possible move directions. We break this
down further: the first problem is to produce a List of all positions on the board.
We can generate a list of all row numbers using the intsto procedure (from Ex-
ample 5.8). To get a list of all the positions, we need to produce a list of the
positions for each row. We do this by mapping each row to the corresponding
list:
The first five chapters focused on ways to use language to describe procedures.
Although finding ways to describe procedures succinctly and precisely would
be worthwhile even if we did not have machines to carry out those procedures,
the tremendous practical value we gain from being able to describe procedures
comes from the ability of computers to carry out those procedures astoundingly
quickly, reliably, and inexpensively. As a very rough approximation, a typical
laptop gives an individual computing power comparable to having every living
human on the planet working for you without ever making a mistake or needing
a break.
This chapter introduces computing machines. Computers are different from
other machines in two key ways:
The next section gives a brief history of computing machines, from prehistoric
calculating aids to the design of the first universal computers. Section 6.2 ex-
plains how machines can implement logic. Section 6.3 introduces a simple ab-
stract model of a computing machine that is powerful enough to carry out any
algorithm.
We provide only a very shallow introduction to how machines can implement
computations. Our primary goal is not to convey the details of how to design
and build an efficient computing machine (although that is certainly a worthy
goal that is often pursued in later computing courses), but to gain sufficient un-
derstanding of the properties nearly all conceivable computing machines share
to be able to predict properties about the costs involved in carrying out a par-
ticular procedure. The following chapters use this to reason about the costs of
various procedures. In Chapter 12, we use it to reason about the range of prob-
lems that can and cannot be solved by any mechanical computing machine.
106 6.1. History of Computing Machines
1 Not all human cultures use base ten number systems. For example, many cultures including the
Maya and Basque adopted base twenty systems counting both fingers and toes. This was natural in
warm areas, where typical footwear left the toes uncovered.
Chapter 6. Machines 107
metic operation, record the result, and reset the machine for the next calcula-
tion.
The big breakthrough was the conceptual leap of programmability. A machine
is programmable if its inputs control not only the values it operates on, but
also the operations it performs.
The first programmable computing machine was envisioned (but never suc-
cessfully built) in the 1830s by Charles Babbage. Babbage was born in London
in 1791 and studied mathematics at Cambridge. In the 1800s, calculations were
done by looking up values in large books of mathematical and astronomical ta-
bles. These tables were computed by hand, and often contained errors. The
calculations were especially important for astronomical navigation, and when
the values were incorrect a ship would miscalculate its position at sea
(sometimes with tragic consequences).
Babbage sought to develop a machine to mechanize the calculations to compute
these tables. Starting in 1822, he designed a steam-powered machine known
as the Difference Engine to compute polynomials needed for astronomical
calculations using Newton's method of successive differences (a generalization
of Heron's method from Exploration 4.1). The Difference Engine was never fully
completed, but led Babbage to envision a more general calculating machine.
We got nothing for our £17,000 but Mr. Babbage's grumblings. We should at
least have had a clever toy for our money.
Richard Sheepshanks, Letter to the Board of Visitors of the Greenwich Royal
Observatory, 1854
This new machine, the Analytical Engine, designed between 1833 and 1844, was
the first general-purpose computer envisioned. It was designed so that it could
be programmed to perform any calculation. One breakthrough in Babbage's
design was to feed the machine's outputs back into its inputs. This meant the en-
gine could perform calculations with an arbitrary number of steps by cycling
outputs back through the machine.
The Analytical Engine was programmed using punch cards, based on the cards
that were used by Jacquard looms. Each card could describe an instruction such
as loading a number into a variable in the store, moving values, performing
arithmetic operations on the values in the store, and, most interestingly,
jumping forward and backwards in the instruction cards. The Analytical Engine
supported conditional jumps where the jump would be taken depending on the
state of a lever in the machine (this is essentially a simple form of the if
expression).
[Image: Analytical Engine, Science Museum, London]
In 1840, Charles Babbage visited Italy and described the Analytical Engine to
Luigi Menabrea, an Italian engineer, military officer, and mathematician who
would later become Prime Minister of Italy. In 1842, Menabrea published a
description of Babbage's lectures in French. Ada Augusta Byron King (also known as Ada,
Countess of Lovelace) translated the article into English.
In addition to the translation, Ada added a series of notes to the article. The
notes included a program to compute Bernoulli numbers, the first detailed pro-
gram for the Analytical Engine. Ada was the first to realize the importance and
interest in creating the programs themselves, and envisioned how programs
could be used to do much more than just calculate mathematical functions.
This was the first computer program ever described, and Ada is recognized as
the first computer programmer.
[Image: Ada]
Despite Babbage's design and Ada's vision, the Analytical Engine was never
completed. It is unclear whether the main reason for the failure to build a
working Analytical Engine was the limitations of the mechanical components
available at the time, or Babbage's inability to work with his engineer
collaborator or to secure continued funding.
On two occasions I have been asked by members of Parliament, "Pray, Mr.
Babbage, if you put into the machine wrong figures, will the right answers
come out?" I am not able rightly to apprehend the kind of confusion of ideas
that could provoke such a question.
Charles Babbage
The first working programmable computers would not appear for nearly a
hundred years. Advances in electronics enabled more reliable and faster
components than the mechanical components used by Babbage, and the desperation
brought on by World War II spurred the funding and efforts that led to working
general-purpose computing machines.
The remaining conceptual leap is to treat the program itself as data. In
Babbage's Analytical Engine, the program is a stack of cards and the data are
numbers stored in the machine. The machine cannot alter its own program.
The idea of treating the program as just another kind of data the machine can
process was developed in theory by Alan Turing in the 1930s (Section 6.3 of
this chapter describes his model of computing), and first implemented by the
Manchester Small-Scale Experimental Machine (built by a team at Victoria Uni-
versity in Manchester) in 1948.
This computer (and all general-purpose computers in use today) stores the
program itself in the machine's memory. Thus, the computer can create new
programs by writing into its own memory. This power to change its own program is
what makes stored-program computers so versatile.
Exercise 6.1. Babbage's design for the Analytical Engine called for a store
holding 1000 variables, each of which is a 50-digit (decimal) number. How many bits
could the store of Babbage's Analytical Engine hold?
A      B      (and A B)
false  false  false
true   false  false
false  true   false
true   true   true
We design a machine that implements the function described by the truth ta-
ble: if both inputs are true (represented by full bottles of wine in our machine),
the output should be true; if either input is false, the output should be false (an
empty bottle). One way to do this is shown in Figure 6.1. Both inputs pour into
a basin. The output nozzle is placed at a height corresponding to one bottle of
wine in the collection basin, so the output bottle will fill (representing true), only
if both inputs are true.
The design in Figure 6.1 would probably not work very well in practice. Some
of the wine is likely to spill, so even when both inputs are true the output might
not be a full bottle of wine. What should a 3/4 full bottle of wine represent? What
about a bottle that is half full?
2 Scheme provides a special form and that performs the same function as the logical and function.
It is a special form, though, since the second input expression is not evaluated unless the first input
expression evaluates to true.
110 6.2. Mechanizing Logic
A      B      (or A B)
false  false  false
true   false  true
false  true   true
true   true   true
Try to invent your own design for a machine that computes the or function be-
fore looking at one solution in Figure 6.2(a).
Implementing not. The output of the not function is the opposite of the value
of its input:
A      (not A)
false  true
true   false
3 Scheme provides a special form or that implements the logical or function, similarly to the and
special form. If the first input evaluates to true, the second input is not evaluated and the value of
the or expression is true.
It is not possible to produce a logical not without some other source of wine;
the machine needs to create wine (to represent true) when none is input
(representing false). To implement the not function, we need the notion of a source current
and a clock. The source current injects a bottle of wine on each clock tick. The
clock ticks periodically, on each operation. The inputs need to be set up before
the clock tick. When the clock ticks, a bottle of wine is sent through the source
current, and the output is produced. Figure 6.2(b) shows one way to implement
the not function.
Composing logical functions also allows us to build new logical functions. Con-
sider the xor (exclusive or) function that takes two inputs, and has output true
when exactly one of the inputs is true:
A      B      (xor A B)
false  false  false
true   false  true
false  true   true
true   true   false
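As a minimal sketch, assuming Scheme's built-in and, or, and not, the xor truth table can be reproduced by composing the logical functions we already have. The name logical-xor is our own choice for illustration, and this is one possible composition rather than the only one.

```scheme
; logical-xor is true when exactly one input is true:
; either a is true and b is false, or a is false and b is true.
(define (logical-xor a b)
  (or (and a (not b))
      (and (not a) b)))

; Evaluate the four rows of the truth table, in the order the table lists them.
(define xor-table
  (list (logical-xor #f #f)
        (logical-xor #t #f)
        (logical-xor #f #t)
        (logical-xor #t #t)))
```

Evaluating xor-table gives the output column of the table, read from top to bottom.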
A      B      (nand A B)
false  false  true
true   false  true
false  true   true
true   true   false
All Boolean logic functions can be implemented using just the nand function.
One way to prove this is to show how to build all logic functions using just nand.
For example, we can implement not using nand where the one input to the not
function is used for both inputs to the nand function:
(not A) ≡ (nand A A)
Now that we have shown how to implement not using nand, it is easy to see how
to implement and using nand:
(and A B) ≡ (not (nand A B)) ≡ (nand (nand A B) (nand A B))
Implementing or is a bit trickier. Recall that A or B is true if any one of the inputs
is true. But, A nand B is true if both inputs are false, and false if both inputs are
true. To compute or using only nand functions, we need to invert both inputs:
(or A B) ≡ (nand (not A) (not B)) ≡ (nand (nand A A) (nand B B))
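The nand constructions described above can be checked by trying every combination of inputs. The following is a sketch only: the names nand-not, nand-and, nand-or, and same-on-all-inputs? are hypothetical helpers introduced here, and nand itself is defined using Scheme's built-in and and not so that there is something to build on.

```scheme
; nand is false only when both inputs are true.
(define (nand a b) (not (and a b)))

; Each function below uses nand as its only building block.
(define (nand-not a) (nand a a))
(define (nand-and a b) (nand-not (nand a b)))
(define (nand-or a b) (nand (nand-not a) (nand-not b)))

; Compare two two-input functions on all four pairs of inputs.
(define (same-on-all-inputs? f g)
  (and (eq? (f #f #f) (g #f #f))
       (eq? (f #t #f) (g #t #f))
       (eq? (f #f #t) (g #f #t))
       (eq? (f #t #t) (g #t #t))))

(define and-matches?
  (same-on-all-inputs? nand-and (lambda (a b) (and a b))))
(define or-matches?
  (same-on-all-inputs? nand-or (lambda (a b) (or a b))))
```

Both and-matches? and or-matches? evaluate to true, since there are only four input pairs to check for each function.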
To complete the proof, we would need to show how to implement all the other
Boolean logic functions. We omit the details here, but leave some of the other
functions as exercises. The universality of the nand function makes it very useful
for implementing computing devices. Trillions of nand gates are produced in
silicon every day.
Exercise 6.2. Define a Scheme procedure, logical-or, that takes two inputs and
outputs the logical or of those inputs.
Exercise 6.3. What is the meaning of composing not with itself? For example,
(not (not A)).
Exercise 6.4. Define the xor function using only nand functions.
Exercise 6.6. [★] The digital abstraction works fine as long as actual values stay
close to the value they represent. But, if we continue to compute with the
outputs of functions, the actual values will get increasingly fuzzy. For example, if
the inputs to the and3 function in Figure 6.3 are initially all 3/4 full bottles (which
should be interpreted as true), the basin for the first and function will fill to
1 1/2 bottles, so only 1/2 bottle will be output from the first and. When combined
with the third input, the second basin will contain 1 1/4 bottles, so only 1/4 bottle
will spill into the output bottle. Thus, the output will represent false, even
though all three inputs represent true. The solution to this problem is to use an
amplifier to restore values to
their full representations. Design a wine machine amplifier that takes one input
and produces a strong representation of that input as its output. If that input
represents true (any value that is half full or more), the amplifier should output
true, but with a strong, full bottle representation. If that input represents false
(any value that is less than half full), the amplifier should output a strong false
value (completely empty).
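The arithmetic in this exercise can be reproduced with a small model of the wine and machine. This is a sketch under an assumption drawn from the basin design: the basin holds one bottle, and only the overflow above one bottle reaches the output. The names wine-and and wine-and3 are hypothetical, and the amplifier itself is left to the exercise.

```scheme
; Both inputs pour into a basin that holds one bottle; the output bottle
; receives only the amount above one bottle (never less than zero).
(define (wine-and a b)
  (max 0 (- (+ a b) 1)))

; and3 chains two and machines: the first output feeds the second machine.
(define (wine-and3 a b c)
  (wine-and (wine-and a b) c))

; Three inputs that are each 3/4 full (all meant to represent true).
(define first-output (wine-and 3/4 3/4))
(define final-output (wine-and3 3/4 3/4 3/4))
```

Here first-output is 1/2 and final-output is 1/4, matching the values in the exercise. Scheme's exact fractions avoid any rounding in the model.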
6.2.3 Arithmetic
Not only is the nand function complete for Boolean logical functions, it is also
enough to implement all discrete arithmetic functions. First, consider the prob-
lem of adding two one-bit numbers.
There are four possible pairs of inputs:
A   B   r_1 r_0
0 + 0 = 0   0
0 + 1 = 0   1
1 + 0 = 0   1
1 + 1 = 1   0
We can compute each of the two output bits as a logical function of the two input
bits. The right output bit, r_0, is 1 if exactly one of the input bits is 1:
r_0 = (xor A B)
The left output bit, r_1, is 0 for all inputs except when both inputs are 1:
r_1 = (and A B)
Since we have already seen how to implement and, or, xor, and not using only
nand functions, this means we can implement a one-bit adder using only nand
functions.
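A one-bit adder along these lines might be sketched as follows, with bits represented as the numbers 0 and 1. The names bit-and, bit-xor, and add-one-bit are hypothetical, and the four-nand construction of xor used here is one possible construction, not necessarily the one intended elsewhere in the chapter.

```scheme
; nand on bits: the output is 0 only when both inputs are 1.
(define (nand a b)
  (if (and (= a 1) (= b 1)) 0 1))

; and built from nand: invert the output of nand.
(define (bit-and a b)
  (nand (nand a b) (nand a b)))

; xor built from four uses of nand (the shared value n is computed once).
(define (bit-xor a b)
  (let ((n (nand a b)))
    (nand (nand a n) (nand b n))))

; One-bit adder: outputs the list (r1 r0).
(define (add-one-bit a b)
  (list (bit-and a b) (bit-xor a b)))
```

For example, (add-one-bit 1 1) evaluates to the list (1 0), the last row of the table above.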
Adding larger numbers requires more logical functions. Consider adding two
n-bit numbers:
        a_{n-1} a_{n-2} ... a_1 a_0
  +     b_{n-1} b_{n-2} ... b_1 b_0
  = r_n r_{n-1} r_{n-2} ... r_1 r_0
The elementary school algorithm for adding decimal numbers is to sum up the
digits from right to left. If the result in one place is more than one digit, the
additional tens are carried to the next digit. We use c_k to represent the carry digit
in the k-th column.
    c_n c_{n-1} c_{n-2} ... c_1
        a_{n-1} a_{n-2} ... a_1 a_0
  +     b_{n-1} b_{n-2} ... b_1 b_0
  = r_n r_{n-1} r_{n-2} ... r_1 r_0
As we move left through the digits, the terms get increasingly complex. But,
for any number of digits, we can always find functions for computing the result
bits using only logical functions on the input bits. Hence, we can implement
addition for any length binary numbers using only nand functions.
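The column-by-column approach can be sketched in Scheme, using one logical function for the result bit in each column and another for the carry into the next column. This is an illustration under our own assumptions: numbers are lists of bits with the most significant bit first, both inputs have the same length, and the names sum-bit, carry-bit, and add-bits are hypothetical. The bit functions are defined directly for brevity; each could be built from nand as described earlier.

```scheme
; Logical functions on bits (0 or 1).
(define (bit-xor a b) (if (= a b) 0 1))
(define (bit-and a b) (if (and (= a 1) (= b 1)) 1 0))
(define (bit-or a b) (if (or (= a 1) (= b 1)) 1 0))

; Result bit for one column: 1 when an odd number of a, b, and c are 1.
(define (sum-bit a b c)
  (bit-xor (bit-xor a b) c))

; Carry into the next column: 1 when at least two of a, b, and c are 1.
(define (carry-bit a b c)
  (bit-or (bit-and a b) (bit-and c (bit-xor a b))))

; Add two equal-length lists of bits, working from the rightmost column.
; The result has one more bit than the inputs (r_n ... r_0).
(define (add-bits a-bits b-bits)
  (let loop ((ra (reverse a-bits))
             (rb (reverse b-bits))
             (carry 0)
             (result '()))
    (if (null? ra)
        (cons carry result)
        (loop (cdr ra)
              (cdr rb)
              (carry-bit (car ra) (car rb) carry)
              (cons (sum-bit (car ra) (car rb) carry) result)))))
```

For example, (add-bits '(1 0 1) '(0 1 1)) adds five and three and evaluates to (1 0 0 0), which is eight.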
We can also implement multiplication, subtraction, and division using only nand
functions. We omit the details here, but the essential approach of breaking down
our elementary school arithmetic algorithms into functions for computing each
output bit works for all of the arithmetic operations.
Exercise 6.8. Show how to compute the result bits for binary multiplication of
two 2-bit inputs using only logical functions.
Exercise 6.9. [★] Show how to compute the result bits for binary multiplication
of two inputs of any length using only logical functions.
distance, and record the mouse movements as discrete numbers of move units
and directions. Richer input devices like a camera or microphone can also
produce discrete output by discretizing the input using a process similar to the
image storage in Chapter 1. So, the information produced by any input device can
be represented by a sequence of bits.
The virtual shopping spree was a first for the President who has a reputation
for being technologically challenged. But White House sources insist that the
First Shopper used his own laptop and even knew how to use the mouse.
BusinessWeek, 22 December 1999
For real input devices, the time an event occurs is often crucial. When playing a
video game, it does not just matter that the mouse button was clicked, it matters
a great deal when the click occurs. How can we model inputs where time matters
using just our simple sequence of bits?
One way would be to divide time into discrete quanta and encode the input as
zero or one events in each quantum. A more efficient way would be to add a
timestamp to each input. The timestamps are just numbers (e.g., the number of
milliseconds since the start time), so can be written down just as sequences of bits.
Thus, we can model a wide range of complex input devices with just a finite
sequence of bits. The input must be finite, since our model computer needs all
the input before it starts processing. This means our model is not a good model
for computations where the input is infinite, such as a web server intended to
keep running and processing new inputs (e.g., requests for a web page) forever.
In practice, though, this isn't usually a big problem since we can make the input
finite by limiting the time the server is running in the model.
A finite sequence of bits can be modeled using a long, narrow, tape that is di-
vided into squares, where each square contains one bit of the input.
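To make the timestamp idea concrete, here is a small sketch that writes timed mouse-button events as one finite sequence of bits. The 8-bit timestamp width, the single bit for the button state, and the names number->bits and encode-event are all assumptions made for this illustration.

```scheme
; Convert a non-negative integer to a list of exactly width bits,
; most significant bit first (assumes n is less than 2 to the width).
(define (number->bits n width)
  (let loop ((n n) (k width) (bits '()))
    (if (= k 0)
        bits
        (loop (quotient n 2) (- k 1) (cons (remainder n 2) bits)))))

; Encode one event as an 8-bit timestamp (in milliseconds) followed by
; one bit for the button state (1 means pressed). The widths are hypothetical.
(define (encode-event timestamp pressed)
  (append (number->bits timestamp 8) (list pressed)))

; Two events, at 5 ms and at 130 ms, become a single sequence of bits.
(define input-tape
  (append (encode-event 5 1) (encode-event 130 0)))
```

Each event occupies nine squares of the tape, so a machine reading the input can recover both what happened and when.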
Modeling output. Output from computers affects the physical world in lots of
very complex ways: displaying images on a screen, printing text on a printer,
sending an encoded web page over a network, sending an electrical signal to an
anti-lock brake to increase the braking pressure, etc.
We don't attempt to model the physical impact of computer outputs; that would
be far too complicated, but it is also one step beyond modeling the computa-
tion itself. Instead, we consider just the information content of the output. The
information in a picture is the same whether it is presented as a sequence of bits
or an image projected on a screen; it's just less pleasant to look at as a sequence
of bits. So, we can model the output just like we modeled the input: a sequence
of bits written on a tape divided into squares.
Modeling processing. Our processing model should be able to model every
possible mechanical procedure since we want to model a universal computer,
but should be as simple as possible.
One thing our model computer needs is a way to keep track of what it is do-
ing. We can think of this like scratch paper: a human would not be able to do a
long computation without keeping track of intermediate values on scratch paper,
and a computer has the same need. In Babbage's Analytical Engine, this
was called the store, and divided into a thousand variables, each of which could
store a fifty decimal digit number. In the Apollo Guidance Computer, the work-
ing memory was divided into banks, each bank holding 1024 words. Each word
was 15 bits (plus one bit for error correction). In current 32-bit processors, such
as the x86, memory is divided into pages, each containing 1024 32-bit words.
118 6.3. Modeling Computing
For our model machine, we don't want to have arbitrary limits on the amount of
working storage. So, we model the working storage with an infinitely long tape.
Like the input and output tapes, it is divided into squares, and each square can
contain one symbol. For our model computer, it is useful to think about having
an infinitely long tape, but of course, no real computer has infinite amounts of
working storage. We can, however, imagine continuing to add more memory to
a real computer as needed until we have enough to solve a given problem, and
adding more if we need to solve a larger problem.
Our model now involves separate tapes for input, output, and a working tape.
We can simplify the model by using a single tape for all three. At the beginning of
the execution, the tape contains the input (which must be finite). As processing
is done, the input is read and the tape is used as the working tape. Whatever is
on the tape at the end of the execution is the output.
We also need a way for our model machine to interface with the tape. We imag-
ine a tape head that contacts a single square on the tape. On each processing
step, the tape head can read the symbol in the current square, write a symbol in
the current square, and move one square either left or right.
The final thing we need is a way to model actually doing the processing. In our
model, this means controlling what the tape head does: at each step, it needs to
decide what to write on the tape, and whether to move left or right, or to finish
the execution.
In early computing machines, processing meant performing one of the basic
arithmetic operations (addition, subtraction, multiplication, or division). We
don't want to have to model anything as complex as multiplication in our model
machine, however. The previous section showed how addition and other arith-
metic operations can be built from simpler logical operations. To carry out a
complex operation as a composition of simple operations, we need a way to
keep track of enough state to know what to do next. The machine state is just
a number that keeps track of what the machine is doing. Unlike the tape, it is
limited to a finite number. The machine state number must be finite because
we need to be able to write down the program for the
machine by explaining what it should do in each state, which would be difficult
if there were infinitely many states.
We also need rules to control what the tape head does. We can think of each
rule as a mapping from the current observed state of the machine to what to
do next. The input for a rule is the symbol in the current tape square and the
current state of the machine; the output of each rule is three things: the symbol
to write on the current tape square, the direction for the tape head to move (left,
right, or halt), and the new machine state. We can describe the program for the
machine by listing the rules. For each machine state, we need a rule for each
possible symbol on the tape.
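The model described so far (a tape, a tape head, a machine state, and a list of rules) can be sketched as a short Scheme procedure. Everything named here is an assumption made for illustration: the rule format, the symbol blank for an empty square, the procedure names, and the example machine, which flips every bit and halts at the first blank square.

```scheme
; A rule pairs (state symbol) with (symbol-to-write move next-state),
; where move is left, right, or halt.
(define flip-rules
  '(((flip 0) (1 right flip))
    ((flip 1) (0 right flip))
    ((flip blank) (blank halt flip))))

; Find the rule for the current state and tape symbol, or #f if none exists.
(define (lookup-rule rules state symbol)
  (let ((entry (assoc (list state symbol) rules)))
    (if entry (cadr entry) #f)))

; The tape is kept in three parts: the squares left of the head (nearest
; first), the current square, and the squares to the right. Moving past
; either end supplies a blank square, modeling the unbounded tape.
(define (run-machine rules state left current right)
  (let ((rule (lookup-rule rules state current)))
    (if (not rule)
        'error
        (let ((new-symbol (car rule))
              (move (cadr rule))
              (next-state (caddr rule)))
          (cond ((eq? move 'halt)
                 (append (reverse left) (list new-symbol) right))
                ((eq? move 'right)
                 (if (null? right)
                     (run-machine rules next-state
                                  (cons new-symbol left) 'blank '())
                     (run-machine rules next-state
                                  (cons new-symbol left) (car right) (cdr right))))
                (else ; move is left
                 (if (null? left)
                     (run-machine rules next-state
                                  '() 'blank (cons new-symbol right))
                     (run-machine rules next-state
                                  (cdr left) (car left) (cons new-symbol right)))))))))

; Run the example machine with the head on the first square of the input 1 0 1 1.
(define flipped (run-machine flip-rules 'flip '() 1 '(0 1 1)))
```

The value of flipped is the final tape contents, (0 1 0 0 blank). If no rule matches the current state and symbol, run-machine outputs error instead.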
keeps track of its internal state, and follows rules matching the current state and
current tape square to determine what to do next.
Turing's model is by far the most widely used model for computers today.
Turing developed this model in 1936, before anything resembling a modern
computer existed. Turing did not develop his model as a model of an automatic
computer, but instead as a model for what could be done by a human following
mechanical rules. He devised the infinite tape to model the two-dimensional
graph paper students use to perform arithmetic. He argued that the number
of machine states must be limited because a human could only keep a
limited amount of information in mind at one time.
Turing's model is equivalent to the model we described earlier, but instead of
using only bits as the symbols on the tape, Turing's model uses members of any
finite set of symbols, known as the alphabet of the tape. Allowing the tape
alphabet to contain any set of symbols instead of just the two binary digits makes
it easier to describe a Turing Machine that computes a particular function, but
does not change the power of the model. That is, every computation that
could be done with a Turing Machine using any alphabet could also be done
by some Turing Machine using only the binary digits.
We could show this by describing an algorithm that takes in a description of a
Turing Machine using an arbitrarily large alphabet, and produces a Turing Ma-
chine that uses only two symbols to simulate the input Turing Machine. As we
saw in Chapter 1, we can map each of the alphabet symbols to a finite sequence
of binary digits.
Mapping the rules is more complex: since each original input symbol is now
spread over several squares, we need extra states and rules to read the equiva-
lent of one original input. For example, suppose our original machine uses 16
alphabet symbols, and we map each symbol to a 4-bit sequence. If the original
machine used a symbol X, which we map to the sequence of bits 1011, we would
need four states for every state in the original machine that has a rule using X as
input. These four states would read the 1, 0, 1, 1 from the tape. The last state
now corresponds to the state in the original machine when an X is read from the
tape. To follow the rule, we also need to use four states to write the bit sequence
corresponding to the original write symbol on the tape. Then, simulating a
move of one square left or right on the original Turing Machine now requires moving
four squares, which requires four more states. Hence, we may need 12 states for each
transition rule of the original machine, but can simulate everything it does using
only two symbols.
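The symbol mapping in this example can be sketched in Scheme. The 16-symbol alphabet below is hypothetical (the letters a through p), with each symbol's code being its position in the list written as four bits; under this choice the symbol l plays the role of X, since it maps to 1011. The procedure names are also our own.

```scheme
; Hypothetical 16-symbol alphabet; a symbol's code is its position in
; this list (starting from 0) written as four bits.
(define alphabet '(a b c d e f g h i j k l m n o p))

; Position of symbol in lst (assumes the symbol is present).
(define (index-of symbol lst)
  (if (eq? symbol (car lst))
      0
      (+ 1 (index-of symbol (cdr lst)))))

; Map one alphabet symbol to its four-bit sequence.
(define (symbol->bits symbol)
  (let ((n (index-of symbol alphabet)))
    (list (quotient n 8)
          (remainder (quotient n 4) 2)
          (remainder (quotient n 2) 2)
          (remainder n 2))))

; Encode a whole tape: each original square becomes four binary squares.
(define (encode-tape tape)
  (if (null? tape)
      '()
      (append (symbol->bits (car tape)) (encode-tape (cdr tape)))))

; Decode by reading four squares at a time, much as the simulating
; machine uses four states to read one original symbol.
(define (decode-tape bits)
  (if (null? bits)
      '()
      (cons (list-ref alphabet
                      (+ (* 8 (car bits))
                         (* 4 (cadr bits))
                         (* 2 (caddr bits))
                         (cadddr bits)))
            (decode-tape (cddddr bits)))))
```

Since decode-tape recovers the original tape from the encoded one, no information is lost by restricting the tape to two symbols; only the number of squares and states grows.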
The Turing Machine model is a universal computing machine. This means every
algorithm can be implemented by some Turing Machine. Chapter 12 explores
more deeply what it means to simulate every possible Turing Machine and ex-
plores the set of problems that can be solved by a Turing Machine.
Any physical machine has a limited amount of memory. If the machine does
not have enough space to store a trillion bits, there is no way it can do a com-
putation whose output would exceed a trillion bits. Nevertheless, the simplicity
and robustness of the Turing Machine model make it a useful way to think about
computing even if we cannot build a truly universal computing machine.
Turing's model has proven to be remarkably robust. Despite being invented
before anything resembling a modern computer existed, nearly every
computing machine ever imagined or built can be modeled well using Turing's simple
model. The important thing about the model is that we can simulate any
computer using a Turing Machine. Any step on any computer that operates using
standard physics can be simulated with a finite number of steps on a Turing
Machine. This means if we know how many steps it takes to solve some prob-
lem on a Turing Machine, the number of steps it takes on any other machine is
at most some multiple of that number. Hence, if we can reason about the num-
ber of steps required for a Turing Machine to solve a given problem, then we can
make strong and general claims about the number of steps it would take any
standard computer to solve the problem. We will show this more convincingly
in Chapter 12, but for now we assert it, and use it to reason about the cost of
executing various procedures in the following chapter.
ues to the right until it finds a closing parenthesis (this is the start state); Look-
ForOpen, in which the machine continues to the left until it finds the balancing
open parenthesis; and CheckTape, in which the machine checks there are no un-
balanced open parentheses on the tape starting from the right end of the tape
and moving towards the left end. The full rules are shown in Figure 6.5.
Another way to depict a Turing Machine is to show the states and rules graphi-
cally. Each state is a node in the graph. For each rule, we draw an edge on the
graph between the starting state and the next state, and label the edge with the
read and write tape symbols (separated by a /), and move direction.
Figure 6.6 shows the same Turing Machine as a state graph. When reading a
symbol in a given state produces an error (such as when a ) is encountered in
the LookForOpen state), it is not necessary to draw an edge on the graph. If there
is no outgoing edge for the current read symbol for the current state in the state
graph, execution terminates with an error.
[State graph with nodes LookForClosing, LookForOpen, and CheckTape. Edge
labels include )/X, X/X, (/X, (/(, #/#, #/0,Halt, #/1,Halt, and (/0,Halt.]
Figure 6.6. Checking parentheses Turing Machine.
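One way to experiment with this machine is to write each state as a Scheme procedure that reads the current square and calls the procedure for the next state. The sketch below is our reconstruction from the state descriptions and the edge labels above; the move directions, the starting head position (the first square after the left # marker), and all procedure names are assumptions rather than a transcription of Figure 6.5. A halting rule outputs the symbol the machine would write (1 for balanced, 0 for unbalanced) instead of writing it on the tape.

```scheme
; Build the tape as a vector of characters with # markers at both ends.
(define (make-tape input)
  (list->vector (string->list (string-append "#" input "#"))))

; LookForClosing: move right until a closing parenthesis or the end marker.
; Assumes the input contains only parentheses.
(define (look-for-closing tape pos)
  (let ((sym (vector-ref tape pos)))
    (cond ((char=? sym #\))
           (vector-set! tape pos #\X)            ; cross off the )
           (look-for-open tape (- pos 1)))
          ((char=? sym #\#)
           (check-tape tape (- pos 1)))          ; reached the right end
          (else
           (look-for-closing tape (+ pos 1)))))) ; skip over ( and X

; LookForOpen: move left until the balancing open parenthesis.
(define (look-for-open tape pos)
  (let ((sym (vector-ref tape pos)))
    (cond ((char=? sym #\X)
           (look-for-open tape (- pos 1)))
          ((char=? sym #\()
           (vector-set! tape pos #\X)            ; cross off the matching (
           (look-for-closing tape (+ pos 1)))
          ((char=? sym #\#) 0)                   ; no open parenthesis to match
          (else 'error))))                       ; no rule for ) in this state

; CheckTape: move left, checking that no open parenthesis remains.
(define (check-tape tape pos)
  (let ((sym (vector-ref tape pos)))
    (cond ((char=? sym #\X)
           (check-tape tape (- pos 1)))
          ((char=? sym #\() 0)                   ; an unmatched ( remains
          ((char=? sym #\#) 1)                   ; reached the left marker
          (else 'error))))

; Start in LookForClosing with the head on the first input square.
(define (check-parentheses input)
  (look-for-closing (make-tape input) 1))
```

For example, (check-parentheses "()") evaluates to 1 and (check-parentheses ")") evaluates to 0. Tracing the calls by hand for a short input shows the same sequence of states as following the rules on paper.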
Exercise 6.10. Follow the rules to simulate the checking parentheses Turing
Machine on each input (assume the beginning and end of the input are marked
with a #):
a. )
b. ()
c. empty input
d. (()(()))()
e. (()))(()
Exercise 6.11. [★] Design a Turing Machine for adding two arbitrary-length
binary numbers. The input is of the form a_{n-1} ... a_1 a_0 + b_{m-1} ... b_1 b_0
(with # markers at both ends) where each a_k and b_k is either 0 or 1. The output tape should
contain bits that represent the sum of the two inputs.
Alan Turing was born in London in 1912, and developed his computing model
while at Cambridge in the 1930s. He developed the model to solve a famous
problem posed by David Hilbert in 1928. The problem, known as the
Entscheidungsproblem (German for "decision problem"), asked for an algorithm that could
determine the truth or falsehood of a mathematical statement. To solve the
problem, Turing first needed a formal model of an algorithm. For this, he in-
vented the Turing Machine model described above, and defined an algorithm
as any Turing Machine that is guaranteed to eventually halt on any input. With
the model, Turing was able to show that there are some problems that cannot
be solved by any algorithm. We return to this in Chapter 12 and explain Turing's
proof and examples of problems that cannot be solved.
[Image: Alan Turing. Image from Bletchley Park Ltd.]
After publishing his solution to the Entscheidungsproblem in 1936, Turing went
to Princeton and studied with Alonzo Church (inventor of the Lambda calcu-
lus, on which Scheme is based). With the start of World War II, Turing joined
the highly secret British effort to break Nazi codes at Bletchley Park. Turing
was instrumental in breaking the Enigma code which was used by the Nazis
to communicate with field units and submarines. Turing designed an electro-
mechanical machine known as a bombe for searching possible keys to decrypt
Enigma-encrypted messages. The machines used logical operations to search
the possible rotor settings on the Enigma to find the settings that were most
likely to have generated an intercepted encrypted message. Bletchley Park was
able to break thousands of Enigma messages during the war. The Allies used the
knowledge gained from them to avoid Nazi submarines and gain a tremendous
tactical advantage.
[Image: Bombe, rebuilt at Bletchley Park]
After the war, Turing continued to make both practical and theoretical contri-
butions to computer science. Among other things, he worked on designing
general-purpose computing machines and published a paper ("Intelligent
Machinery") speculating on the ability of computers to exhibit intelligence. Turing
introduced a test for machine intelligence (now known as the Turing Test) based
on a machine's ability to impersonate a human and speculated that machines
would be able to pass the test within 50 years (that is, by the year 2000).
Turing also studied morphogenesis (how biological systems grow) including why
Fibonacci numbers appear so often in plants.
In 1952, Turing's house was broken into, and Turing reported the crime to the
police. The investigation revealed that Turing was a homosexual, which at the
time was considered a crime in Britain. Turing did not attempt to hide his homo-
sexuality, and was convicted and given a choice between serving time in prison
and taking hormone treatments. He accepted the treatments, and had his
security clearance revoked. In 1954, at the age of 41, Turing was found dead in
an apparent suicide, with a cyanide-laced partially-eaten apple next to him. The
codebreaking effort at Bletchley Park was kept secret for many years after the
war (Turing's report on Enigma was not declassified until 1996), so Turing never
received public recognition for his contributions to the war effort. In September
2009, instigated by an on-line petition, British Prime Minister Gordon Brown
issued an apology for how the British government treated Alan Turing.
6.4 Summary
The power of computers comes from their programmability. Universal comput-
ers can be programmed to execute any algorithm. The Turing Machine model
provides a simple, abstract model of a computing machine. Every algorithm
can be implemented as a Turing Machine, and a Turing Machine can simulate
any other reasonable computer.
As the first computer programmer, Ada deserves the last word:
By the word operation, we mean any process which alters the mutual re-
lation of two or more things, be this relation of what kind it may. This is
the most general definition, and would include all subjects in the universe.
In abstract mathematics, of course operations alter those particular rela-
tions which are involved in the considerations of number and space, and
the results of operations are those peculiar results which correspond to the
nature of the subjects of operation. But the science of operations, as de-
rived from mathematics more especially, is a science of itself, and has its
own abstract truth and value; just as logic has its own peculiar truth and
value, independently of the subjects to which we may apply its reasonings
and processes. . . .
The operating mechanism can even be thrown into action independently
of any object to operate upon (although of course no result could then be
developed). Again, it might act upon other things besides number, were
objects found whose mutual fundamental relations could be expressed by
those of the abstract science of operations, and which should be also sus-
ceptible of adaptations to the action of the operating notation and mech-
anism of the engine. Supposing, for instance, that the fundamental rela-
tions of pitched sounds in the science of harmony and of musical composi-
tion were susceptible of such expression and adaptations, the engine might
compose elaborate and scientific pieces of music of any degree of complex-
ity or extent.
Ada, Countess of Lovelace, Sketch of The Analytical Engine, 1843
7 Cost
A LISP programmer knows the value of everything, but the cost of nothing.
Alan Perlis
I told my dad that someday I'd have a computer that I could write programs on. He said that would
cost as much as a house. I said, "Well, then I'm going to live in an apartment."
Steve Wozniak
This chapter develops tools for reasoning about the cost of evaluating a given
expression. Predicting the cost of executing a procedure has practical value (for
example, we can estimate how much computing power is needed to solve a par-
ticular problem or decide between two possible implementations), but also pro-
vides deep insights into the nature of procedures and problems.
The most commonly used cost metric is time. Other measures of cost include
the amount of memory needed and the amount of energy consumed. Indirectly,
these costs can often be translated into money: the rate of transactions a service
can support, or the price of the computer needed to solve a problem.
gc time
The time in milliseconds the interpreter spent on garbage collection to eval-
uate the expression. Garbage collection is used to reclaim memory that is
storing data that will never be used again.
For example, using the definitions from Chapter 5,
(time (solve-pegboard (board-remove-peg (make-board 5)
(make-position 1 1))))
prints: cpu time: 141797 real time: 152063 gc time: 765. The real time is 152 seconds,
meaning this evaluation took just over two and a half minutes. Of this time, the
evaluation was using the CPU for 142 seconds, and the garbage collector ran for
less than one second.
Here are two more examples:
> (time (car (list-append (intsto 1000) (intsto 100))))
cpu time: 531 real time: 531 gc time: 62
1
> (time (car (list-append (intsto 1000) (intsto 100))))
cpu time: 609 real time: 609 gc time: 0
1
The two expressions evaluated are identical, but the reported time varies. Even
on the same computer, the time needed to evaluate the same expression varies.
Many properties unrelated to our expression (such as where things happen to
be stored in memory) impact the actual time needed for any particular evalua-
tion. Hence, it is dangerous to draw conclusions about which procedure is faster
based on a few timings.
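Because individual timings vary, one common remedy is to repeat a measurement several times and average the results. The following is a minimal sketch, assuming Racket: the name average-time and its parameters are hypothetical, and it relies on Racket's current-inexact-milliseconds and build-list procedures rather than on anything defined in the text.

```scheme
#lang racket
;; Hypothetical helper (not from the text): applies thunk, a procedure of no
;; arguments, trials times and returns the mean elapsed real time in
;; milliseconds. Assumes Racket's current-inexact-milliseconds.
(define (average-time thunk trials)
  ;; Elapsed milliseconds for one application of thunk.
  (define (run-once)
    (let ((start (current-inexact-milliseconds)))
      (thunk)
      (- (current-inexact-milliseconds) start)))
  ;; Accumulate the elapsed time over the remaining trials.
  (define (total remaining sum)
    (if (= remaining 0)
        sum
        (total (- remaining 1) (+ sum (run-once)))))
  (/ (total trials 0) trials))

;; Example use: average the time to build a 100000-element list over 10 runs.
(average-time (lambda () (build-list 100000 (lambda (i) i))) 10)
```

Even an average over many runs is only a measurement for one input on one machine, so it complements rather than replaces the analytical tools developed in this chapter.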
Another limitation of this way of measuring cost is it only works if we wait for the
evaluation to complete. If we try an evaluation and it has not finished after an
hour, say, we have no idea if the actual time to finish the evaluation is sixty-one
minutes or a quintillion years. We could wait another minute, but if it still hasn't
finished we don't know if the execution time is sixty-two minutes or a quintillion
years. The techniques we develop allow us to predict the time an evaluation
needs without waiting for it to execute.
There's no sense in being precise when you don't even know what you're talking about.
John von Neumann
Finally, measuring the time of a particular application of a procedure does not
provide much insight into how long it will take to apply the procedure to different
inputs. We would like to understand how the evaluation time scales with the
size of the inputs so we can understand which inputs the procedure can sensibly
be applied to, and can choose the best procedure to use for different situations.
The next section introduces mathematical tools that are helpful for capturing
how cost scales with input size.
Exercise 7.1. Suppose you are defining a procedure that needs to append two
lists, one short list, short, and one very long list, long, but the order of elements
in the resulting list does not matter. Is it better to use (list-append short long) or
(list-append long short)? (A good answer will involve both experimental results
and an analytical explanation.)
Filius Bonacci was an Italian monk and mathematician in the 12th century. He
published a book, Liber Abbaci, on how to calculate with decimal numbers that
introduced Hindu-Arabic numbers to Europe (replacing Roman numbers) along
with many of the algorithms for doing arithmetic we learn in elementary school.
It also included the problem for which Fibonacci numbers are named:2
A pair of newly-born male and female rabbits are put in a field. Rabbits
mate at the age of one month and after that procreate every month, so the
female rabbit produces a new pair of rabbits at the end of its second month.
Assume rabbits never die and that each female rabbit produces one new
pair (one male, one female) every month from her second month on. How
many pairs will there be in one year?
Filius Bonacci
We can define a function that gives the number of pairs of rabbits at the
beginning of the nth month as:
\[
\mathrm{Fibonacci}(n) =
\begin{cases}
1 & : n = 1 \\
1 & : n = 2 \\
\mathrm{Fibonacci}(n - 1) + \mathrm{Fibonacci}(n - 2) & : n > 2
\end{cases}
\]
The third case follows from Bonacci's assumptions: all the rabbits alive at the
beginning of the previous month are still alive (the Fibonacci (n − 1) term), and
all the rabbits that are at least two months old reproduce (the Fibonacci (n − 2)
term).
The sequence produced is known as the Fibonacci sequence:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, . . .
After the first two 1s, each number in the sequence is the sum of the previous
two numbers. Fibonacci numbers occur frequently in nature, such as the ar-
rangement of florets in the sunflower (34 spirals in one direction and 55 in the
other) or the number of petals in common plants (typically 1, 2, 3, 5, 8, 13, 21, or
34), hence the rarity of the four-leaf clover.
Translating the definition of the Fibonacci function into a Scheme procedure is
straightforward; we combine the two base cases using the or special form:
(define (fibo n)
  (if (or (= n 1) (= n 2)) 1
      (+ (fibo (- n 1)) (fibo (- n 2)))))
Applying fibo to small inputs works fine:
> (time (fibo 10))
cpu time: 0 real time: 0 gc time: 0
55
> (time (fibo 30))
cpu time: 2156 real time: 2187 gc time: 0
832040
2 Although the sequence is named for Bonacci, it was probably not invented by him. The se-
quence was already known to Indian mathematicians with whom Bonacci studied.
128 7.1. Empirical Measurements
But when we try to determine the number of rabbits in five years by computing
(fibo 60), our interpreter just hangs without producing a value.
The fibo procedure is defined in a way that guarantees it eventually completes
when applied to a positive whole number: each recursive call reduces the
input by 1 or 2, so both recursive calls get closer to the base case. Hence, we
always make progress and must eventually reach the base case, unwind the re-
cursive applications, and produce a value. To understand why the evaluation of
(fibo 60) did not finish in our interpreter, we need to consider how much work is
required to evaluate the expression.
To evaluate (fibo 60), the interpreter follows the if expressions to the recursive
case, where it needs to evaluate (+ (fibo 59) (fibo 58)). To evaluate (fibo 59), it
needs to evaluate (fibo 58) again and also evaluate (fibo 57). To evaluate (fibo 58)
(which needs to be done twice), it needs to evaluate (fibo 57) and (fibo 56). So,
there is one evaluation of (fibo 60), one evaluation of (fibo 59), two evaluations
of (fibo 58), and three evaluations of (fibo 57).
The number of evaluations of the fibo procedure for each input is itself
the Fibonacci sequence! To understand why, consider the evaluation tree for
(fibo 5) shown in Figure 7.1. The only direct number values are the 1 values that
result from evaluations of either (fibo 1) or (fibo 2). Hence, the number of 1
values must be the value of the final result, which just sums all these numbers.
For (fibo 5), there are 5 leaf applications, and 4 more inner applications, for 9
total applications. In general, evaluating (fibo n) involves Fibonacci (n) leaf
applications and Fibonacci (n) − 1 inner applications. Since Fibonacci (60) is
1,548,008,755,920, evaluating (fibo 60) requires 3,096,017,511,839 applications
of fibo: over three trillion!
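[Figure 7.1. Evaluation tree for (fibo 5): (fibo 5) branches into (fibo 4) and (fibo 3), and every path ends in a (fibo 2) or (fibo 1) application that evaluates to 1.]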
Although our fibo definition is correct, it is ridiculously inefficient and only fin-
ishes for input numbers below about 40. It involves a tremendous amount of
duplicated work: for the (fibo 60) example, there are two evaluations of (fibo 58)
and over a trillion evaluations of (fibo 1) and (fibo 2).
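The pattern of repeated evaluations described above (one evaluation of the largest input, one of the next, then two, then three) can be checked directly for a small input. The sketch below assumes Racket; count-applications is a hypothetical helper that mirrors the structure of fibo and is not part of the text.

```scheme
#lang racket
;; Hypothetical helper (not from the text): the number of times (fibo k) is
;; evaluated while evaluating (fibo n). It follows the same recursive
;; structure as the fibo procedure, adding 1 whenever the input equals k.
(define (count-applications n k)
  (+ (if (= n k) 1 0)
     (if (or (= n 1) (= n 2))
         0
         (+ (count-applications (- n 1) k)
            (count-applications (- n 2) k)))))

(count-applications 10 10) ; 1 evaluation of (fibo 10)
(count-applications 10 9)  ; 1 evaluation of (fibo 9)
(count-applications 10 8)  ; 2 evaluations of (fibo 8)
(count-applications 10 7)  ; 3 evaluations of (fibo 7)
(count-applications 10 6)  ; 5 evaluations of (fibo 6)
```

The counts 1, 1, 2, 3, 5 follow the Fibonacci sequence, which is why the amount of duplicated work grows so quickly as the input increases.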
We can avoid this duplicated effort by building up to the answer starting from
the base cases. This is more like the way a human would determine the numbers
in the Fibonacci sequence: we find the next number by adding the previous two
3 Perhaps Bonacci's assumptions are not a good model for actual rabbit procreation. This result
suggests that in about 10 years the mass of all the rabbits produced from the initial pair will exceed
the mass of the Earth, which, although scary, seems unlikely!
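The idea of building up to the answer from the base cases can be sketched as follows, assuming Racket. The names fibo-up and fibo-iter are hypothetical, and this is one possible formulation rather than necessarily the definition the text itself uses.

```scheme
#lang racket
;; A sketch of the bottom-up approach (hypothetical names, not from the text).
;; a and b hold two consecutive Fibonacci numbers; left counts how many more
;; steps are needed before a holds the nth Fibonacci number.
(define (fibo-up n)
  (define (fibo-iter a b left)
    (if (= left 0)
        a
        (fibo-iter b (+ a b) (- left 1))))
  (fibo-iter 1 1 (- n 1)))

(fibo-up 10) ; 55
(fibo-up 30) ; 832040
(fibo-up 60) ; 1548008755920
```

Each step does one addition and one subtraction, so the number of steps grows with n itself instead of with the value of the nth Fibonacci number.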
130 7.2. Orders of Growth
the important properties of how resources required grow with input size. Each
function takes as input a function, and produces as output a set of functions:
O( f ) (big oh)
The set of functions that grow no faster than f grows.
Θ( f ) (theta)
The set of functions that grow as fast as f grows.
Ω( f ) (omega)
The set of functions that grow no slower than f grows.
These functions capture the asymptotic behavior of functions, that is, how they
behave as the inputs get arbitrarily large. To understand how the time required
to evaluate a procedure increases as the inputs to that procedure increase, we
need to know the asymptotic behavior of a function that takes the size of input
to the target procedure as its input and outputs the number of steps to evaluate
the target procedure on that input.
Figure 7.2 depicts the sets O, Ω, and Θ for some function f. Next, we define each
function and provide some examples. Section 7.3 illustrates how to analyze the
time required to evaluate applications of procedures using these notations.
Figure 7.2. Visualization of the sets O( f ), Ω( f ), and Θ( f ).
[Regions labeled in the figure: functions that grow faster than f, functions that grow as fast as f, and functions that grow slower than f.]
Remember that accumulated knowledge, like accumulated capital, increases at
compound interest: but it differs from the accumulation of capital in this; that
the increase of knowledge produces a more rapid rate of progress, whilst the
accumulation of capital leads to a lower rate of interest. Capital thus checks its
own accumulation: knowledge thus accelerates its own advance. Each generation,
therefore, to deserve comparison with its predecessor, is bound to add much
more largely to the common stock than that which it immediately succeeds.
Charles Babbage, 1851
7.2.1 Big O
The first notation we introduce is O, pronounced big oh. The O function takes
as input a function, and produces as output the set of all functions that grow no
faster than the input function. The set O( f ) is the set of all functions that grow
as fast as, or slower than, f grows. In Figure 7.2, the O( f ) set is represented by
everything inside the outer circle.
To define the meaning of O precisely, we need to consider what it means for a
function to grow. We want to capture how the output of the function increases
as the input to the function increases. First, we consider a few examples; then
we provide a formal definition of O.
[Figure: two plots comparing the growth of n^2, 3n, and Fibonacci (n) as n increases, the first for n up to 10 and the second for n up to 20.]
rightmost value of n^2 is greatest; for higher input values, the value of Fibonacci (n)
is greatest. In the second graph, the values of Fibonacci (n) for input values up to
20 are so large that the other functions appear as nearly flat lines on the graph.
Definition of O. The function g is a member of the set O( f ) if and only if there
exist positive constants c and n0 such that, for all values n ≥ n0,
g(n) ≤ c f(n).
We can show g is in O( f ) using the definition of O( f ) by choosing positive
constants for the values of c and n0, and showing that the property g(n) ≤ c f(n)
holds for all values n ≥ n0. To show g is not in O( f ), we need to explain how, for
any choices of c and n0, we can find values of n that are greater than n0 such that
g(n) ≤ c f(n) does not hold.
For all of the examples where g is in O( f ), there are many acceptable choices for
c and n0. For the given c values, we can always use a higher n0 value than the
selected value. It only matters that there is some finite, positive constant we can
choose for n0, such that the required inequality, g(n) ≤ c f(n), holds for all values
n ≥ n0. Hence, our proofs work equally well with higher values for n0 than we
selected. Similarly, we could always choose higher c values with the same n0
values. The key is just to pick any appropriate values for c and n0, and show the
inequality holds for all values n ≥ n0.
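A candidate choice of c and n0 can be sanity-checked numerically before attempting a proof. The sketch below assumes Racket; holds-up-to? is a hypothetical helper, and the example functions are chosen here for illustration. A passing check is only evidence, not a proof, since the definition requires the inequality for every n at or above n0.

```scheme
#lang racket
;; Hypothetical helper (not from the text): checks that g(n) <= c * f(n) for
;; every whole number n from n0 through limit. This tests finitely many
;; values only, so it cannot establish membership in O(f) by itself.
(define (holds-up-to? g f c n0 limit)
  (or (> n0 limit)
      (and (<= (g n0) (* c (f n0)))
           (holds-up-to? g f c (+ n0 1) limit))))

(define (g n) (+ (* 3 n) 10)) ; example function g(n) = 3n + 10
(define (f n) n)              ; example function f(n) = n

(holds-up-to? g f 4 10 1000) ; #t: 3n + 10 <= 4n whenever n >= 10
(holds-up-to? g f 4 1 1000)  ; #f: fails at n = 1, so n0 = 1 is too small for c = 4
(holds-up-to? (lambda (n) (* n n)) f 100 1 1000) ; #f: n^2 > 100n once n > 100
```

The first result matches the algebra (3n + 10 ≤ 4n simplifies to n ≥ 10), while the last shows how a fixed c eventually fails when g grows faster than f.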
Proving that a function is not in O( f ) is usually tougher. The key to these proofs
is that the value of n that invalidates the inequality is selected after the values of
c and n0 are chosen. One way to think of this is as a game between two adver-
saries. The first player picks c and n0 , and the second player picks n. To show the
property that g is not in O( f ), we need to show that no matter what values the
first player picks for c and n0 , the second player can always find a value n that is
greater than n0 such that g(n) > c f (n).
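As a worked illustration of this game (an example chosen here for concreteness, not one taken from the text), consider showing that n^2 is not in O(n). Whatever c and n0 the first player picks, the second player can respond with any n larger than both:

```latex
\begin{align*}
&\text{Given any positive } c \text{ and } n_0, \text{ choose } n = \max(\lceil c \rceil, n_0) + 1. \\
&\text{Then } n > n_0 \text{ and } n > c, \text{ so } n^2 = n \cdot n > c \cdot n. \\
&\text{Hence } g(n) = n^2 > c\,f(n) \text{ with } f(n) = n, \text{ and } n^2 \notin O(n).
\end{align*}
```

The important point is the order of the choices: n depends on c and n0, so no fixed pair of constants can rescue the inequality.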
Exercise 7.2. For each of the g functions below, answer whether or not g is in the
set O(n). Your answer should include a proof. If g is in O(n) you should identify
values of c and n0 that can be selected to make the necessary inequality hold.
If g is not in O(n) you should argue convincingly that no matter what values
are chosen for c and n0 there are values of n ≥ n0 such that the inequality in the
definition of O does not hold.
a. g(n) = n + 5
b. g(n) = .01n
c. g(n) = 150n + n
d. g(n) = n^1.5
e. g(n) = n!
Exercise 7.3. [★] Given f is some function in O(h), and g is some function not in
O(h), which of the following must always be true:
a. For all positive integers m, f(m) ≤ g(m).
b. For some positive integer m, f (m) < g(m).
c. For some positive integer m0 , and all positive integers m > m0 ,
f ( m ) < g ( m ).
7.2.2 Omega
The set Ω( f ) (omega) is the set of functions that grow no slower than f grows.
So, a function g is in Ω( f ) if g grows as fast as f or faster. Contrast this with
O( f ), the set of all functions that grow no faster than f grows. In Figure 7.2,
Ω( f ) is the set of all functions outside the darker circle.
The formal definition of Ω( f ) is nearly identical to the definition of O( f ): the
only difference is the ≤ comparison is changed to ≥.
Definition of Ω( f ). The function g is a member of the set Ω( f ) if and only if
there exist positive constants c and n0 such that, for all values n ≥ n0,
g(n) ≥ c f(n).
n − 7 is in Ω(n + 12)
Choose c = 1/2 and n0 = 26. Then, we need to show n − 7 ≥ (1/2)(n + 12) for
all values n ≥ 26. This is true, since the inequality simplifies to n/2 ≥ 13, which
holds for all values n ≥ 26.
2n is in Ω(3n)
Choose c = 1/3 and n0 = 1. Then, 2n ≥ (1/3)(3n) simplifies to n ≥ 0, which holds
for all values n ≥ 1.
n is not in Ω(n^2)
Whatever values are chosen for c and n0, we can choose n ≥ n0 such that
n ≥ cn^2 does not hold. Choose n > 1/c that is also at least n0 (note that c must
be less than 1 for the inequality to hold for any positive n, so if c is not less
than 1 we can just choose n ≥ 2). Then, the right side of the inequality, cn^2,
will be greater than n, and the needed inequality n ≥ cn^2 does not hold.
n is not in Ω(Fibonacci (n))
No matter what values are chosen for c and n0, we can choose n ≥ n0 such
that n ≥ c Fibonacci (n) does not hold. The value of c Fibonacci (n) at least
doubles every time n is increased by 2 (see Section 7.2.1), but the value
of n only increases by 2. Hence, if we keep increasing n, eventually
c Fibonacci (n) > n for any choice of positive c.
Exercise 7.5. For each part, identify a function g that satisfies the property.
a. g is in O(n^2) but not in Ω(n^2).
b. g is not in O(n^2) but is in Ω(n^2).
c. g is in both O(n^2) and Ω(n^2).
7.2.3 Theta
The function Θ( f ) denotes the set of functions that grow at the same rate as f.
It is the intersection of the sets O( f ) and Ω( f ). Hence, a function g is in Θ( f ) if
and only if g is in O( f ) and g is in Ω( f ). In Figure 7.2, Θ( f ) is the ring between
the outer and inner circles.
An alternate definition combines the inequalities for O and Ω:
Definition of Θ( f ). The function g is a member of the set Θ( f ) if and only if
there exist positive constants c1, c2, and n0 such that, for all values n ≥ n0,
c1 f(n) ≤ g(n) ≤ c2 f(n).
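As a hedged worked example of this definition, one possible choice of constants showing that 17n + 523 is in Θ(n) is c1 = 17, c2 = 18, and n0 = 523 (many other choices would work equally well):

```latex
\begin{align*}
17n \;\le\; 17n + 523 \quad &\text{for all } n \ge 0, \\
17n + 523 \;\le\; 18n \quad &\text{whenever } n \ge 523, \\
\text{so}\quad 17\,f(n) \le g(n) \le 18\,f(n) \quad &\text{for all } n \ge 523,
  \text{ with } f(n) = n,\ g(n) = 17n + 523.
\end{align*}
```

The lower bound establishes membership in Ω(n) and the upper bound establishes membership in O(n), which together give Θ(n).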
If g(n) is in Θ( f (n)), then the sets Θ( f (n)) and Θ( g(n)) are identical. If g(n) ∈
Θ( f (n)) then g and f grow at the same rate,
alphabet does not matter for our analysis since we are concerned with orders of
growth not absolute values. Within the O, Ω, and Θ operators, a constant
factor does not matter (e.g., Θ(n) ≡ Θ(17n + 523)). This means it doesn't matter
whether we use an alphabet with two symbols or an alphabet with 256 symbols.
With two symbols the input may be 8 times as long as it is with a 256-symbol al-
phabet, but the constant factor does not matter inside the asymptotic operator.
Thus, we measure the size of the input as the number of symbols required to
write the number on a Turing Machine input tape. To figure out the input size
of a given type, we need to think about how many symbols it would require to
write down inputs of that type.
Booleans. There are only two Boolean values: true and false. Hence, the length
of a Boolean input is fixed.
Numbers. Using the decimal number system (that is, 10 tape symbols), we can
write a number of magnitude n using about log_10 n digits. Using the binary number
system (that is, 2 tape symbols), we can write it using about log_2 n bits. Within the
asymptotic operators, the base of the logarithm does not matter (as long as it is
a constant) since it changes the result by a constant factor. We can see this from
the argument above: changing the number of symbols in the input alphabet
changes the input length by a constant factor, which has no impact within the
asymptotic operators.
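The relationship between the magnitude of a number and the length of its written form can be explored with a short procedure. This is a sketch assuming Racket; count-digits is a hypothetical name, not a procedure from the text.

```scheme
#lang racket
;; Hypothetical helper (not from the text): the number of symbols needed to
;; write the positive whole number n in the given base.
(define (count-digits n base)
  (if (< n base)
      1
      (+ 1 (count-digits (quotient n base) base))))

(count-digits 1000000 10)      ; 7 decimal digits
(count-digits 1000000 2)       ; 20 bits
(count-digits (expt 10 30) 10) ; 31 decimal digits
(count-digits (expt 10 30) 2)  ; 100 bits
```

In both examples the binary form is roughly 3 times as long as the decimal form, consistent with the claim that changing the base changes the length only by a constant factor.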
Lists. If the input is a List, the size of the input is related to the number of
elements in the list. If each element is a constant size (for example, a list of
numbers where each number is between 0 and 100), the size of the input list is
some constant multiple of the number of elements in the list. Hence, the size of
an input that is a list of n elements is cn for some constant c. Since Θ(cn) = Θ(n),
the size of a List input is in Θ(n) where n is the number of elements in the List. If
List elements can vary in size, then we need to account for that in the input size.
For example, suppose the input is a List of Lists, where there are n elements in
each inner List, and there are n List elements in the main List. Then, there are n^2
total elements and the input size is in Θ(n^2).
it can read the symbol in the current square, write a symbol into that square,
transition its internal state number, and move one square to the left or right.
Counting Turing Machine steps is very precise, but difficult because we do not
usually start with a Turing Machine description of a procedure and creating one
is tedious.
Time makes more converts than reason.
Thomas Paine
Instead, we usually reason directly from a Scheme procedure (or any precise
description of a procedure) using larger steps. As long as we can claim that
whatever we consider a step could be simulated using a constant number of steps
on a Turing Machine, our larger steps will produce the same answer within the
asymptotic operators. One possibility is to count the number of times an evalua-
tion rule is used in an evaluation of an application of the procedure. The amount
of work in each evaluation rule may vary slightly (for example, the evaluation
rule for an if expression seems more complex than the rule for a primitive) but
does not depend on the input size.
Hence, it is reasonable to assume that all the evaluation rules take constant time.
This does not include any additional evaluation rules that are needed to apply
one rule. For example, the evaluation rule for application expressions includes
evaluating every subexpression. Evaluating an application constitutes one work
unit for the application rule itself, plus all the work required to evaluate the
subexpressions. In cases where the bigger steps are unclear, we can always re-
turn to our precise definition of a step as one step of a Turing Machine.
we know how the main procedure uses the helper procedure, we can more pre-
cisely estimate the actual running time by considering the actual inputs. We see
an example of this in the analysis of how the + procedure is used by list-length
in Section 7.4.2.
where Proc is the name of the procedure we are analyzing. Because the output
represents the maximum number of steps required, we need to consider the
worst-case input of the given size.
Because of all the issues with counting steps exactly, and the uncertainty about
how much work can be done in one step on a particular machine, we cannot
usually determine the exact function for Max-Steps_Proc. Instead, we characterize
the running time of a procedure with a set of functions denoted by an
asymptotic operator. Inside the O, Ω, and Θ operators, the actual time needed
for each step does not matter since the constant factors are hidden by the oper-
ator; what matters is how the number of steps required grows as the size of the
input grows.
Hence, we will characterize the running time of a procedure using a set of
functions produced by one of the asymptotic operators. The Θ operator provides the
most information. Since Θ( f ) is the intersection of O( f ) (no faster than) and
Ω( f ) (no slower than), knowing that the running time of a procedure is in Θ( f )
for some function f provides much more information than just knowing it is in
O( f ) or just knowing that it is in Ω( f ). Hence, our goal is to characterize the
running time of a procedure using the set of functions defined by Θ( f ) of some
function f.
The rest of this section provides examples of procedures with different growth
rates, from slowest (no growth) through increasingly rapid growth rates. The
growth classes described are important classes that are commonly encountered
when analyzing procedures, but these are only examples of growth classes. Be-
tween each pair of classes described here, there are an unlimited number of dif-
ferent growth classes.
We cannot do much in constant time, since we cannot even examine the whole
input. A constant time procedure must be able to produce its output by exam-
ining only a fixed-size part of the input. Recall that the input size measures the
number of squares needed to represent the input. No matter how long the input
is, a constant time procedure can look at no more than some fixed number of
squares on the tape, so cannot even read the whole input.
An example of a constant time procedure is the built-in procedure car. When
car is applied to a non-empty list, it evaluates to the first element of that list.
No matter how long the input list is, all the car procedure needs to do is extract
the first component of the list. So, the running time of car is in O(1).4 Other
built-in procedures that involve lists and pairs that have running times in O(1)
include cons, cdr, null?, and pair?. None of these procedures need to examine
more than the first pair of the list.
4 Since we are speculating based on what car does, not examining how a particular Scheme
interpreter actually implements it, we cannot say definitively that its running time is in O(1). It
would be rather shocking, however, for an implementation to implement car in a way such that
its running time is not in O(1). The implementation of scar in Section 5.2.1 is constant time:
regardless of the input size, evaluating an application of it involves evaluating a single application
expression, and then evaluating an if expression.
where C is some constant that reflects the time for all the operations besides the
recursive call. Note that the value of n_q does not matter, so we simplify this to:
This does not yet provide a useful characterization of the running time of list-
append though, since it is a circular definition. To make it a recursive definition,
we need a base case. The base case for the running time definition is the same
as the base case for the procedure: when the input is null. For the base case, the
running time is constant:
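The recurrence described above can be sketched as follows. The symbol T, the argument n_p (standing for the number of elements in the first input list), and the base-case constant C_0 are notation assumed here for illustration and may differ from the text's own symbols.

```latex
\begin{align*}
T(n_p) &= C + T(n_p - 1) && \text{(recursive case)} \\
T(0) &= C_0 && \text{(base case: the input is null)} \\
T(n_p) &= \underbrace{C + C + \cdots + C}_{n_p \text{ times}} + C_0 = C\,n_p + C_0
  && \text{(unrolling the recurrence)}
\end{align*}
```

Under these assumptions the running time grows linearly with n_p, since the constants C and C_0 are hidden inside the asymptotic operators.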
This procedure makes one recursive application of list-length for each element
in the input p. If the input has n elements, there will be n + 1 total applications
of list-length to evaluate (one for each element, and one for the null). So, the
total work is in Θ(n · work for each recursive application).
To determine the running time, we need to determine how much work is in-
volved in each application. Evaluating an application of list-length involves eval-
uating its body, which is an if expression. To evaluate the if expression, the pred-
icate expression, (null? p), must be evaluated first. This requires constant time
since the null? procedure has constant running time (see Section 7.4.1). The
consequent expression is the primitive expression, 0, which can be evaluated in
constant time. The alternate expression, (+ 1 (list-length (cdr p))), includes the
recursive call. Since there are n + 1 total applications of list-length to evaluate,
the total running time is n + 1 times the work required for each application (other
than the recursive application itself).
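The count of n + 1 applications can be checked with a small experiment. The sketch below assumes Racket. The list-length definition is reconstructed from the description of its body given above, and count-length-applications is a hypothetical companion procedure that mirrors its structure but counts applications instead of elements.

```scheme
#lang racket
;; list-length, reconstructed from the description in the text: an if
;; expression with predicate (null? p), consequent 0, and alternate
;; (+ 1 (list-length (cdr p))).
(define (list-length p)
  (if (null? p)
      0
      (+ 1 (list-length (cdr p)))))

;; Hypothetical helper (not from the text): the number of applications of
;; list-length needed to evaluate (list-length p). The null case counts as
;; one application, and each element adds one more.
(define (count-length-applications p)
  (if (null? p)
      1
      (+ 1 (count-length-applications (cdr p)))))

(list-length (list 1 2 3 4 5))               ; 5
(count-length-applications (list 1 2 3 4 5)) ; 6, that is, n + 1 for n = 5
```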
The remaining work is evaluating (cdr p) and evaluating the + application. The
cdr procedure is constant time. Analyzing the running time of the + procedure
application is more complicated.
Cost of Addition. Since + is a built-in procedure, we need to think about how
it might be implemented. Following the elementary school addition algorithm
(from Section 6.2.3), we know we can add any two numbers by walking down
the digits. The work required for each digit is constant; we just need to compute
the corresponding result and carry bits using a simple formula or lookup table.
The number of digits to add is the maximum number of digits in the two input
numbers. Thus, if there are b digits to add, the total work is in Θ(b). In the worst
case, we need to look at all the digits in both numbers. In general, we cannot
do asymptotically better than this, since adding two arbitrary numbers might
require looking at all the digits in both numbers.
But, in the list-length procedure the + is used in a very limited way: one of the
inputs is always 1. We might be able to add 1 to a number without looking at all
the digits in the number. Recall the addition algorithm: we start at the rightmost
(least significant) digit, add that digit, and continue with the carry. If one of
the input numbers is 1, then once the carry is zero we know none of the more
significant digits will need to change. In the worst case, adding one requires
changing every digit in the other input. For example, (+ 99999 1) is 100000. In
the best case (when the last digit is below 9), adding one requires only examining
and changing one digit.
Figuring out the average case is more difficult, but necessary to get a good esti-
mate of the running time of list-length. We assume the numbers are represented
in binary, so instead of decimal digits we are counting bits (this is both simpler,
and closer to how numbers are actually represented in the computer). Approx-
imately half the time, the least significant bit is a 0, so we only need to examine
one bit. When the last bit is not a 0, we need to examine the second least signifi-
cant bit (the second bit from the right): if it is a 0 we are done; if it is a 1, we need
to continue.
We always need to examine one bit, the least significant bit. Half the time we
also need to examine the second least significant bit. Of those times, half the
time we need to continue and examine the next least significant bit. This
continues through the whole number. Thus, the expected number of bits we need
to examine is,
\[
1 + \frac{1}{2}\left(1 + \frac{1}{2}\left(1 + \frac{1}{2}\left(1 + \frac{1}{2}\left(1 + \ldots\right)\right)\right)\right)
\]
where the number of terms is the number of bits in the input number, b.
Simplifying the equation, we get:
\[
1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \ldots + \frac{1}{2^b}
\]
No matter how large b gets, this value is always less than 2. So, on average, the
number of bits to examine to add 1 is constant: it does not depend on the length
of the input number.
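The average can also be computed by brute force for small bit lengths. The sketch below assumes Racket and uses hypothetical names. It models the number of bits examined when adding one as one more than the number of trailing 1 bits, which includes looking at the first 0 bit where the carry stops.

```scheme
#lang racket
;; Hypothetical helper (not from the text): the number of bits the add-one
;; algorithm examines for n, working from the least significant bit until it
;; finds a 0 bit that absorbs the carry.
(define (bits-examined n)
  (if (even? n)
      1
      (+ 1 (bits-examined (quotient n 2)))))

;; Hypothetical helper: the average of bits-examined over every b-bit input,
;; that is, over all whole numbers from 0 through 2^b - 1.
(define (average-bits-examined b)
  (define limit (expt 2 b))
  (define (sum-from n sum)
    (if (= n limit)
        sum
        (sum-from (+ n 1) (+ sum (bits-examined n)))))
  (/ (sum-from 0 0) limit))

(average-bits-examined 4)  ; 31/16
(average-bits-examined 10) ; 2047/1024
```

Both results are just under 2 and match the closed form of the series above, which sums to 2 minus 1/2^b.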
This result generalizes to addition where one of the inputs is any constant value.
Adding any constant C to a number n is equivalent to adding one C times. Since
adding one is a constant time procedure, adding one C times can also be done
in constant time for any constant C.
Excluding the recursive application, the list-length application involves appli-
cations of two constant time procedures: cdr and adding one using +. Hence,
the total time needed to evaluate one application of list-length, excluding the
recursive application, is constant.
There are n + 1 total applications of list-length to evaluate, so the total
running time is c(n + 1) where c is the amount of time needed for each application.
The set Θ(c(n + 1)) is identical to the set Θ(n), so the running time for the
list-length procedure is in Θ(n) where n is the length of the input list.
If the predicate is true, the base case applies the car procedure, which has con-
stant running time. The alternate expression involves the recursive calls, as well
as evaluating (cdr p), which requires constant time, and (- n 1). The - procedure
is similar to +: for arbitrary inputs, its worst case running time is linear
in the input size, but when one of the inputs is a constant the running time is
constant. This follows from a similar argument to the one we used for the +
procedure (Exercise 7.13 asks for a more detailed analysis of the running time of
subtraction). So, the work required for each recursive call is constant.
The number of recursive calls is determined by the value of n and the number
of elements in the list p. In the best case, when n is 1, there are no recursive calls
and the running time is constant since the procedure only needs to examine
the first element. Each recursive call reduces the value passed in as n by 1, so
the number of recursive calls scales linearly with n (the actual number is n − 1
since the base case is when n equals 1). But, there is a limit on the value of n for
which this is true. If the value passed in as n exceeds the number of elements in
p, the procedure will produce an error when it attempts to evaluate (cdr p) for
the empty list. This happens after s_p recursive calls, where s_p is the number of
elements in p. Hence, the running time of list-get-element does not grow with
the length of the input passed as n; after the value of n exceeds the number of
elements in p it does not matter how much bigger it gets, the running time does
not continue to increase.
Thus, the worst case running time of list-get-element grows linearly with the
length of the input list. Equivalently, the running time of list-get-element is in
$\Theta(s_p)$ where $s_p$ is the number of elements in the input list.
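The analysis above can be checked against a small definition. This is a minimal sketch, assuming list-get-element counts positions from 1 and recurses on (cdr p) as the analysis describes; the example lists are made up for illustration.

```scheme
; Sketch of list-get-element, assuming 1-based positions (the base case is n = 1).
(define (list-get-element p n)
  (if (= n 1)
      (car p)                               ; base case: constant time
      (list-get-element (cdr p) (- n 1))))  ; constant work, then one recursive call

(list-get-element (list 10 20 30 40) 1) ; evaluates to 10 with no recursive calls
(list-get-element (list 10 20 30 40) 3) ; evaluates to 30 after 2 recursive calls
```

Asking for position 5 of this four-element list would eventually apply cdr to the empty list and produce an error, which is why the running time is bounded by the length of the list rather than by the value of n.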
Exercise 7.10. Explain why the list-map procedure from Section 5.4.1 has run-
ning time that is linear in the size of its List input. Assume the procedure input
has constant running time.
Exercise 7.12. For the decimal six-digit odometer (shown in the picture on
page 142), we measure the amount of work to add one as the total number of
wheel digit turns required. For example, going from 000000 to 000001 requires
one work unit, but going from 000099 to 000100 requires three work units.
a. What are the worst case inputs?
b. What are the best case inputs?
c. [★] On average, how many work units are required for each mile? Assume
over the lifetime of the odometer, the car travels 1,000,000 miles.
d. Lever voting machines were used by the majority of American voters in the
1960s, although they are not widely used today. Most lever machines used a
three-digit odometer to tally votes. Explain why candidates ended up with 99
votes on a machine far more often than 98 or 100 on these machines.
Exercise 7.14. [★] Our analysis of the work required to add one to a number argued
that it could be done in constant time. Test experimentally if the DrRacket
+ procedure actually satisfies this property. Note that one + application is too
quick to measure well using the time procedure, so you will need to design a
procedure that applies + many times without doing much other work.
$(n - 1) + (n - 2) + \ldots + 1 + 0$.
$n \cdot \Theta(n) = \Theta(n^2)$.
Thus, the running time is quadratic in the size of the input list.
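\[
\begin{array}{rrrrrrr}
 & & & a_{n-1}b_0 & \cdots & a_1 b_0 & a_0 b_0 \\
 & & a_{n-1}b_1 & \cdots & a_1 b_1 & a_0 b_1 & \\
 & & \vdots & & & & \\
+\; a_{n-1}b_{n-1} & \cdots & a_1 b_{n-1} & a_0 b_{n-1} & & & \\
\hline
r_{2n-1} \;\; r_{2n-2} & \cdots & & r_3 & r_2 & r_1 & r_0
\end{array}
\]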
If both input numbers have n digits, there are $n^2$ digit multiplications, each of
which can be done in constant time. The intermediate results will be n rows,
each containing n digits. So, the total number of digits to add is $n^2$: 1 digit in the
ones place, 2 digits in the tens place, ..., n digits in the $10^{n-1}$s place, ..., 2 digits
in the $10^{2n-3}$s place, and 1 digit in the $10^{2n-2}$s place. Each digit addition requires
constant work, so the total work for all the digit additions is in $\Theta(n^2)$. Adding
the work for both the digit multiplications and the digit additions, the total running
time for the elementary school multiplication algorithm is quadratic in the
number of input digits, $\Theta(n^2)$ where n is the number of digits in the inputs.
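The count of digits to add can be checked with a short worked sum. Grouping the columns into two triangular sums around the middle column is just one convenient way to add them up:

```latex
\[
\underbrace{1 + 2 + \cdots + (n-1)}_{\text{places } 10^{0} \ldots 10^{n-2}}
\;+\; \underbrace{n}_{\text{place } 10^{n-1}} \;+\;
\underbrace{(n-1) + \cdots + 2 + 1}_{\text{places } 10^{n} \ldots 10^{2n-2}}
\;=\; 2 \cdot \frac{n(n-1)}{2} + n \;=\; n^2
\]
```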
This is not the fastest known algorithm for multiplying two numbers, although it
was the best algorithm known until 1960. In 1960, Anatolii Karatsuba discovered
a multiplication algorithm with running time in $\Theta(n^{\log_2 3})$. Since $\log_2 3 < 1.585$
this is an improvement over the $\Theta(n^2)$ elementary school algorithm. In 2007,
Martin Fürer discovered an even faster algorithm for multiplication.5 It is not
yet known if this is the fastest possible multiplication algorithm, or if faster ones
exist.
Exercise 7.15. [★] Analyze the running time of the elementary school long division
algorithm.
Exercise 7.16. [★] Define a Scheme procedure that multiplies two multi-digit
numbers (without using the built-in * procedure except to multiply single-digit
numbers). Strive for your procedure to have running time in $\Theta(n)$ where n is the
total number of digits in the input numbers.
5 Martin Fürer, Faster Integer Multiplication, ACM Symposium on Theory of Computing, 2007.
of the input.
exponential time. The security of the widely used RSA encryption algorithm de-
pends on factoring being hard. If someone finds a fast factoring algorithm it
would put the codes used to secure Internet commerce at risk.7
The number of elements in the power set of S is $2^{|S|}$ (where |S| is the number of
elements in the set S).
Here is a procedure that takes a list as input, and produces as output the power
set of the elements of the list:
(define (list-powerset s)
  (if (null? s)
      (list null)
      (list-append (list-map (lambda (t) (cons (car s) t))
                             (list-powerset (cdr s)))
                   (list-powerset (cdr s)))))
The list-powerset procedure produces a List of Lists. Hence, for the base case,
instead of just producing null, it produces a list containing a single element,
null. In the recursive case, we can produce the power set by appending the list
of all the subsets that include the first element, with the list of all the subsets that
do not include the first element. For example, the power set of {1, 2, 3} is found
by finding the power set of {2, 3}, which is {{}, {2}, {3}, {2, 3}}, and taking the
union of that set with the set formed by adding 1 to each of its elements:
{{1}, {1, 2}, {1, 3}, {1, 2, 3}}.
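To see the procedure at work, here is a small self-contained sketch. It assumes DrRacket (where null is predefined), and the definitions of list-append and list-map are simple stand-ins written for this example; they are assumed to behave like the procedures of the same names used in the text.

```scheme
; Stand-in definitions (assumptions for this sketch).
(define (list-append p q)
  (if (null? p) q (cons (car p) (list-append (cdr p) q))))
(define (list-map f p)
  (if (null? p) null (cons (f (car p)) (list-map f (cdr p)))))

(define (list-powerset s)
  (if (null? s)
      (list null)
      (list-append (list-map (lambda (t) (cons (car s) t))
                             (list-powerset (cdr s)))
                   (list-powerset (cdr s)))))

(list-powerset (list 1 2 3))
; evaluates to a list of 8 lists:
; ((1 2 3) (1 2) (1 3) (1) (2 3) (2) (3) ())
```

The first four subsets all contain 1 and come from the list-map application; the last four are the power set of {2, 3} unchanged.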
An application of list-powerset involves applying list-append, and two recursive
applications of (list-powerset (cdr s)). Increasing the size of the input list by one
doubles the total number of applications of list-powerset since we need to evaluate
(list-powerset (cdr s)) twice. The number of applications of list-powerset is
in $\Theta(2^n)$ where n is the length of the input list.8
The body of list-powerset is an if expression. The predicate applies the constant-
time procedure, null?. The consequent expression, (list null) is also constant
time. The alternate expression is an application of list-append. From Example 7.4,
we know the running time of list-append is $\Theta(n_p)$ where $n_p$ is the number
of elements in its first input. The first input is the result of applying list-map
to a procedure and the List produced by (list-powerset (cdr s)). The length of
the list output by list-map is the same as the length of its input, so we need to
determine the length of (list-powerset (cdr s)).
We use $n_s$ to represent the number of elements in s. The length of the input list
to list-map is the number of elements in the power set of a size $n_s - 1$ set: $2^{n_s - 1}$. But,
for each application, the value of $n_s$ is different. Since we are trying to determine
the total running time, we can do this by thinking about the total length of all the
input lists to list-map over all of the list-powerset applications. If the input is a list of length
n, the total list length is $2^{n-1} + 2^{n-2} + \ldots + 2^1 + 2^0$, which is equal to $2^n - 1$. So,
7 The movie Sneakers is a fictional account of what would happen if someone finds a faster than
we could do it once and reuse the result. Even with this change, though, the running time would still
be in $\Theta(2^n)$.
7.5 Summary
Because the speed of computers varies and the exact time required for a particu-
lar application depends on many details, the most important property to under-
stand is how the work required scales with the size of the input. The asymptotic
operators provide a convenient way of understanding the cost involved in evaluating
a procedure application.
Procedures that can produce an output by touching only a fixed amount of their input have
constant running times. Procedures whose running times increase by a fixed amount
when the input size increases by one have linear (in $\Theta(n)$) running times. Procedures
whose running time quadruples when the input size doubles have quadratic
(in $\Theta(n^2)$) running times. Procedures whose running time doubles when the input
size increases by one have exponential (in $\Theta(2^n)$) running times. Procedures
with exponential running time can only be evaluated for small inputs.
Asymptotic analysis, however, must be interpreted cautiously. For large enough
inputs, a procedure with running time in $\Theta(n)$ is always faster than a procedure
with running time in $\Theta(n^2)$. But, for an input of a particular size, the $\Theta(n^2)$
procedure may be faster. Without knowing the constants that are hidden by the
asymptotic operators, there is no way to accurately predict the actual running
time on a given input.
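As a purely hypothetical illustration (the constants 100 and 1 are invented for this example, not measured), suppose one procedure takes $100n$ time units and another takes $n^2$ time units. The linear procedure only wins once the input is large enough:

```latex
\[
100n < n^2 \iff n > 100
\]
\[
n = 10:\quad 100n = 1{,}000 \;>\; n^2 = 100
\qquad\qquad
n = 1{,}000:\quad 100n = 100{,}000 \;<\; n^2 = 1{,}000{,}000
\]
```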
Exercise 7.18. Analyze the asymptotic running time of the list-sum procedure
(from Example 5.2):
(define (list-sum p)
  (if (null? p)
      0
      (+ (car p) (list-sum (cdr p)))))
You may assume all of the elements in the list have values below some constant
(but explain why this assumption is useful in your analysis).
Exercise 7.19. Analyze the asymptotic running time of the factorial procedure
(from Example 4.1):
(define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))
Be careful to describe the running time in terms of the size (not the magnitude)
of the input.
Exercise 7.22. Analyze the running time of the deep-list-flatten procedure from
Section 5.5:
(define (deep-list-flatten p)
  (if (null? p)
      null
      (list-append (if (list? (car p))
                       (deep-list-flatten (car p))
                       (list (car p)))
                   (deep-list-flatten (cdr p)))))
Exercise 7.23. [★] Find and correct at least one error in the Orders of Growth
section of the Wikipedia page on Analysis of Algorithms
(http://en.wikipedia.org/wiki/Analysis_of_algorithms). This is rated as [★] now
(July 2011), since the current entry contains many fairly obvious errors. Hopefully
it will soon become a [★★★] challenge, and perhaps, eventually will become impossible!
8 Sorting and Searching
If you keep proving stuff that others have done, getting confidence, increasing the
complexities of your solutions - for the fun of it - then one day you'll turn
around and discover that nobody actually did that one!
And that's the way to become a computer scientist.
Richard Feynman, Lectures on Computation
This chapter presents two extended examples that use the programming techniques
from Chapters 2-5 and analysis ideas from Chapters 6-7 to solve some
interesting and important problems: sorting and searching. These examples in-
volve some quite challenging problems and incorporate many of the ideas we
have seen up to this point in the book. Once you understand them, you are well
on your way to thinking like a computer scientist!
8.1 Sorting
The sorting problem takes two inputs: a list of elements and a comparison procedure.
It outputs a list containing the same elements as the input list ordered according
to the comparison procedure. For example, if we sort a list of numbers
using < as the comparison procedure, the output is the list of numbers sorted
in order from least to greatest.
Sorting is one of the most widely studied problems in computing, and many
different sorting algorithms have been proposed. Try to develop a sorting pro-
cedure yourself before continuing further. It may be illuminating to try sorting
some items by hand and think carefully about how you do it and how much work
it is. For example, take a shuffled deck of cards and arrange them in sorted or-
der by ranks. Or, try arranging all the students in your class in order by birthday.
Next, we present and analyze three different sorting procedures.
a < c for all numbers a, b, and c. If the comparison function does not have this
property, there may be no way to arrange the elements in a single sorted list. All
of our sorting procedures require that the procedure passed as the comparison
function is transitive.
Once we can find the best element in a given list, we can sort the whole list by
repeatedly finding the best element of the remaining elements until no more
elements remain. To define our best-first sorting procedure, we first define a
procedure for finding the best element in the list, and then define a procedure
for removing an element from a list.
Finding the Best. The best element in the list is either the first element, or the
best element from the rest of the list. Hence, we define list-find-best recursively.
An empty list has no best element, so the base case is for a list that has one
element. When the input list has only one element, that element is the best
element. If the list has more than one element, the best element is the better of
the first element in the list and the best element of the rest of the list.
To pick the better element from two elements, we define the pick-better proce-
dure that takes three inputs: a comparison function and two values.
(define (pick-better cf p1 p2) (if (cf p1 p2) p1 p2))
Assuming the procedure passed as cf has constant running time, the running
time of pick-better is constant. For most of our examples, we use the < proce-
dure as the comparison function. For arbitrary inputs, the running time of < is
not constant since in the worst case performing the comparison requires exam-
ining every digit in the input numbers. But, if the maximum value of a number
in the input list is limited, then we can consider < a constant time procedure
since all of the inputs passed to it are below some fixed size.
We use pick-better to define list-find-best:
(define (list-find-best cf p)
  (if (null? (cdr p))
      (car p)
      (pick-better cf (car p) (list-find-best cf (cdr p)))))
We use n to represent the number of elements in the input list p. An application
of list-find-best involves $n - 1$ recursive applications since each one passes
in (cdr p) as the new p operand and the base case stops when the list has one
element left. The running time for each application (excluding the recursive
application) is constant since it involves only applications of the constant time
procedures null?, cdr, and pick-better. So, the total running time for list-find-best
is in $\Theta(n)$; it scales linearly with the length of the input list.
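A short usage sketch of the two procedures defined above; the input list is made up for illustration. Which element counts as "best" depends entirely on the comparison function passed in.

```scheme
(define (pick-better cf p1 p2) (if (cf p1 p2) p1 p2))

(define (list-find-best cf p)
  (if (null? (cdr p))
      (car p)
      (pick-better cf (car p) (list-find-best cf (cdr p)))))

(list-find-best < (list 3 1 4 1 5)) ; evaluates to 1, the least element
(list-find-best > (list 3 1 4 1 5)) ; evaluates to 5, the greatest element
```

For this five-element list there are four recursive applications, matching the $n - 1$ count in the analysis.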
Deleting an Element. To implement best-first sorting, we need to produce a
list that contains all the elements of the original list except for the best element,
which will be placed at the front of the output list. We define a procedure, list-
delete, that takes as inputs a List and a Value, and produces a List that contains
all the elements of the input list in the original order except for the first element
that is equal to the input value.
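A minimal sketch of a procedure matching this description, assuming DrRacket (where null is predefined). Using equal? as the equality test is an assumption made for this example; the description above only says "equal to the input value".

```scheme
; Sketch of list-delete: removes only the first element equal to el.
(define (list-delete p el)
  (if (null? p)
      null
      (if (equal? (car p) el)                      ; equal? is an assumed choice
          (cdr p)                                  ; drop the first match, keep the rest
          (cons (car p) (list-delete (cdr p) el)))))

(list-delete (list 3 1 4 1 5) 1) ; evaluates to (3 4 1 5)
```

In the worst case the matching element is last (or absent), so every element is examined once and the running time is linear in the length of the list.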
Expression ::⇒ LetExpression
LetExpression ::⇒ (let (Bindings) Expression)
Bindings ::⇒ Binding Bindings
Bindings ::⇒ ε
Binding ::⇒ (Name Expression)
(define (list-sort-best-first-let cf p)
  (if (null? p)
      null
      (let ((best (list-find-best cf p)))
        (cons best (list-sort-best-first-let cf (list-delete p best))))))
This runs faster than list-sort-best-first since it avoids the duplicate evaluations,
but the asymptotic running time is still in $\Theta(n^2)$: there are n recursive
applications of list-sort-best-first-let and each application involves linear
time applications of list-find-best and list-delete. Using the let expression im-
proves the actual running time by avoiding the duplicate work, but does not im-
pact the asymptotic growth rate since the duplicate work is hidden in the con-
stant factor.
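A rough count shows where the quadratic bound comes from. Assume, as a simplification, that the list-find-best and list-delete applications on a list of length k together take about $c \cdot k$ time for some constant c (c is a placeholder, not a measured value). The list shrinks by one element with each recursive application, so the total is:

```latex
\[
\sum_{k=1}^{n} c \cdot k \;=\; c \cdot \frac{n(n+1)}{2} \;\in\; \Theta(n^2)
\]
```

Evaluating list-find-best twice in each application would only replace c with a larger constant, which is why the let expression does not change the asymptotic growth rate.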
Exercise 8.1. What is the best case input for list-sort-best-first? What is its
asymptotic running time on the best case input?
Exercise 8.2. Use the time special form (Section 7.1) to experimentally mea-
sure the evaluation times for the list-sort-best-first-let procedure. Do the results
match the expected running times based on the $\Theta(n^2)$ asymptotic running time?
You may find it helpful to define a procedure that constructs a list containing
n random elements. To generate the random elements use the built-in proce-
dure random that takes one number as input and evaluates to a pseudorandom
number between 0 and one less than the value of the input number. Be careful
in your time measurements that you do not include the time required to gener-
ate the input list.
Exercise 8.3. Define the list-find-best procedure using the list-accumulate pro-
cedure from Section 5.4.2 and evaluate its asymptotic running time.
Exercise 8.4. [★] Define and analyze a list-sort-worst-last procedure that sorts
by finding the worst element first and putting it at the end of the list.
Exercise 8.5. We analyzed the worst case running time of list-sort-insert above.
Analyze the best case running time. Your analysis should identify the inputs for
which list-sort-insert runs fastest, and describe the asymptotic running time for
the best case input.
The problem with our insertion sort is that it divides the work unevenly into
inserting one element and sorting the rest of the list. This is a very unequal
division. Any sorting procedure that works by considering one element at a time
and putting it in the sorted position as is done by list-sort-best-first and
list-sort-insert has a running time in $\Theta(n^2)$. We cannot do better than this with this
strategy since there are n elements, and the time required to figure out where
each element goes is in $\Theta(n)$.
To do better, we need to either reduce the number of recursive applications
needed (this would mean each recursive call results in more than one element
being sorted), or reduce the time required for each application. The approach
we take is to use each recursive application to divide the list into two approx-
imately equal-sized parts, but to do the division in such a way that the results
of sorting the two parts can be combined directly to form the result. We par-
tition the elements in the list so that all elements in the first part are less than
(according to the comparison function) all elements in the second part.
Our first attempt is to modify insert-one to partition the list into two parts. This
approach does not produce a better-than-quadratic time sorting procedure be-
cause of the inefficiency of accessing list elements; however, it leads to insights
for producing a quicker sorting procedure.
First, we define a list-extract procedure that takes as inputs a list and two num-
bers indicating the start and end positions, and outputs a list containing the
elements of the input list between the start and end positions:
(define (list-extract p start end)
  (if (= start 0)
      (if (= end 0)
          null
          (cons (car p) (list-extract (cdr p) start (- end 1))))
      (list-extract (cdr p) (- start 1) (- end 1))))
The running time of the list-extract procedure is in $\Theta(n)$ where n is the number
of elements in the input list. The worst case input is when the value of end is the
length of the input list, which means there will be n recursive applications, each
involving a constant amount of work.
We use list-extract to define procedures for obtaining first and second halves of
a list (when the list has an odd number of elements, we put the middle element
in the second half of the list):
(define (list-first-half p)
(list-extract p 0 (floor (/ (list-length p) 2))))
(define (list-second-half p)
(list-extract p (floor (/ (list-length p) 2)) (list-length p)))
The list-first-half and list-second-half procedures use list-extract so their run-
ning times are linear in the number of elements in the input list.
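The following sketch exercises these procedures on small made-up lists. It assumes DrRacket (where null is predefined); the list-length definition is a stand-in written for this example and is assumed to match the list-length used in the text. Positions count from 0, and the element at the end position is not included.

```scheme
; Stand-in for list-length (an assumption for this sketch).
(define (list-length p)
  (if (null? p) 0 (+ 1 (list-length (cdr p)))))

(define (list-extract p start end)
  (if (= start 0)
      (if (= end 0)
          null
          (cons (car p) (list-extract (cdr p) start (- end 1))))
      (list-extract (cdr p) (- start 1) (- end 1))))

(define (list-first-half p)
  (list-extract p 0 (floor (/ (list-length p) 2))))
(define (list-second-half p)
  (list-extract p (floor (/ (list-length p) 2)) (list-length p)))

(list-extract (list 10 20 30 40 50) 1 3) ; evaluates to (20 30)
(list-first-half (list 1 2 3 4 5))       ; evaluates to (1 2)
(list-second-half (list 1 2 3 4 5))      ; evaluates to (3 4 5): the middle element goes here
```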
The list-insert-one-split procedure inserts an element in sorted order by first
splitting the list in halves and then recursively inserting the new element in the
appropriate half of the list:
half of the list. In either case, the length of the list is approximately $n/2$. The running
time of list-append is in $\Theta(m)$ where m is the length of the first input list.
So, the time required for each list-insert-one-split application is in $\Theta(n)$ where
n is the length of the input list to list-insert-one-split.
The lengths of the input lists to list-insert-one-split in the recursive calls are approximately
$n/2, n/4, n/8, \ldots, 1$, since the length of the list halves with each call. The
summation has $\log_2 n$ terms, and the sum of the terms is approximately n, so the average length
input is $\frac{n}{\log_2 n}$. Hence, the total running time for the list-append applications in
each application of list-insert-one-split is in $\Theta(\log_2 n \times \frac{n}{\log_2 n}) = \Theta(n)$.
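The claim that the halving lengths sum to about n can be verified directly. As a simplifying assumption, take n to be a power of two so that every division is exact:

```latex
\[
\frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots + 1
\;=\; n \sum_{k=1}^{\log_2 n} \frac{1}{2^k}
\;=\; n \left( 1 - \frac{1}{n} \right)
\;=\; n - 1
\]
```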
A List is either (1) null or (2) a Pair whose second cell is a List.
A Tree is either (1) null or (2) a triple whose first and third parts are both
Trees.
Symbolically:
Tree ::⇒ null
Tree ::⇒ (make-tree Tree Element Tree)
The make-tree procedure can be defined using cons to package the three inputs
into a tree:
(define (make-tree left element right)
(cons element (cons left right)))
We define selector procedures for extracting the parts of a non-null tree:
(define (tree-element tree) (car tree))
(define (tree-left tree) (car (cdr tree)))
(define (tree-right tree) (cdr (cdr tree)))
The tree-left and tree-right procedures are constant time procedures that evalu-
ate to the left or right subtrees respectively of a tree.
In a sorted tree, the elements are maintained in a sorted structure. All elements
in the left subtree of a tree are less than (according to the comparison function)
the value of the root element of the tree; all elements in the right subtree of a tree
are greater than or equal to the value of the root element of the tree (the result
of comparing them with the root element is false). For example, here is a sorted
binary tree containing 6 elements using < as the comparison function:
        7
      /   \
     5     12
    / \      \
   1   6      17
The top node has element value 7, and its left subtree is a tree containing the
tree elements whose values are less than 7. The null subtrees are not shown. For
example, the left subtree of the element whose value is 12 is null. Although there
are six elements in the tree, we can reach any element from the top by following
at most two branches. By contrast, with a list of six elements, we need five cdr
operations to reach the last element.
The depth of a tree is the largest number of steps needed to reach any node in
the tree starting from the root. The example tree has depth 2, since we can reach
every node starting from the root of the tree in two or fewer steps. A tree of depth
d can contain up to $2^{d+1} - 1$ elements. One way to see this is from this recursive
definition for the maximum number of nodes in a tree:
\[
\mathit{TreeNodes}(d) =
\begin{cases}
1 & : d = 0 \\
\mathit{TreeNodes}(d - 1) + 2 \times \mathit{TreeLeaves}(d - 1) & : d > 0
\end{cases}
\]
A tree of depth zero has one node. Increasing the depth of a tree by one means
we can add two nodes for each leaf node in the tree, so the total number of nodes
in the new tree is the sum of the number of nodes in the original tree and twice
the number of leaves in the original tree. The maximum number of leaves in a
tree of depth d is $2^d$ since each level doubles the number of leaves. Hence, the
second equation simplifies to
\[
\mathit{TreeNodes}(d - 1) + 2 \times 2^{d-1} = \mathit{TreeNodes}(d - 1) + 2^d.
\]
The value of $\mathit{TreeNodes}(d - 1)$ is $2^{d-1} + 2^{d-2} + \ldots + 1 = 2^d - 1$. Adding $2^d$ and
$2^d - 1$ gives $2^{d+1} - 1$ as the maximum number of nodes in a tree of depth d.
Hence, a well-balanced tree containing n nodes has depth approximately $\log_2 n$.
A tree is well-balanced if the left and right subtrees of all nodes in the tree contain
nearly the same number of elements.
Procedures that are analogous to the list-first-half, list-second-half, and list-append
procedures that had linear running times for the standard list representation
can all be implemented with constant running times for the tree representation.
For example, tree-left is analogous to list-first-half and make-tree is
analogous to list-append.
The tree-insert-one procedure inserts an element in a sorted binary tree:
(define (tree-insert-one cf el tree)
  (if (null? tree)
      (make-tree null el null)
      (if (cf el (tree-element tree))
          (make-tree (tree-insert-one cf el (tree-left tree))
                     (tree-element tree)
                     (tree-right tree))
          (make-tree (tree-left tree)
                     (tree-element tree)
                     (tree-insert-one cf el (tree-right tree))))))
When the input tree is null, the new element is the top element of a new tree
whose left and right subtrees are null. Otherwise, the procedure compares the
element to insert with the element at the top node of the tree. If the comparison
evaluates to true, the new element belongs in the left subtree. The result is a tree
where the left tree is the result of inserting this element in the old left subtree,
and the element and right subtree are the same as they were in the original tree.
For the alternate case, the element is inserted in the right subtree, and the left
subtree is unchanged.
In addition to the recursive call, tree-insert-one only applies constant time pro-
cedures. If the tree is well-balanced, each recursive application halves the size
of the input tree so there are approximately $\log_2 n$ recursive calls. Hence, the
running time to insert an element in a well-balanced tree using tree-insert-one
is in $\Theta(\log n)$.
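The sketch below builds the six-element example tree by inserting its elements one at a time. It assumes DrRacket (where null is predefined). The helper insert-all is hypothetical, introduced only for this example, and the insertion order 7, 5, 12, 1, 6, 17 is one order that happens to produce the well-balanced tree described earlier.

```scheme
(define (make-tree left element right) (cons element (cons left right)))
(define (tree-element tree) (car tree))
(define (tree-left tree) (car (cdr tree)))
(define (tree-right tree) (cdr (cdr tree)))

(define (tree-insert-one cf el tree)
  (if (null? tree)
      (make-tree null el null)
      (if (cf el (tree-element tree))
          (make-tree (tree-insert-one cf el (tree-left tree))
                     (tree-element tree)
                     (tree-right tree))
          (make-tree (tree-left tree)
                     (tree-element tree)
                     (tree-insert-one cf el (tree-right tree))))))

; Hypothetical helper: insert each element of list p, in order, into tree.
(define (insert-all cf p tree)
  (if (null? p)
      tree
      (insert-all cf (cdr p) (tree-insert-one cf (car p) tree))))

(define example-tree (insert-all < (list 7 5 12 1 6 17) null))

(tree-element example-tree)                           ; evaluates to 7
(tree-element (tree-left example-tree))               ; evaluates to 5
(tree-element (tree-right (tree-left example-tree)))  ; evaluates to 6
(tree-left (tree-right example-tree))                 ; evaluates to null
```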
ments in its input list (which is the result of the list-to-sorted-tree application, a
list containing n elements where n is the number of elements in p).
Only the fastest-growing term contributes to the total asymptotic running time,
so the expected total running time for an application of list-sort-tree-insert to a
list containing n elements is in $\Theta(n \log n)$. This is substantially better than the
previous sorting algorithms which had running times in $\Theta(n^2)$ since logarithms
grow far slower than their input. For example, if n is one million, $n^2$ is over 50,000
times bigger than $n \log_2 n$; if n is one billion, $n^2$ is over 33 million times bigger
than $n \log_2 n$ since $\log_2 1000000000$ is just under 30.
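The two ratios quoted above follow from a short calculation (values rounded):

```latex
\[
\frac{n^2}{n \log_2 n} = \frac{n}{\log_2 n}
\]
\[
n = 10^6:\quad \frac{10^6}{\log_2 10^6} \approx \frac{10^6}{19.93} \approx 50{,}170
\qquad\qquad
n = 10^9:\quad \frac{10^9}{\log_2 10^9} \approx \frac{10^9}{29.90} \approx 3.34 \times 10^7
\]
```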
There is no general sorting procedure that has expected running time better
than $\Theta(n \log n)$, so there is no algorithm that is asymptotically faster than
list-sort-tree-insert (in fact, it can be proven that no asymptotically faster sorting procedure
exists). There are, however, sorting procedures that may have advantages such
as how they use memory which may provide better absolute performance in
some situations.
Unbalanced Trees. Our analysis assumes the left and right halves of the tree
passed to tree-insert-one have approximately the same number of elements.
If the input list is in random order, this assumption is likely to be valid: each
element we insert is equally likely to go into the left or right half, so the halves
contain approximately the same number of elements all the way down the tree.
But, if the input list is not in random order this may not be the case.
For example, suppose the input list is already in sorted order. Then, each ele-
ment that is inserted will be the rightmost node in the tree when it is inserted.
For the previous example, this produces the unbalanced tree shown in Figure 8.1.
This tree contains the same six elements as the earlier example, but because it is
not well-balanced the number of branches that must be traversed to reach the
deepest element is 5 instead of 2. Similarly, if the input list is in reverse sorted
order, we will have an unbalanced tree where only the left branches are used.
In these pathological situations, the tree effectively becomes a list. The num-
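[Figure 8.1: Unbalanced tree. Inserting the elements in sorted order (1, 5, 6, 7, 12, 17)
produces a tree in which each node is the right child of the previous one.]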
Exercise 8.8. [★] Define a procedure binary-tree-depth that takes as input a binary
tree and outputs the depth of the tree. The running time of your procedure
should not grow faster than linearly with the number of nodes in the tree.
could be done more quickly if the words to translate were sorted alphabetically.
Hoare invented the quicksort algorithm for this purpose and it remains the most
widely used sorting algorithm.
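As a minimal sketch, assuming list-quicksort partitions the rest of the list around its first element using list-filter and joins the sorted parts with list-append, the procedure might look like the following. It assumes DrRacket (where null is predefined); the list-append and list-filter definitions are stand-ins written for this example.

```scheme
; Stand-in definitions (assumptions for this sketch).
(define (list-append p q)
  (if (null? p) q (cons (car p) (list-append (cdr p) q))))
(define (list-filter test p)
  (if (null? p)
      null
      (if (test (car p))
          (cons (car p) (list-filter test (cdr p)))
          (list-filter test (cdr p)))))

; Sketch of quicksort: elements that come before the first element go left,
; all other elements go right, and each part is sorted recursively.
(define (list-quicksort cf p)
  (if (null? p)
      null
      (list-append
       (list-quicksort cf (list-filter (lambda (el) (cf el (car p))) (cdr p)))
       (cons (car p)
             (list-quicksort cf (list-filter (lambda (el) (not (cf el (car p))))
                                             (cdr p)))))))

(list-quicksort < (list 7 5 12 1 6 17)) ; evaluates to (1 5 6 7 12 17)
```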
As with list-sort-tree-insert, the expected running time for a randomly arranged
list is in $\Theta(n \log n)$ and the worst case running time is in $\Theta(n^2)$. In the expected
cases, each recursive call halves the size of the input list (since if the list is randomly
arranged we expect about half of the list elements are below the value of
the first element), so the recursion is expected to be approximately $\log_2 n$ calls deep.
Each call involves an application of list-filter, which has running time in $\Theta(m)$
where m is the length of the input list. At each call depth, the total length of the
inputs to all the calls to list-filter is n since the original list is subdivided into $2^d$
sublists, which together include all of the elements in the original list. Hence, the
total running time is in $\Theta(n \log n)$ in the expected cases where the input list is
randomly arranged. As with list-sort-tree-insert, if the input list is not randomly
arranged it is possible that all elements end up in the same partition. Hence,
the worst case running time of list-quicksort is still in $\Theta(n^2)$.
Exercise 8.10. Estimate the time it would take to sort a list of one million elements
using list-quicksort.
Exercise 8.11. Both the list-quicksort and list-sort-tree-insert procedures have
expected running times in $\Theta(n \log n)$. Experimentally compare their actual running
times.
Exercise 8.12. What is the best case input for list-quicksort? Analyze the asymptotic
running time for list-quicksort on best case inputs.
Exercise 8.13. [★] Instead of using binary trees, we could use ternary trees.
A node in a ternary tree has two elements, a left element and a right element,
where the left element must be before the right element according to the comparison
function. Each node has three subtrees: left, containing elements before
the left element; middle, containing elements between the left and right
elements; and right, containing elements after the right element. Is it possible
to sort faster using ternary trees?
There are two ways of constructing a software design: one way is to make it so
simple that there are obviously no deficiencies, and the other way is to make it so
complicated that there are no obvious deficiencies. The first method is far more
difficult. It demands the same skill, devotion, insight, and even inspiration as the
discovery of the simple physical laws which underlie the complex phenomena of nature.
Sir Tony Hoare, The Emperor's Old Clothes (1980 Turing Award Lecture)
8.2 Searching
In a broad sense, nearly all problems can be thought of as search problems. If
we can define the space of possible solutions, we can search that space to find
a correct solution. For example, to solve the pegboard puzzle (Exploration 5.2)
we enumerate all possible sequences of moves and search that space to find a
winning sequence. For most interesting problems, however, the search space is
far too large to search through all possible solutions.
This section explores a few specific types of search problems. First, we consider
the simple problem of finding an element in a list that satisfies some property.
Then, we consider searching for an item in sorted data. Finally, we consider
the more specific problem of efficiently searching for documents (such as web
Assuming the matching function has constant running time, the worst case run-
ning time of list-search is linear in the size of the input list. The worst case is
when there is no satisfying element in the list. If the input list has length n, there
are n recursive calls to list-search, each of which involves only constant time
procedures.
Without imposing more structure on the input and comparison function, there
is no more efficient search procedure. In the worst case, we always need to test
every element in the input list before concluding that there is no element that
satisfies the matching function.
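A minimal sketch of a linear search consistent with this analysis, assuming list-search takes a matching function and a list, outputs the first satisfying element, and outputs false (written #f here) when no element satisfies the matching function. The parameter name ef and the example lists are choices made for this illustration.

```scheme
; Sketch of list-search: tests elements from the front until one satisfies ef.
(define (list-search ef p)
  (if (null? p)
      #f                            ; reached the end: no satisfying element
      (if (ef (car p))
          (car p)
          (list-search ef (cdr p)))))

(list-search (lambda (el) (> el 10)) (list 3 12 7 15)) ; evaluates to 12
(list-search (lambda (el) (> el 20)) (list 3 12 7 15)) ; evaluates to #f after testing all 4 elements
```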
1 If the input list contains false as an element, then when the list-search result is false we do not
know whether it means no element satisfies the property or that an element whose value is false
satisfies the property. An alternative would be to produce an error if no satisfying element is found,
but this is more awkward when list-search is used by other procedures.
Strings. We use the built-in String datatype to represent documents and target
words. A String is similar to a List, but specialized for representing sequences
of characters. A convenient way to make a String is to just use double quotes
around a sequence of characters. For example, "abcd" evaluates to a String con-
taining four characters.
The String datatype provides procedures for matching, ordering, and converting
between Strings and Lists of characters:
string=?: String × String → Boolean
Outputs true if the input Strings have exactly the same sequence of characters,
otherwise false.
string<?: String × String → Boolean
Outputs true if the first input String is lexicographically before the second
input String, otherwise false.
string->list: String → List
Outputs a List containing the characters in the input String.
list->string: List → String
Outputs a String containing the characters in the input List.
One advantage of using Strings instead of Lists of characters is the built-in pro-
cedures for comparing Strings; we could write similar procedures for Lists of
characters, but lexicographic ordering is somewhat tricky to get right, so it is
better to use the built-in procedures.
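As a quick illustration of the four procedures listed above, here is a minimal sketch that can be run in Racket. The names same?, before?, chars, and word are chosen here for the example and do not come from the text.

```scheme
#lang racket
; Each definition exercises one of the String procedures described above.
(define same?   (string=? "king" "king"))        ; identical character sequences
(define before? (string<? "apple" "banana"))     ; "apple" is lexicographically first
(define chars   (string->list "abcd"))           ; a List of four characters
(define word    (list->string (list #\a #\b #\c #\d)))  ; back to the String "abcd"

(list same? before? chars word)
```

Converting a String to a List and back with string->list and list->string recovers the original String.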
Building the index. The entries in the index are Pairs of a word represented
as a string, and a list of locations where that word appears. Each location is a
Pair consisting of a document identifier (for web documents, this is the Uniform
Resource Locator (URL) that is the address of the web page represented as a
string) and a Number identifying the position within the document where the
word appears (we label positions as the number of characters in the document
before this location).
To build the index, we split each document into words and record the position
of each word in the document. The first step is to define a procedure that takes
as input a string representing an entire document, and produces a list of (word
. position) pairs containing one element for each word in the document. We
define a word as a sequence of alphabetic characters; non-alphabetic characters
including spaces, numbers, and punctuation marks separate words and are not
included in the index.
The text-to-word-positions procedure takes a string as input and outputs a list
of word-position pairs corresponding to each word in the input. The inner pro-
cedure, text-to-word-positions-iter, takes three inputs: a list of the characters in
the document, a list of the characters in the current word, and a number repre-
senting the position in the string where the current word starts; it outputs the list
of (word . position) pairs. The value passed in as w can be null, meaning there
is no current word. Otherwise, it is a list of the characters in the current word.
A word starts when the first alphabetic character is found, and continues until
either the first non-alphabetic character or the end of the document. We use the
built-in char-downcase procedure to convert all letters to their lowercase form,
so KING, King, and king all correspond to the same word.
(define (text-to-word-positions s)
  (define (text-to-word-positions-iter p w pos)
    (if (null? p)
        (if (null? w) null (list (cons (list->string w) pos)))
        (if (not (char-alphabetic? (car p))) ; finished word
            (if (null? w) ; no current word
                (text-to-word-positions-iter (cdr p) null (+ pos 1))
                (cons (cons (list->string w) pos)
                      (text-to-word-positions-iter (cdr p) null
                                                   (+ pos (list-length w) 1))))
            (text-to-word-positions-iter (cdr p)
                                         (list-append w (list (char-downcase (car p))))
                                         pos))))
  (text-to-word-positions-iter (string->list s) null 0))
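To see what text-to-word-positions produces, the sketch below repeats the definition so that it runs on its own in Racket. It assumes that list-append and list-length (the list procedures defined earlier in the book) behave like Racket's built-in append and length, and aliases them accordingly. The sample sentence and the name wps are made up for the example.

```scheme
#lang racket
; Assumption: the book's list-append and list-length behave like the
; built-in append and length, so we alias them here.
(define list-append append)
(define list-length length)

(define (text-to-word-positions s)
  (define (text-to-word-positions-iter p w pos)
    (if (null? p)
        (if (null? w) null (list (cons (list->string w) pos)))
        (if (not (char-alphabetic? (car p))) ; finished word
            (if (null? w) ; no current word
                (text-to-word-positions-iter (cdr p) null (+ pos 1))
                (cons (cons (list->string w) pos)
                      (text-to-word-positions-iter (cdr p) null
                                                   (+ pos (list-length w) 1))))
            (text-to-word-positions-iter (cdr p)
                                         (list-append w (list (char-downcase (car p))))
                                         pos))))
  (text-to-word-positions-iter (string->list s) null 0))

; "Hello" starts at position 0, "big" at 7, and "World" at 11.
(define wps (text-to-word-positions "Hello, big World"))
wps ; (("hello" . 0) ("big" . 7) ("world" . 11))
```

The comma and the spaces separate the words but do not appear in the output, and every word is lowercased.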
The next step is to build an index from the list of word-position pairs. To enable
fast searching, we store the index in a binary tree sorted by the target word. The
insert-into-index procedure takes as input an index and a word-position pair
and outputs an index consisting of the input index with the input word-position
pair added.
The index is represented as a sorted binary tree where each element is a pair of a
word and a list of the positions where that word appears. Each word should ap-
pear in the tree only once, so if the word-position pair to be added corresponds
to a word that is already in the index, the position is added to the corresponding
list of positions. Otherwise, a new entry is added to the index for the word with
a list of positions containing the position as its only element.
(define (insert-into-index index wp)
  (if (null? index)
      (make-tree null (cons (car wp) (list (cdr wp))) null)
      (if (string=? (car wp) (car (tree-element index)))
          (make-tree (tree-left index)
                     (cons (car (tree-element index))
                           (list-append (cdr (tree-element index))
                                        (list (cdr wp))))
                     (tree-right index))
          (if (string<? (car wp) (car (tree-element index)))
              (make-tree (insert-into-index (tree-left index) wp)
                         (tree-element index)
                         (tree-right index))
              (make-tree (tree-left index)
                         (tree-element index)
                         (insert-into-index (tree-right index) wp))))))
To insert all the (word . position) pairs in a list into the index, we use insert-into-
index to add each pair, passing the resulting index into the next recursive call:
(define (insert-all-wps index wps)
  (if (null? wps) index
      (insert-all-wps (insert-into-index index (car wps)) (cdr wps))))
To add all the words in a document to the index we use text-to-word-positions
to obtain the list of word-position pairs. Since we want to include the document
identity in the positions, we use list-map to add the url (a string that identifies
the document location) to the position of each word. Then, we use insert-all-
wps to add all the word-position pairs in this document to the index. The index-
document procedure takes a document identifier and its text as a string, and
produces an index of all words in the document.
(define (index-document url text)
  (insert-all-wps
   null
   (list-map (lambda (wp) (cons (car wp) (cons url (cdr wp))))
             (text-to-word-positions text))))
We leave analyzing the running time of index-document as an exercise. The im-
portant point, though, is that it only has to be done once for a given set of doc-
uments. Once the index is built, we can use it to answer any number of search
queries without needing to reconstruct the index.
Merging indexes. Our goal is to produce an index for a set of documents, not
just a single document. So, we need a way to take two indexes produced by
index-document and combine them into a single index. We use this repeat-
edly to create an index of any number of documents. To merge two indexes,
we combine their word occurrences. If a word occurs in both documents, the
word should appear in the merged index with a position list that includes all the
positions in both indexes. If the word occurs in only one of the documents, that
word and its position list should be included in the merged index.
(define (merge-indexes d1 d2)
  (define (merge-elements p1 p2)
    (if (null? p1) p2
        (if (null? p2) p1
            (if (string=? (car (car p1)) (car (car p2)))
                (cons (cons (car (car p1))
                            (list-append (cdr (car p1)) (cdr (car p2))))
                      (merge-elements (cdr p1) (cdr p2)))
                (if (string<? (car (car p1)) (car (car p2)))
                    (cons (car p1) (merge-elements (cdr p1) p2))
                    (cons (car p2) (merge-elements p1 (cdr p2))))))))
  (list-to-sorted-tree
   (lambda (e1 e2) (string<? (car e1) (car e2)))
   (merge-elements (tree-extract-elements d1)
                   (tree-extract-elements d2))))
To merge the indexes, we first use tree-extract-elements to convert the tree rep-
resentations to lists. The inner merge-elements procedure takes the two lists of
word-position pairs and outputs a single list.
Since the lists are sorted by the target word, we can perform the merge effi-
ciently. If the first words in both lists are the same, we produce a word-position
pair that appends the position lists for the two entries. If they are different, we
use string<? to determine which of the words belongs first, and include that el-
ement in the merged list. This way, the two lists are kept synchronized, so there
is no need to search the lists to see if the same word appears in both lists.
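The synchronized merge can be tried on its own, without the tree conversions. The sketch below lifts merge-elements to the top level and uses Racket's built-in append where the book uses list-append (an assumption about their equivalence). The two entry lists, entries-a and entries-b, and the document names in them are made up for illustration.

```scheme
#lang racket
; merge-elements as in merge-indexes, defined at top level for experimentation.
; Assumption: built-in append stands in for the book's list-append.
(define (merge-elements p1 p2)
  (if (null? p1) p2
      (if (null? p2) p1
          (if (string=? (car (car p1)) (car (car p2)))
              (cons (cons (car (car p1))
                          (append (cdr (car p1)) (cdr (car p2))))
                    (merge-elements (cdr p1) (cdr p2)))
              (if (string<? (car (car p1)) (car (car p2)))
                  (cons (car p1) (merge-elements (cdr p1) p2))
                  (cons (car p2) (merge-elements p1 (cdr p2))))))))

; Hypothetical entries: each is a word followed by (document . position) pairs,
; and each list is already sorted by word.
(define entries-a (list (list "king" (cons "a.html" 0))
                        (list "queen" (cons "a.html" 9))))
(define entries-b (list (list "jester" (cons "b.html" 4))
                        (list "king" (cons "b.html" 20))))

(define merged (merge-elements entries-a entries-b))
merged
; (("jester" ("b.html" . 4))
;  ("king" ("a.html" . 0) ("b.html" . 20))
;  ("queen" ("a.html" . 9)))
```

Each recursive call removes at least one entry from one of the lists, so the number of calls is at most the total number of entries in the two lists.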
Obtaining documents. To build a useful index for searching, we need some
The read-all-chars procedure takes a port as its input, and produces a list con-
taining all the characters in the document associated with the port:
(define (read-all-chars port)
  (let ((c (read-char port)))
    (if (eof-object? c) null
        (cons c (read-all-chars port)))))
Using these procedures, we define web-get, a procedure that takes as input a
string that represents the URL of some web page, and outputs a string repre-
senting the contents of that page.
(define (web-get url)
  (list->string (read-all-chars (get-pure-port (string->url url)))))
2 We use Name to represent sequences of characters in the domain and path names, although the
actual rules for valid names for each of these are different.
To make it easy to build an index of a set of web pages, we define the index-
pages procedure that takes as input a list of web pages and outputs an index of
the words in those pages. It recurses through the list of pages, indexing each
document, and merging that index with the result of indexing the rest of the
pages in the list.
(define (index-pages p)
  (if (null? p) null
      (merge-indexes (index-document (car p) (web-get (car p)))
                     (index-pages (cdr p)))))
We can use this to create an index of any set of web pages. For example, here
we use Jeremy Hylton's collection of the complete works of William Shakespeare
(http://shakespeare.mit.edu) to define shakespeare-index as an index of the words
used in all of Shakespeare's plays.
(define shakespeare-index
  (index-pages
   (list-map
    (lambda (play)
      (string-append "http://shakespeare.mit.edu/" play "/full.html"))
    ;; List of plays following the site's naming conventions.
    (list "allswell" "asyoulikeit" "comedy_errors" "cymbeline" "lll"
          "measure" "merry_wives" "merchant" "midsummer" "much_ado"
          "pericles" "taming_shrew" "tempest" "troilus_cressida" "twelfth_night"
          "two_gentlemen" "winters_tale" "1henryiv" "2henryiv" "henryv"
          "1henryvi" "2henryvi" "3henryvi" "henryviii" "john" "richardii"
          "richardiii" "cleopatra" "coriolanus" "hamlet" "julius_caesar" "lear"
          "macbeth" "othello" "romeo_juliet" "timon" "titus"))))
Building the index takes about two and a half hours on my laptop. It contains
22949 distinct words and over 1.6 million word occurrences. Much of the time
spent building the index is in constructing new lists and trees for every change,
which can be avoided by using the mutable data types we cover in the next chap-
ter. The key idea, though, is that the index only needs to be built once. Once the
documents have been indexed, we can use the index to quickly perform any
search.
Searching. Using an index, searching for pages that use a given word is easy
and efficient. Since the index is a sorted binary tree, we use binary-tree-search
to search for a word in the index:
(define (search-in-index index word)
  (binary-tree-search
   (lambda (el) (string=? word (car el))) ; first element of (word . position)
   (lambda (el) (string<? word (car el)))
   index))
As analyzed in the previous section, the expected running time of binary-tree-search
is in Θ(log n) where n is the number of nodes in the input tree.³ The
body of search-in-index applies binary-tree-search to the index. The number of
nodes in the index is the number of distinct words in the indexed documents.
3 Because of the way merge-indexes is defined, we do not actually get this expected running time.
So, the running time of search-in-index scales logarithmically with the number
of distinct words in the indexed documents. Note that the number and size of
the documents does not matter! This is why a search engine such as Google
can respond to a query quickly even though its index contains many billions of
documents.
One issue we should be concerned with is the running time of the procedures
passed into binary-tree-search. Our analysis of binary-tree-search assumes the
equality and comparison functions are constant time procedures. Here, the
procedures are string=? and string<?, which both have worst case running times
that are linear in the length of the input string. As used here, one of the inputs
is the target word. So, the amount of work for each binary-tree-search recursive
call is in Θ(w) where w is the length of word. Thus, the overall running time of
search-in-index is in Θ(w log d) where w is the length of word and d is the number
of words in the index. If we assume all words are of some maximum length,
though, the w term disappears as a constant factor (that is, we are assuming
w < C for some constant C). Thus, the overall running time is in Θ(log d).
Here are some examples:
Exercise 8.14. Define a procedure for finding the longest word in a document.
Analyze the running time of your procedure.
Exercise 8.15. Produce a list of the words in Shakespeare's plays sorted by their
length.
Exercise 8.16. [★] Analyze the running time required to build the index.
a. Analyze the running time of the text-to-word-positions procedure. Use n to
represent the number of characters in the input string, and w to represent
the number of distinct words. Be careful to clearly state all assumptions on
which your analysis relies.
b. Analyze the running time of the insert-into-index procedure.
c. Analyze the running time of the index-document procedure.
d. Analyze the running time of the merge-indexes procedure.
e. Analyze the overall running time of the index-pages procedure. Your result
should describe how the running time is impacted by the number of docu-
ments to index, the size of each document, and the number of distinct words.
Exercise 8.17. [★] The search-in-index procedure does not actually have the
expected running time in Θ(log w) (where w is the number of distinct words in
the index) for the Shakespeare index because of the way it is built using
merge-indexes. The problem has to do with the running time of the binary tree on
pathological inputs. Explain why the input to list-to-sorted-tree in the
merge-indexes procedure leads to a binary tree where the running time for searching is
in Θ(w). Modify the merge-indexes definition to avoid this problem and ensure
that searches on the resulting index run in Θ(log w).
In addition to fast indexed search, web search engines have to solve two prob-
lems: (1) find documents to index, and (2) identify the most important docu-
ments that contain a particular search term.
For our Shakespeare index example, we manually found a list of interesting doc-
uments to index. This approach does not scale well to indexing the World Wide
Web where there are trillions of documents and new ones are created all the
time. For this, we need a web crawler.
A web crawler finds documents on the web. Typical web crawlers start with a set
of seed URLs, and then find more documents to index by following the links on
those pages. This proceeds recursively: the links on each newly discovered page
are added to the set of URLs for the crawler to index. To develop a web crawler,
we need a way to extract the links on a given web page, and to manage the set of
pages to index.
a. [★] Define a procedure extract-links that takes as input a string representing
the text of a web page and outputs a list of all the pages linked to from
this page. Linked pages can be found by searching for anchor tags on the
web page. An anchor tag has the form:⁴ <a href=target>. (The text-to-word-positions
procedure may be a helpful starting point for defining extract-links.)
b. Define a procedure crawl-page that takes as input an index and a string
representing a URL. As output, it produces a pair consisting of an index (that is
the result of adding all words from the page at the input URL to the input
index) and a list of URLs representing all pages linked to by the crawled page.
c. [★★] Define a procedure crawl-web that takes as input a list of seed URLs and
a Number indicating the maximum depth of the crawl. It should output an
index of all the words on the web pages at the locations given by the seed
URLs and any page that can be reached from these seed URLs by following
no more than the maximum depth number of links.
For a web search engine to be useful, we don't want to just get a list of all the
pages that contain some target word, we want the list to be sorted according to
which of those pages are most likely to be interesting. Selecting the best pages
for a given query is a challenging and important problem, and the ability to do
this well is one of the main things that distinguishes web search engines. Many
factors are used to rank pages including an analysis of the text on the page itself,
whether the target word is part of a title, and how recently the page was updated.
[Margin quote: The rank assigned to a document is calculated from the ranks of
documents citing it. In addition, the rank of a document is calculated from a
constant representing the probability that a browser through the database will
randomly jump to the document. The method is particularly useful in enhancing the
performance of search engine results for hypermedia databases, such as the world
wide web, whose documents have a large variation in quality.
United States Patent #6,285,999, September 2001. (Inventor: Lawrence Page,
Assignee: Stanford University)]
The best ways of ranking pages also consider the pages that link to the ranked
page. If many pages link to a given page, it is more likely that the given page is
useful. This property can also be defined recursively: a page is highly ranked if
there are many highly-ranked pages that link to it.
The ranking system used by Google is based on this formula:
    R(u) = \sum_{v \in L_u} \frac{R(v)}{L(v)}
4 Not all links match this structure exactly, so this may miss some of the links on a page.
where L_u is the set of web pages that contain links to the target page u and L(v)
is the number of links on the page v (thus, the value of a link from a page
containing many links is less than the value of a link from a page containing only a
few links). The value R(u) gives a measure of the ranking of the page identified
by u, where higher values indicate more valuable pages.
The problem with this formula is that it is circular: there is no base case, and no
way to order the web pages to compute the correct rank of each page in turn,
since the rank of each page depends on the rank of the pages that link to it.
One way to approximate equations like this one is to use relaxation. Relaxation
obtains an approximate solution to some systems of equations by repeatedly
evaluating the equations. To estimate the page ranks for a set of web pages, we
initially assume every page has rank 1 and evaluate R(u) for all the pages (using
the value of 1 as the rank for every other page). Then, re-evaluate the R(u) values
using the resulting ranks. A relaxation keeps repeating until the values change
by less than some threshold amount, but there is no guarantee how quickly this
will happen. For the page ranking evaluation, it may be enough to decide on
some fixed number of iterations.
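The relaxation idea can be sketched on a tiny, made-up link graph. Everything named here is an assumption for illustration: the representation of links (each element pairs a page with the list of pages it links to), the page names "a", "b", and "c", and the helper procedures rank-of, relax-once, and relax. The sketch uses Racket's built-in map, assoc, and member in place of the book's list procedures, and runs a fixed number of iterations instead of testing a threshold.

```scheme
#lang racket
; Hypothetical link graph: each element is (page . pages-it-links-to).
(define links
  (list (cons "a" (list "b" "c"))
        (cons "b" (list "c"))
        (cons "c" (list "a"))))

; Look up the current rank of a page in a list of (page . rank) pairs.
(define (rank-of page ranks) (cdr (assoc page ranks)))

; One relaxation step: for each page u, sum R(v)/L(v) over pages v linking to u.
(define (relax-once ranks)
  (map (lambda (entry)
         (let ((u (car entry)))
           (cons u
                 (apply +
                        (map (lambda (v-entry)
                               (if (member u (cdr v-entry))
                                   (/ (rank-of (car v-entry) ranks)
                                      (length (cdr v-entry)))
                                   0))
                             links)))))
       links))

; Repeat the relaxation step a fixed number of times.
(define (relax ranks n)
  (if (= n 0) ranks (relax (relax-once ranks) (- n 1))))

; Initially every page has rank 1.
(define initial-ranks (map (lambda (entry) (cons (car entry) 1)) links))

(relax initial-ranks 1) ; (("a" . 1) ("b" . 1/2) ("c" . 3/2))
(relax initial-ranks 2) ; (("a" . 3/2) ("b" . 1/2) ("c" . 1))
```

In the first step, page c receives half of a's rank plus all of b's rank, while b receives only half of a's rank. Racket computes with exact fractions here, so the total rank across the three pages stays at 3 after each step.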
d. [★★] Define a procedure, web-link-graph, that takes as input a set of URLs
and produces as output a graph of the linking structure of those documents.
The linking structure can be represented as a List where each element of the
List is a pair of a URL and a list of all the URLs that include a link to that URL.
The extract-links procedure from the previous exploration will be useful for
determining the link targets of a given URL.
e. [★] Define a procedure that takes as input the output of web-link-graph and
outputs a preliminary ranking of each page that measures the number of
other pages that link to that page.
f. [★★] Refine your page ranking procedure to weight links from highly-ranked
pages more heavily in a page's rank by using a relaxation algorithm.
g. [★★★★] Come up with a cool name, set up your search engine as a web
service, and attract more than 0.001% of all web searches to your site.
8.3 Summary
The focus of Part II has been on predicting properties of procedures, in particu-
lar how their running time scales with the size of their input. This involved many
encounters with the three powerful ideas introduced in Section 1.4: recursive
definitions, universality, and abstraction. The simple Turing Machine model is
a useful abstraction for modeling nearly all conceivable computing machines,
and the few simple operations it defines are enough for a universal computer.
Actual machines use the digital abstraction to allow a continuous range of
voltages to represent just two values. The asymptotic operators used to describe
running times are also a kind of abstraction: they allow us to represent the set
of infinitely many different functions with a compact notation.
In Part III, we will see many more recursive definitions, and extend the notion of
recursive definitions to the language interpreter itself. We change the language
evaluation rules themselves, and see how different evaluation rules enable dif-
ferent ways of expressing computations.
9 Mutation
Faced with the choice between changing one's mind and proving that there is no
need to do so, almost everyone gets busy on the proof.
John Kenneth Galbraith
The subset of Scheme we have used until this chapter provides no means to
change the value associated with a name. This enabled very simple evaluation
rules for names, as well as allowing the substitution model of evaluation. Since
the value associated with a name was always the value it was defined as, no com-
plex evaluation rules are needed to determine the value associated with a name.
This chapter introduces special forms known as mutators that allow programs
to change the value in a given place. Introducing mutation does not change
the computations we can express: every computation that can be expressed
using mutation could also be expressed using only the purely functional subset
of Scheme from Chapter 3. It does, however, make it possible to express
certain computations more efficiently and clearly than could be done without it.
Adding mutation is not free, however; reasoning about the value of expressions
becomes much more complex.
9.1 Assignment
The set! (pronounced set-bang!) special form associates a new value with an
already defined name. The exclamation point at the end of set! follows a naming
convention to indicate that an operation may mutate state. A set expression is
also known as an assignment. It assigns a value to a variable.
Assignments do not produce output values, but are used for their side effects.
They change the value of some state (namely, the value associated with the name
in the set expression), but do not produce an output.
Here is an example use of set!:
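A minimal illustration follows. The name num and the values used are chosen here for the example and are not taken from the text.

```scheme
#lang racket
(define num 200)      ; the first definition creates the name
(set! num 150)        ; assignment replaces the value; it produces no output
(set! num (+ num 1))  ; the new value may be computed from the old one
num                   ; 151
```

Only the last line produces a value. The two set! expressions are evaluated purely for their effect on the value associated with num.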
Begin expression. Since assignments do not evaluate to a value, they are often
used inside a begin expression. A begin expression is a special form that eval-
uates a sequence of expressions in order and evaluates to the value of the last
expression.
The grammar rule for the begin expression is:
Expression ::⇒ BeginExpression
BeginExpression ::⇒ (begin MoreExpressions Expression)
To evaluate a begin expression, (begin Expression1 Expression2 . . . Expressionk),
evaluate each subexpression in order from left to right. The value of the
begin expression is the value of the last subexpression, Expressionk.
The values of all the subexpressions except the last one are ignored; these subex-
pressions are only evaluated for their side effects.
The begin expression must be a special form. It is not possible to define a pro-
cedure that behaves identically to a begin expression since the application rule
does not specify the order in which the operand subexpressions are evaluated.
The definition syntax for procedures includes a hidden begin expression.
(define (Name Parameters) MoreExpressions Expression)
is an abbreviation for:
(define Name
(lambda (Parameters) (begin MoreExpressions Expression)))
The let expression introduced in Section 8.1.1 also includes a hidden begin ex-
pression.
(let ((Name1 Expression1) (Name2 Expression2)
      . . . (Namek Expressionk))
  MoreExpressions Expression)
is equivalent to the application expression:
((lambda (Name1 Name2 . . . Namek )
(begin MoreExpressions Expression))
Expression1 Expression2 . . . Expressionk )
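The equivalence can be checked with a small experiment. In this sketch, effects-log, via-let, and via-lambda are names invented for the example. Both bodies perform a side effect before producing their value, so the log records that each body was evaluated in order.

```scheme
#lang racket
(define effects-log null) ; hypothetical name: records side effects, newest first

; A let expression whose body has two expressions (a hidden begin).
(define via-let
  (let ((a 2) (b 3))
    (set! effects-log (cons 'let-body effects-log))
    (* a b)))

; The equivalent application of a lambda with an explicit begin.
(define via-lambda
  ((lambda (a b)
     (begin (set! effects-log (cons 'lambda-body effects-log))
            (* a b)))
   2 3))

(list via-let via-lambda effects-log) ; (6 6 (lambda-body let-body))
```

Both forms evaluate to 6, and in both the assignment is evaluated only for its side effect while the last expression supplies the value.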
Chapter 9. Mutation 181
1 Observant readers should notice that we have already used a few procedures that are not func-
tions including the printing procedures from Section 4.5.1, and random and read-char from the
previous chapter.
182 9.2. Impact of Mutation
[Figure: Environment A contains a place x with value 7 and a procedure with
parameters () and body (begin (set! counter (+ counter 1)) counter); Environment B
contains a place y with value 88.]
Stateful Definition Rule. A definition creates a new place with the
definition's name in the frame associated with the evaluation environment.
The value in the place is the value of the definition's expression. If there is
already a place with the name in the current frame, the definition replaces
the old place with a new place and value.
The rule for redefinitions means we could use define in some situations to mean
something similar to set!. The meaning is different, though, since an assignment
finds the place associated with the name and puts a new value in that place.
Evaluating an assignment follows the Stateful Evaluation Rule 2 to find the place
associated with a name. Hence, (define Name Expression) has a different mean-
ing from (set! Name Expression) when there is no place named Name in the
current execution environment. To avoid this confusion, only use define for the
first definition of a name and always use set! when the intent is to change the
value associated with a name.
Application. The final rule that must change because of mutation is the ap-
plication rule for constructed procedures. Instead of using substitution, the
new application rule creates a new environment with a frame containing places
named for the parameters.
[Figure: the global environment contains a place bigger whose value is a procedure
with body (if (> a b) a b).]
The new application rule becomes more interesting when we consider proce-
dures that create new procedures. For example, make-adder takes a number as
input and produces as output a procedure:
(define (make-adder v) (lambda (n) (+ n v)))
The environment that results from evaluating (define inc (make-adder 1)) is
shown in Figure 9.3. The name inc has a value that is the procedure resulting
from the application of (make-adder 1). To evaluate the application, we follow
the application rule above and create a new environment containing a frame
with the parameter name, v, and its associated operand value, 1.
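[Figure 9.3: the environment after evaluating (define inc (make-adder 1)). The
global environment contains the places make-adder and inc.]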
The result of the application is the value of evaluating its body in this new
environment. Since the body is a lambda expression, it evaluates to a procedure.
That procedure was created in the execution environment that was created to
evaluate the application of make-adder, hence, its environment pointer points
to the application environment which contains a place named v holding the
value 1.
Next, consider evaluating (inc 149). Figure 9.4 illustrates the environment for
evaluating the body of the inc procedure. The evaluation creates a new envi-
ronment with a frame containing the place n and its associated value 149. We
evaluate the body of the procedure, (+ n v), in that environment. The value of
n is found in the execution environment. The value of v is not found there, so
evaluation continues by looking in the parent environment. It contains a place
v containing the value 1.
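The lookup described above can be observed directly. In this minimal sketch, make-adder and inc are from the text, while add-ten is a second adder added here to show that each application of make-adder creates its own place for v.

```scheme
#lang racket
(define (make-adder v) (lambda (n) (+ n v)))

(define inc (make-adder 1))      ; v is 1 in this procedure's environment
(define add-ten (make-adder 10)) ; hypothetical second adder: its own v is 10

(inc 149)     ; 150
(add-ten 149) ; 159
```

Both procedures have the same body, (+ n v), but each finds a different value of v in its parent environment.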
Exercise 9.1. Devise a Scheme expression that could have four possible values,
depending on the order in which application subexpressions are evaluated.
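[Figure 9.4: the environment for evaluating (inc 149). The global environment
contains make-adder and inc; Application Env 2 contains the place n with value 149,
and the inc procedure has parameters (n) and body (+ n v).]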
Exercise 9.3. Draw the environment that results after evaluating the following
expressions, and explain what the value of the final expression is. (Hint: first,
rewrite the let expression as an application.)
> (define (make-applier proc) (lambda (x) (proc x)))
> (define p (make-applier (lambda (x) (let ((x 2)) x))))
> (p 4)
and the standard definition of Scheme, a standard cons pair is mutable. But, as we will see later in
the section, mutable pairs cause lots of problems. So, the designers of DrRacket decided for Version
4.0 to make the standard Pair datatype immutable and to provide a MutablePair datatype for use
when mutation is needed.
The Void result type indicates that set-mcar! and set-mcdr! produce no output.
Here are some interactions using a MutablePair:
> (define pair (mcons 1 2))
> (set-mcar! pair 3)
> pair
(3 . 2)
> (set-mcdr! pair 4)
> pair
(3 . 4)
The set-mcdr! procedure allows us to create a pair where the second cell of the
pair is itself: (set-mcdr! pair pair). This produces the rather frightening object
shown in Figure 9.5. Every time we apply mcdr to pair, we get the same pair as
the output. Hence, the value of (mcar (mcdr (mcdr (mcdr pair)))) is 3.
We can also create objects that combine mutable and immutable Pairs. For ex-
ample, (define mstruct (cons (mcons 1 2) 3)) defines mstruct as an immutable
Pair containing a MutablePair in its first cell. Since the outer Pair is immutable,
we cannot change the objects in its cells. Thus, the second cell of mstruct always
contains the value 3. We can, however, change the values in the cells of the mu-
table pair in its first cell. For example, (set-mcar! (car mstruct) 7) replaces the
value in the first cell of the MutablePair in the first cell of mstruct.
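The interactions described in this section can be collected into one short sketch. It uses only mcons, mcar, mcdr, set-mcar!, and set-mcdr!, which Racket provides without any additional library. The names pair and mstruct follow the text.

```scheme
#lang racket
(define pair (mcons 1 2))
(set-mcar! pair 3)
(set-mcdr! pair pair) ; the second cell of pair now refers to pair itself

(mcar (mcdr (mcdr (mcdr pair)))) ; 3, no matter how many times mcdr is applied
(eq? (mcdr pair) pair)           ; #t

; An immutable Pair whose first cell contains a MutablePair.
(define mstruct (cons (mcons 1 2) 3))
(set-mcar! (car mstruct) 7) ; mutates the inner MutablePair
(mcar (car mstruct))        ; 7
(cdr mstruct)               ; 3, the outer Pair cannot be changed
```

The outer structure of mstruct is fixed, but the object it refers to in its first cell can still change.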
Mutable Lists. As we used immutable Pairs to build immutable Lists, we can
use MutablePairs to construct MutableLists. A MutableList is either null or a Mu-
tablePair whose second cell contains a MutableList.
The MutableList type is defined by a library. To use it, evaluate the following
expression: (require racket/mpair). All of the examples in this chapter assume
this expression has been evaluated. This library defines the mlist procedure
that is similar to the list procedure, but produces a MutableList instead of an
immutable List. For example, (mlist 1 2 3) produces the structure shown in Fig-
ure 9.6. Each node in the list is a MutablePair, so we can use the set-mcar! and
Many of the list procedures from Chapter 5 can be directly translated to work on
mutable lists. For example, we can define mlist-length as:
(define (mlist-length m)
(if (null? m) 0 (+ 1 (mlist-length (mcdr m)))))
As shown in Exercise 9.4, though, we need to be careful when using mcdr to
recurse through a MutableList since structures created with MutablePairs can
include circular pointers.
Exercise 9.4. What is the value of (mlist-length pair) for the pair shown in Fig-
ure 9.5?
Whereas the functional list-map procedure uses cons to build up the output list,
the imperative mlist-map! procedure uses set-mcar! to mutate the input list's
elements:
(define (mlist-map! f p)
  (if (null? p) (void)
      (begin (set-mcar! p (f (mcar p)))
             (mlist-map! f (mcdr p)))))
The base case uses (void) to evaluate to no value. Unlike list-map which evalu-
ates to a List, mlist-map! is evaluated for its side effects and produces no output.
Assuming the procedure passed as f has constant running time, the running
time of the mlist-map! procedure is in Θ(n) where n is the number of elements
in the input list. There will be n recursive applications of mlist-map! since each
one passes in a list one element shorter than the input list, and each application
requires constant time. This is asymptotically the same as the list-map
procedure, but we would expect the actual running time to be faster since there is no
need to construct a new list.
The memory consumed is asymptotically different. The list-map procedure
allocates n new cons cells, so it requires memory in Θ(n) where n is the number of
elements in the input list. The mlist-map! procedure is tail recursive (so no stack
needs to be maintained) and does not allocate any new cons cells, so it requires
constant memory.
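A short usage sketch makes the in-place behavior visible. To keep it free of library dependencies, the MutableList is built directly with mcons instead of mlist, and mlist-to-list is a helper invented here to copy the result into an ordinary List for inspection.

```scheme
#lang racket
(define (mlist-map! f p)
  (if (null? p) (void)
      (begin (set-mcar! p (f (mcar p)))
             (mlist-map! f (mcdr p)))))

; Hypothetical helper: copy a MutableList into an immutable List.
(define (mlist-to-list p)
  (if (null? p) null (cons (mcar p) (mlist-to-list (mcdr p)))))

(define m (mcons 1 (mcons 2 (mcons 3 null)))) ; the MutableList {1 2 3}
(define first-cell m)                         ; remember the original first cell

(mlist-map! (lambda (x) (* x x)) m) ; evaluated for its side effect only
(mlist-to-list m)                   ; (1 4 9)
(eq? m first-cell)                  ; #t: the same cells, with new contents
```

The eq? test confirms that no new cells were allocated for m; only the values in the existing cells changed.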
>a
{1 2 3 4}
The value of a still includes the initial 1. There is no way for the mlist-filter!
procedure to remove the first element of the list: the set-mcar! and set-mcdr!
procedures only enable us to change what the mutable pair's components contain.
To avoid this, mlist-filter! should be used with set! to assign the variable to the
resulting mutable list:
(set! a (mlist-filter! (lambda (x) (> x 1)) a))
3 The mappend! library procedure in DrRacket takes a different approach: when the first input
is null it produces the value of the second list as output in this case. This has unexpected behavior
when an expression like (append! a b) is evaluated where the value of a is null since the value of a is
not modified.
The value of m2 was defined as {4 5 6}, and no expressions since then explicitly
modified m2. But, the value of m2 has still changed! It changed because after
evaluating (mlist-append! m1 m2) the m1 object shares cells with m2. Thus,
when the mlist-filter! application changes the value of m1, it also changes the
value of m2 to {4 6}.
The built-in procedure eq? takes as input any two objects and outputs a Boolean.
The result is true if and only if the inputs are the same object. For example, (eq?
3 3) evaluates to true but (eq? (mcons 1 2) (mcons 1 2)) evaluates to false. Even
though the input pairs have the same value, they are different objects: mutating
one of the pairs does not affect the value of the other pair.
For the earlier mlist-append! example, (eq? m1 m2) evaluates to false since m1
and m2 do not refer to the same object. But, (eq? (mcdr m1) m2) evaluates to
true since the second cell of m1 points to the same object as m2. Evaluating
(set-mcar! m2 3) changes the value of both m1 and m2 since the modified cell is
common to both structures.
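The difference between having the same value and being the same object can be sketched in Python, where the is operator plays a role similar to eq?. The MCons class and the way m1 and m2 are built here are assumptions made for illustration; they only mimic the sharing that results from the mlist-append! example.

```python
class MCons:
    """Hypothetical mutable pair, standing in for a Scheme mcons cell."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

def to_list(p):
    """Collect the elements of a chain into a Python list for inspection."""
    out = []
    while p is not None:
        out.append(p.car)
        p = p.cdr
    return out

# Two pairs with the same contents are still different objects.
p1 = MCons(1, 2)
p2 = MCons(1, 2)
same_object = p1 is p2            # False: separately allocated pairs

# m1's second cell is the very same object as m2, so the lists share cells.
m2 = MCons(4, MCons(5, MCons(6)))
m1 = MCons(1, m2)
whole_lists_same = m1 is m2       # False: m1 starts at a different cell
tail_is_shared = m1.cdr is m2     # True: the cdr of m1 is m2 itself

m2.car = 3                        # mutating the shared cell changes both lists
```

After the last assignment, to_list(m1) and to_list(m2) both reflect the change, even though no statement mentions m1.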
Exercise 9.7. [★] Define a procedure mlist-truncate! that takes as input a
MutableList and modifies the list by removing the last element in the list. Specify
carefully the requirements for the input list to your procedure.
Exercise 9.8. [★] Define a procedure mlist-make-circular! that takes as input a
MutableList and modifies the list to be a circular list containing all the elements
in the original list. For example, (mlist-make-circular! (mlist 3)) should produce
the same structure as the circular pair shown in Figure 9.5.
Exercise 9.9. [★] Define an imperative-style procedure, mlist-reverse!, that
reverses the elements of a list. Is it possible to implement an mlist-reverse!
procedure that is asymptotically faster than the list-reverse procedure from
Example 5.4?
Exercise 9.10. [★★] Define a procedure mlist-aliases? that takes as input two
mutable lists and outputs true if and only if there are any mcons cells shared
between the two lists.
If you steal property, you must report its fair market value in your income in the
year you steal it unless in the same year, you return it to its rightful owner.
Your Federal Income Tax, IRS Publication 17, 2009.
A common control structure in imperative programming is a while loop. A while
loop has a test condition and a body. The test condition is a predicate. If it
evaluates to true, the while loop body is executed. Then, the test condition is
evaluated again. The while loop continues to execute until the test condition
evaluates to false.
We can define while as a procedure that takes as input two procedures, a test
procedure and a body procedure, each of which takes no parameters. Even though
the test and body procedures take no parameters, they need to be procedures
instead of expressions, since every iteration of the loop should re-evaluate the
bodies of the passed procedures.
(define (while test body)
  (if (test)
      (begin (body) (while test body))
      (void))) ; no result value
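To see why the test and body must be passed as procedures, here is a small Python sketch of the same idea. The names while_loop, state, seen, test, and body are hypothetical and chosen only for this illustration. The loop inside while_loop replaces the tail recursion of the Scheme definition, since Python does not optimize tail calls.

```python
def while_loop(test, body):
    """Apply body() repeatedly for as long as test() produces a true value."""
    while test():   # test is re-applied before every iteration
        body()

state = {"n": 3}    # mutable state shared by the two procedures
seen = []

def test():
    return state["n"] > 0

def body():
    seen.append(state["n"])
    state["n"] -= 1

while_loop(test, body)
```

If the value of the test were passed instead of a procedure, it would be computed once, before the loop starts, and the loop could never observe the changes made by the body.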
We can use the while procedure to implement Fibonacci similarly to the fast-
fibo procedure:
(define (fibo-while n)
  (let ((a 1) (b 1))
    (while (lambda () (> n 2))
           (lambda () (set! b (+ a b))
                      (set! a (- b a))
                      (set! n (- n 1))))
    b))
The final value of b is the result of the fibo-while procedure. In each iteration, the
body procedure is applied, updating the values of a and b to the next Fibonacci
numbers.
The value assigned to a is computed as (- b a) instead of b. The reason for this
is the previous assignment expression has already changed the value of b, by
adding a to it. Since the next value of a should be the old value of b, we can
find the necessary value by subtracting a. The fact that the value of a variable
can change depending on when it is used often makes imperative programming
trickier than functional programming.
An alternative approach, which avoids the subtraction, is to store
the old value in a temporary variable:
(lambda ()
  (let ((oldb b))
    (set! b (+ a b))
    (set! a oldb)
    (set! n (- n 1))))
Many programming languages designed to support imperative programming
provide control constructs similar to the while procedure defined above. For ex-
ample, here is a version of the procedure in the Python programming language:
def fibo_while(n):
    a, b = 1, 1  # a and b start at 1, as in fibo-while
    while n > 2:
        a, b = b, a + b
        n = n - 1
    return b
We will use Python starting in Chapter 11, although you can probably guess what
most of this procedure means without knowing Python.
The most interesting statement is the double assignment: a, b = b, a + b. This
assigns the new value of a to the old value of b, and the new value of b to the sum
of the old values of a and b. Without the double assignment operator, it would
be necessary to store the old value of b in a new variable so it can be assigned to
a after updating b to the new value.
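The effect of assignment order can be checked with a short Python sketch. The three function names below are hypothetical; the third version deliberately contains the ordering mistake that the double assignment and the temporary variable both avoid.

```python
def fib_tuple(n):
    a, b = 1, 1
    while n > 2:
        a, b = b, a + b   # right side is evaluated fully before rebinding
        n = n - 1
    return b

def fib_temp(n):
    a, b = 1, 1
    while n > 2:
        old_b = b         # save the old value of b before changing it
        b = a + b
        a = old_b
        n = n - 1
    return b

def fib_wrong_order(n):
    a, b = 1, 1
    while n > 2:
        b = a + b
        a = b             # mistake: b has already been updated
        n = n - 1
    return b
```

With n = 10, the first two versions produce 55, while the third produces 256 because each iteration simply doubles b.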
Exercise 9.11. Define the mlist-map! example from the previous section using
while.
Exercise 9.12. Another common imperative programming structure is a repeat-until
loop. Define a repeat-until procedure that takes two inputs, a body procedure
and a test procedure. The procedure should evaluate the body procedure
repeatedly, until the test procedure evaluates to a true value. For example, using
repeat-until we could define factorial as:
(define (factorial n)
  (let ((fact 1))
    (repeat-until
     (lambda () (set! fact (* fact n)) (set! n (- n 1)))
     (lambda () (< n 1)))
    fact))
Exercise 9.13. [★★] Improve the efficiency of the indexing procedures from
Section 8.2.3 by using mutation. Start by defining a mutable binary tree abstraction,
and then use this and the MutableList data type to implement an imperative-
style insert-into-index! procedure that mutates the input index by adding a new
word-position pair to it. Then, define an efficient merge-index! procedure that
takes two mutable indexes as its inputs and modifies the first index to incor-
porate all word occurrences in the second index. Analyze the impact of your
changes on the running time of indexing a collection of documents.
9.5 Summary
Adding the ability to change the value associated with a name complicates our
evaluation rules, but enables simpler and more efficient solutions to many prob-
lems. Mutation allows us to efficiently manipulate larger data structures since it
is not necessary to copy the data structure to make changes to it.
Once we add assignment to our language, the order in which things happen
affects the value of some expressions. Instead of evaluating expressions using
substitution, we now need to always evaluate an expression in a particular exe-
cution environment.
The problem with mutation is that it makes it much tougher to reason about
the meaning of an expression. In the next chapter, we introduce a new kind of
abstraction that packages procedures with the state they manipulate. This helps
manage some of the complexity resulting from mutation by limiting the places
where data may be accessed and modified.
10 Objects
It was amazing to me, and it is still amazing, that people could not imagine what the
psychological difference would be to have an interactive terminal. You can talk about it on a
blackboard until you are blue in the face, and people would say, "Oh, yes, but why do you need
that?". . . We used to try to think of all these analogies, like describing it in terms of the
difference between mailing a letter to your mother and getting on the telephone. To this day I
can still remember people only realizing when they saw a real demo, say, "Hey, it talks back.
Wow! You just type that and you got an answer."
Fernando Corbató (who worked on Whirlwind in the 1950s)
Charles Babbage Institute interview, 1989
vided by DrRacket. The MzScheme language does provide additional constructs for supporting ob-
jects, but we do not cover them in this book.
196 10.1. Packaging Procedures and State
10.1.1 Encapsulation
It would be more useful to package the counter variable with the procedure that
manipulates it. Then we could create as many counters as we want, each with
its own counter variable to manipulate.
The Stateful Application Rule (from Section 9.2.2) suggests a way to do this:
evaluating an application creates a new environment, so a counter variable defined
in the application environment is only visible through the body of the created
procedure.
The make-counter procedure creates a counter object that packages the count
variable with the procedure that increases its value:
(define (make-counter)
  ((lambda (count)
     (lambda () (set! count (+ 1 count)) count))
   0))
Figure 10.1 depicts the environment after creating two counter objects and ap-
plying one of them.
[Figure 10.1. Environment after creating two counter objects and applying one of
them. The global environment contains make-counter, counter1, and counter2. The
counter1 procedure's environment is Application Env 1, where count is 1; the
counter2 procedure's environment is Application Env 2, where count is 0.]
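The same packaging of a variable with the procedure that manipulates it can be sketched with a Python closure. This is an illustration rather than the book's code: make_counter here is a Python function, and nonlocal is what lets the inner function assign to the enclosing count.

```python
def make_counter():
    count = 0                 # lives in this application's own environment
    def counter():
        nonlocal count        # assign to the enclosing count, not a new local
        count = count + 1
        return count
    return counter

counter1 = make_counter()
counter2 = make_counter()
counter1()                    # only counter1's count changes
```

After these statements the situation matches the figure: the count captured by counter1 is 1, while the count captured by counter2 is still 0.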
10.1.2 Messages
The object produced by make-counter is limited to only one behavior: every
time it is applied the associated count variable is increased by one and the new
value is output. To produce more useful objects, we need a way to combine state
with multiple behaviors.
For example, we might want a counter that can also return the current count
and reset the count. We do this by adding a message parameter to the procedure
produced by make-counter:
(define (make-counter)
  (let ((count 0))
    (lambda (message)
      (if (eq? message 'get-count) count
          (if (eq? message 'reset!) (set! count 0)
              (if (eq? message 'next!) (set! count (+ 1 count))
                  (error "Unrecognized message")))))))
Like the earlier make-counter, this procedure produces a procedure with an en-
vironment containing a frame with a place named count. The produced proce-
dure takes a message parameter and selects different behavior depending on the
input message.
The message parameter is a Symbol. A Symbol is a sequence of characters
preceded by a quote character such as 'next!. Two Symbols are equal, as determined
by the eq? procedure, if their sequences of characters are identical. The running
time of the eq? procedure on symbol type inputs is constant; it does not increase
with the length of the symbols since the symbols can be represented internally
as small numbers and compared quickly using number equality. This makes
symbols a more efficient way of selecting object behaviors than Strings, and a
more memorable way to select behaviors than using Numbers.
Here are some sample interactions using the counter object:
> (define counter (make-counter))
> (counter 'next!)
> (counter 'get-count)
1
> (counter 'previous!)
Unrecognized message
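A Python sketch of the same message-passing counter follows. It assumes strings in place of Symbols (Python has no quoted symbol type) and raises ValueError where the Scheme version calls error; both choices are made only for this illustration.

```python
def make_counter():
    count = 0
    def dispatch(message):
        nonlocal count
        if message == "get-count":
            return count
        elif message == "reset!":
            count = 0
        elif message == "next!":
            count = count + 1
        else:
            raise ValueError("Unrecognized message")
    return dispatch

counter = make_counter()
counter("next!")
current = counter("get-count")   # the count after one next! message
```

Sending a message the object does not recognize, such as counter("previous!"), raises the error instead of changing the state.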
Conditional expressions. For objects with many behaviors, the nested if ex-
pressions can get quite cumbersome. Scheme provides a compact conditional
expression for combining many if expressions into one smaller expression:
Expression ::⇒ CondExpression
CondExpression ::⇒ (cond CondClauseList)
CondClauseList ::⇒ CondClause CondClauseList
CondClauseList ::⇒ ε
CondClause ::⇒ (Expression_predicate Expression_consequent)
This evaluation rule is recursive since the transformed expression still includes a
conditional expression, but uses the empty conditional with no value as its base
case.
The conditional expression can be used to define make-counter more clearly
than the nested if expressions:
(define (make-counter)
  (let ((count 0))
    (lambda (message)
      (cond ((eq? message 'get-count) count)
            ((eq? message 'reset!) (set! count 0))
            ((eq? message 'next!) (set! count (+ 1 count)))
            (true (error "Unrecognized message"))))))
For linguistic convenience, Scheme provides a special syntax else for use in con-
ditional expressions. When used as the predicate in the last conditional clause
it means the same thing as true. So, we could write the last clause equivalently
as (else (error "Unrecognized message")).
Sending messages. A more natural way to interact with objects is to define
a generic procedure that takes an object and a message as its parameters, and
sends the message to the object.
The name following the dot is bound to all the remaining operands combined
into a list. This means the defined procedure can be applied to n or more operands
where n is the number of names in Parameters. If there are only n operand ex-
pressions, the value bound to NameRest is null. If there are n + k operand expres-
sions, the value bound to NameRest is a list containing the values of the last k
operand expressions.
To apply the procedure we use the built-in apply procedure which takes two
inputs, a Procedure and a List. It applies the procedure to the values in the list,
extracting them from the list as each operand in order.
(define (ask object message . args)
  (apply (object message) args))
We can use the new ask procedure with two or more parameters to invoke methods
with any number of arguments (e.g., (ask counter 'adjust! 5)).
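In Python, a starred parameter collects the remaining arguments much as the name after the dot does, and a starred argument spreads them out again much as apply does. The sketch below assumes a hypothetical make_adjustable_counter whose dispatch procedure returns a method for each message; strings stand in for Symbols.

```python
def make_adjustable_counter():
    count = 0
    def get_count():
        return count
    def next_():
        nonlocal count
        count = count + 1
    def adjust(val):
        nonlocal count
        count = count + val
    methods = {"get-count": get_count, "next!": next_, "adjust!": adjust}
    def dispatch(message):
        if message not in methods:
            raise ValueError("Unrecognized message")
        return methods[message]       # the object responds with a method
    return dispatch

def ask(obj, message, *args):
    """args collects any extra operands; *args passes them to the method."""
    return obj(message)(*args)

counter = make_adjustable_counter()
ask(counter, "next!")                 # no extra operands: args is empty
ask(counter, "adjust!", 5)            # one extra operand: args is (5,)
```

After these two messages, ask(counter, "get-count") produces 6.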
The state variables that are part of an object are called instance variables. The
instance variables are stored in places that are part of the application
environment for the object. This means they are encapsulated with the object and can
only be accessed through the object. An object produced by (make-counter)
defines a single instance variable, count.
The procedures that are part of an object are called methods. Methods may provide
information about the state of an object (we call these observers) or modify
the state of an object (we call these mutators). An object produced by (make-
counter) provides three methods: reset! (a mutator), next! (a mutator), and get-
count (an observer).
An object is manipulated using the object's methods. We invoke a method on an
object by sending the object a message. This is analogous to applying a
procedure.
A class is a kind of object. Classes are similar to data types. They define a set of
possible values and operations (methods in the object terminology) for
manipulating those values. We also need procedures for creating new objects, such as
the make-counter procedure above. We call these constructors. By convention,
we call the constructor for a class make-<class> where <class> is the name of
the class. Hence, an instance of the counter class is the result produced when
the make-counter procedure is applied.
Exercise 10.2. [★] Define a variable-counter object that provides these
methods:
make-variable-counter: Number → VariableCounter
Creates a variable-counter object with an initial counter value of 0 and an
initial increment value given by the parameter.
set-increment!: Number → Void
Sets the increment amount for this counter to the input value.
next!: Void → Void
Adds the increment amount to the value of the counter.
get-count: Void → Number
Outputs the current value of the counter.
Here are some sample interactions using a variable-counter object:
> (define vcounter (make-variable-counter 1))
> (ask vcounter 'next!)
> (ask vcounter 'set-increment! 2)
> (ask vcounter 'next!)
> (ask vcounter 'get-count)
3
10.2 Inheritance
Objects are particularly well-suited to programs that involve modeling real or
imaginary worlds such as graphical user interfaces (modeling windows, files,
to have more than one superclass. This can be confusing, though. If a class has two superclasses and
both define methods with the same name, it may be ambiguous which of the methods is used when
it is invoked on an object of the subclass. In our object system, a class may have only one superclass.
Hence, if the sim-object class defines a get-name method, when the get-name
method is invoked on a student object, the implementation of get-name in the
sim-object class will be used (as long as neither person nor movable-object de-
fines its own get-name method).
When one class implementation uses the methods from another class we say the
subclass inherits from the superclass. Inheritance is a powerful way to obtain
many different objects with a small amount of code.
we need to also provide a set-count! method for setting the value of the counter
to an arbitrary value. For reasons that will become clear later, we should use set-
count! everywhere the value of the count variable is changed instead of setting
it directly:
(define (make-counter)
  (let ((count 0))
    (lambda (message)
      (cond
       ((eq? message 'get-count) (lambda (self) count))
       ((eq? message 'set-count!) (lambda (self val) (set! count val)))
       ((eq? message 'reset!) (lambda (self) (ask self 'set-count! 0)))
       ((eq? message 'next!)
        (lambda (self) (ask self 'set-count! (+ 1 (ask self 'get-count)))))
       (else (error "Unrecognized message"))))))
Previously, we defined make-adjustable-counter by repeating all the code from
make-counter and adding an adjust! method. With inheritance, we can define
make-adjustable-counter as a subclass of make-counter without repeating any
code:
(define (make-adjustable-counter)
  (make-sub-object
   (make-counter)
   (lambda (message)
     (cond
      ((eq? message 'adjust!)
       (lambda (self val)
         (ask self 'set-count! (+ (ask self 'get-count) val))))
      (else false)))))
We use make-sub-object to create an object that inherits the behaviors from one
class, and extends those behaviors by defining new methods in the subclass im-
plementation.
The new adjust! method takes one Number parameter (in addition to the self
object that is passed to every method) and adds that number to the current
counter value. It cannot use (set! count (+ count val)) directly, though, since the
count variable is defined in the application environment of its superclass object
and is not visible within adjustable-counter. Hence, it accesses the counter us-
ing the set-count! and get-count methods provided by the superclass.
Suppose we create an adjustable-counter object:
(define acounter (make-adjustable-counter))
Consider what happens when (ask acounter 'adjust! 3) is evaluated. The acounter
object is the result of the application of make-sub-object, which is the procedure
(lambda (message)
  (let ((method (subproc message)))
    (if method method (super message))))
where super is the counter object resulting from evaluating (make-counter) and
subproc is the procedure created by the lambda expression in make-adjustable-
counter. The body of ask evaluates (object message) to find the method associ-
ated with the input message, in this case adjust!. The acounter object takes the
message input and evaluates the let expression:
(let ((method (subproc message))) . . .)
The result of applying subproc to message is the adjust! procedure defined by
make-adjustable-counter:
(lambda (self val)
  (ask self 'set-count! (+ (ask self 'get-count) val)))
Since this is not false, the predicate of the if expression is non-false and the value
of the consequent expression, method, is the result of the procedure application.
The ask procedure uses apply to apply this procedure to the object and args
parameters. The object is the acounter object, and the args is the list of the extra
parameters, in this case (3).
Thus, the adjust! method is applied to the acounter object and 3. The body of
the adjust! method uses ask to invoke the set-count! method on the self object.
As with the first invocation, the body of ask evaluates (object message) to find
the method. In this case, the subclass implementation provides no set-count!
method so the result of (subproc message) in the application of the subclass ob-
ject is false. Hence, the alternate expression is evaluated: (super message). This
evaluates to the method associated with the set-count! message in the super-
class. The ask body will apply this method to the self object, setting the value of
the counter to the new value.
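The lookup just traced can be reproduced in a Python sketch. Everything here is an assumption made for illustration: strings replace Symbols, a dictionary replaces the cond expression, the parameter is called super_obj rather than super, and ask passes the object itself as the first argument, as the description of the self parameter requires.

```python
def ask(obj, message, *args):
    """Find the method for message, then apply it to the object and args."""
    return obj(message)(obj, *args)

def make_counter():
    count = 0
    def set_count(self, val):
        nonlocal count
        count = val
    methods = {
        "get-count": lambda self: count,
        "set-count!": set_count,
        "reset!": lambda self: ask(self, "set-count!", 0),
        "next!": lambda self: ask(self, "set-count!",
                                  1 + ask(self, "get-count")),
    }
    def dispatch(message):
        if message not in methods:
            raise ValueError("Unrecognized message")
        return methods[message]
    return dispatch

def make_sub_object(super_obj, subproc):
    def dispatch(message):
        method = subproc(message)          # try the subclass first
        return method if method else super_obj(message)
    return dispatch

def make_adjustable_counter():
    def subproc(message):
        if message == "adjust!":
            return lambda self, val: ask(
                self, "set-count!", ask(self, "get-count") + val)
        return False                       # not handled by the subclass
    return make_sub_object(make_counter(), subproc)

acounter = make_adjustable_counter()
ask(acounter, "adjust!", 3)    # adjust! comes from the subclass
ask(acounter, "next!")         # next! falls through to the superclass
```

After these two messages, ask(acounter, "get-count") produces 4.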
We can define new classes by defining subclasses of previously defined classes.
For example, reversible-counter inherits from adjustable-counter:
(define (make-reversible-counter)
  (make-sub-object
   (make-adjustable-counter)
   (lambda (message)
     (cond
      ((eq? message 'previous!) (lambda (self) (ask self 'adjust! -1)))
      (else false)))))
The reversible-counter object defines the previous! method which provides a
new behavior. If the message to a reversible-counter object is not previous!,
the method from its superclass, adjustable-counter, is used. Within the previous!
method we use ask to invoke the adjust! method on the self object. Since the
subclass implementation does not provide an adjust! method, this results in the
superclass method being applied.
instead of setting the counter to the new value, it produces an error message and
maintains the counter at zero. We do this by overriding the set-count! method,
replacing the superclass implementation of the method with a new implemen-
tation.
(define (make-positive-counter)
  (make-sub-object
   (make-reversible-counter)
   (lambda (message)
     (cond
      ((eq? message 'set-count!)
       (lambda (self val) (if (< val 0) (error "Negative count")
                              . . .)))
      (else false)))))
What should go in place of the . . .? When the value to set the count to is not
negative, what should happen is the count is set as it would be by the superclass
set-count! method. In the positive-counter code though, there is no way
to access the count variable since it is in the superclass procedure's application
environment. There is also no way to invoke the superclass set-count! method
since it has been overridden by positive-counter.
The solution is to provide a way for the subclass object to obtain its superclass
object. We can do this by adding a get-super method to the object produced by
make-sub-object:
(define (make-sub-object super subproc)
  (lambda (message)
    (if (eq? message 'get-super)
        (lambda (self) super)
        (let ((method (subproc message)))
          (if method method (super message))))))
Thus, when an object produced by make-sub-object is passed the get-super mes-
sage it returns a method that produces the super object. The rest of the proce-
dure is the same as before, so for every other message it behaves like the earlier
make-sub-object procedure.
With the get-super method we can define the set-count! method for positive-
counter, replacing the . . . with:
(ask (ask self 'get-super) 'set-count! val))
Figure 10.3 shows the subclasses that inherit from counter and the methods they
define or override.
Consider these sample interactions with a positive-counter object:
> (define poscount (make-positive-counter))
> (ask poscount 'next!)
> (ask poscount 'previous!)
> (ask poscount 'previous!)
Negative count
> (ask poscount 'get-count)
0
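These interactions can be reproduced with a Python sketch of the whole hierarchy. As before, this is an illustration under assumptions: strings replace Symbols, ValueError replaces error, and the names super_obj, subproc, and log are hypothetical.

```python
def ask(obj, message, *args):
    return obj(message)(obj, *args)

def make_counter():
    count = 0
    def set_count(self, val):
        nonlocal count
        count = val
    methods = {
        "get-count": lambda self: count,
        "set-count!": set_count,
        "next!": lambda self: ask(self, "set-count!",
                                  1 + ask(self, "get-count")),
    }
    return lambda message: methods[message]

def make_sub_object(super_obj, subproc):
    def dispatch(message):
        if message == "get-super":
            return lambda self: super_obj   # expose the superclass object
        method = subproc(message)
        return method if method else super_obj(message)
    return dispatch

def make_adjustable_counter():
    adjust = lambda self, val: ask(
        self, "set-count!", ask(self, "get-count") + val)
    return make_sub_object(
        make_counter(), lambda m: adjust if m == "adjust!" else False)

def make_reversible_counter():
    previous = lambda self: ask(self, "adjust!", -1)
    return make_sub_object(
        make_adjustable_counter(),
        lambda m: previous if m == "previous!" else False)

def make_positive_counter():
    def set_count(self, val):
        if val < 0:
            raise ValueError("Negative count")
        # Not negative: defer to the superclass set-count! method.
        ask(ask(self, "get-super"), "set-count!", val)
    return make_sub_object(
        make_reversible_counter(),
        lambda m: set_count if m == "set-count!" else False)

poscount = make_positive_counter()
log = []
ask(poscount, "next!")            # count becomes 1
ask(poscount, "previous!")        # count becomes 0
try:
    ask(poscount, "previous!")    # would make the count negative
except ValueError as err:
    log.append(str(err))
```

After the failed second previous!, log contains the Negative count message and ask(poscount, "get-count") still produces 0.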
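[Figure 10.3. Counter class hierarchy. counter defines get-count, set-count!,
reset!, and next!; its subclass adjustable-counter defines adjust!; its subclass
reversible-counter defines previous!; its subclass positive-counter overrides
set-count!.]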
For the first ask application, the next! method is invoked on a positive-counter
object. Since the positive-counter class does not define a next! method, the mes-
sage is sent to the superclass, reversible-counter. The reversible-counter imple-
mentation also does not define a next! method, so the message is passed up to its
superclass, adjustable-counter. This class also does not define a next! method,
so the message is passed up to its superclass, counter. The counter class defines
a next! method, so that method is used.
For the next ask, the previous! method is invoked. Since the positive-counter
class does not define a previous! method, the message is sent to the superclass.
The superclass, reversible-counter, defines a previous! method. Its implementation
involves an invocation of the adjust! method: (ask self 'adjust! -1). This
invocation is done on the self object, which is an instance of the positive-counter
class. The positive-counter class does not define adjust!, so the method is found
in the adjustable-counter class. That method in turn invokes set-count! on the
self object, which finds the set-count! method defined by positive-counter, the
one that overrides the set-count! method defined by the counter class. Hence, the
second invocation of previous! produces the Negative count error and does not
adjust the count to -1.
The property this object system has where the method invoked depends on
the object is known as dynamic dispatch. The method used for an invocation
depends on the self object. In this case, for example, it means that when we
inspect the implementation of the adjust! method in the adjustable-counter
class by itself it is not possible to determine what procedure will be applied for
the method invocation, (ask self 'set-count! (+ (ask self 'get-count) val)). It
depends on the actual self object: if it is a positive-counter object, the
set-count! method defined by positive-counter is used; if it is a reversible-counter
object, the set-count! method defined by the counter class (an ancestor of
reversible-counter) is used.
Dynamic dispatch provides for a great deal of expressiveness. It enables us to
use the same code to produce many different behaviors by overriding methods
in subclasses. This is very useful, but also very dangerous: it makes it
impossible to reason about what a given procedure does without knowing about all
possible subclasses. For example, we cannot make any claims about what the
previous! method in reversible-counter actually does without knowing what the
adjust! method does in all subclasses of reversible-counter.
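Python's built-in classes also use dynamic dispatch, so the same hazard can be sketched directly. The class and method names below are hypothetical translations of the counter classes and are not taken from the text.

```python
class Counter:
    def __init__(self):
        self._count = 0
    def get_count(self):
        return self._count
    def set_count(self, val):
        self._count = val

class AdjustableCounter(Counter):
    def adjust(self, val):
        # Which set_count runs here depends on the class of self.
        self.set_count(self.get_count() + val)

class ReversibleCounter(AdjustableCounter):
    def previous(self):
        self.adjust(-1)

class PositiveCounter(ReversibleCounter):
    def set_count(self, val):
        if val < 0:
            raise ValueError("Negative count")
        super().set_count(val)

r = ReversibleCounter()
r.previous()                  # uses Counter.set_count: count becomes -1

p = PositiveCounter()
try:
    p.previous()              # same previous code, different set_count
    outcome = "no error"
except ValueError as err:
    outcome = str(err)
```

The body of previous is identical for r and p, yet one call stores -1 and the other raises an error, which is why the method cannot be understood without knowing the subclasses.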
The value of encapsulation and inheritance increases as programs get more com-
plex. Programming with objects allows a programmer to manage complexity by
hiding the details of how objects are implemented from what those objects rep-
resent and do.
Exercise 10.3. Define a countdown class that simulates a rocket launch count-
down: it starts at some initial value, and counts down to zero, at which point the
rocket is launched. Can you implement countdown as a subclass of counter?
Exercise 10.4. Define the variable-counter object from Exercise 10.2 as a sub-
class of counter.
ory that was possible with available technologies based on storing electrostatic
charges in vacuum tubes. Jay Forrester invented a much faster memory
known as magnetic-core memory. Magnetic-core memory stores a bit using
magnetic polarity.
The interactiveness of the Whirlwind computer opened up many new possibilities
for computing. Shortly after the first Whirlwind computer, Ken Olsen led an
effort to build a version of the computer using transistors. The successor to this
machine became the TX-2, and Ken Olsen went on to found Digital Equipment
Corporation (DEC) which pioneered the widespread use of moderately priced
computers in science and industry. DEC was very successful in the 1970s and
1980s, but suffered a long decline before eventually being bought by Compaq.
Ivan Sutherland, then a graduate student at MIT, had an opportunity to use the
TX-2 machine. He developed a program called Sketchpad that was the first pro-
gram to have an interactive graphical interface. Sketchpad allowed users to draw
and manipulate objects on the screen using a light pen. It was designed around
objects and operations on those objects:3
In the process of making the Sketchpad system operate, a few very gen-
eral functions were developed which make no reference at all to the spe-
cific types of entities on which they operate. These general functions give
the Sketchpad system the ability to operate on a wide range of problems.
The motivation for making the functions as general as possible came from
the desire to get as much result as possible from the programming effort
involved. . . Each of the general functions implemented in the Sketchpad
system abstracts, in some sense, some common property of pictures inde-
pendent of the specific subject matter of the pictures themselves.
[Figure: Components in Sketchpad]
Sketchpad was a great influence on Douglas Engelbart who developed a research
program around a vision of using computers interactively to enhance human
intellect. In what has become known as "the mother of all demos", Engelbart and
his colleagues demonstrated a networked, graphical, interactive computing
system to the general public for the first time in 1968. With Bill English, Engelbart
also invented the computer mouse.
Sketchpad also influenced Alan Kay in developing object-oriented programming.
The first language to include support for objects was the Simula programming
language, developed in Norway in the 1960s by Kristen Nygaard and Ole-Johan
Dahl. Simula was designed as a language for implementing simulations. It
provided mechanisms for packaging data and procedures, and for implementing
subclasses using inheritance.
In 1966, Alan Kay entered graduate school at the University of Utah, where Ivan
Sutherland was then a professor. Here's how he describes his first assignment:4
Head whirling, I found my desk. On it was a pile of tapes and listings,
and a note: "This is the Algol for the 1108. It doesn't work. Please make it
work." The latest graduate student gets the latest dirty task. The
documentation was incomprehensible. Supposedly, this was the Case-Western
Reserve 1107 Algol, but it had been doctored to make a language called
Simula; the documentation read like Norwegian transliterated into English,
which in fact it was. There were uses of words like activity and process that
didn't seem to coincide with normal English usage. Finally, another
graduate student and I unrolled the program listing 80 feet down the hall and
crawled over it yelling discoveries to each other. The weirdest part was the
storage allocator, which did not obey a stack discipline as was usual for
Algol. A few days later, that provided the clue. What Simula was allocating
were structures very much like the instances of Sketchpad. There were
descriptions that acted like masters and they could create instances, each of
which was an independent entity. . . .
This was the big hit, and I've not been the same since. . . For the first time
I thought of the whole as the entire computer and wondered why anyone
would want to divide it up into weaker things called data structures and
procedures. Why not divide it up into little computers, as time sharing was
starting to? But not in dozens. Why not thousands of them, each
simulating a useful structure?
3 Ivan Sutherland, Sketchpad: a Man-Machine Graphical Communication System, 1963
4 Alan Kay, The Early History of Smalltalk, 1993
Alan Kay
Alan Kay went on to design the language Smalltalk, which became the first widely
used object-oriented language. Smalltalk was developed as part of a project at
Xerox's Palo Alto Research Center to develop a hand-held computer that could
be used as a learning environment by children.
In Smalltalk, everything is an object, and all computation is done by sending
messages to objects. For example, in Smalltalk one computes (+ 1 2) by sending
the message + 2 to the object 1. Here is Smalltalk code for implementing a
counter object:
class name counter
instance variable names count
new count <- 0
next count <- count + 1
current ^ count
The new method is a constructor analogous to make-counter. The count
instance variable stores the current value of the counter, and the next method
updates the counter value by sending the message + 1 to the count object.
Don't worry about what anybody else is going to do. The best way to predict the
future is to invent it. Really smart people with reasonable funding can do just
about anything that doesn't violate too many of Newton's Laws!
Alan Kay
Nearly all widely-used languages today provide built-in support for some form
of object-oriented programming. For example, here is how a counter object
could be defined in Python:
class counter:
    def __init__(self): self._count = 0
    def reset(self): self._count = 0
    def next(self): self._count = self._count + 1
    def current(self): return self._count
10.4 Summary
An object is an entity that packages state with procedures that manipulate that
state. By packaging state and procedures together, we can encapsulate state in
ways that enable more elegant and robust programs.
Dynabook Images
From Alan Kay, A Personal Computer for Children of All Ages, 1972.
11 Interpreters
"When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to
mean - nothing more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
Lewis Carroll, Through the Looking Glass
Languages are powerful tools for thinking. Different languages encourage dif-
ferent ways of thinking and lead to different thoughts. Hence, inventing new
languages is a powerful way for solving problems. We can solve a problem by
designing a language in which it is easy to express a solution and implementing
an interpreter for that language.
An interpreter is just a program. As input, it takes a specification of a program in
some language. As output, it produces the output of the input program. Implementing
an interpreter further blurs the line between data and programs, which
we first crossed in Chapter 3 by passing procedures as parameters and returning
new procedures as results. Programs are just data input for the interpreter
program. The interpreter determines the meaning of the program.
To implement an interpreter for a given target language we need to:
1. Implement a parser that takes as input a string representation of a program
in the target language and produces a structural parse of the input
program. The parser should break the input string into its language components,
and form a parse tree data structure that represents the input text
in a structural way. Section 11.2 describes our parser implementation.
2. Implement an evaluator that takes as input a structural parse of an input
program, and evaluates that program. The evaluator should implement
the target language's evaluation rules. Section 11.3 describes our evaluator.
Our target language is a simple subset of Scheme we call Charme.1 The Charme
language is very simple, yet is powerful enough to express all computations (that
is, it is a universal programming language). Its evaluation rules are a subset of
the stateful evaluation rules for Scheme. The full grammar and evaluation rules
for Charme are given in Section 11.3. The evaluator implements those evaluation
rules.
1 The original name of Scheme was "Schemer", a successor to the languages "Planner" and
"Conniver". Because the computer on which "Schemer" was implemented only allowed six-letter
file names, its name was shortened to "Scheme". In that spirit, we name our snake-charming
language "Charmer" and shorten it to Charme. Depending on the programmer's state of mind,
the language name can be pronounced either "charm" or "char me".
Section 11.4 illustrates how changing the evaluation rules of our interpreter opens
up new ways of programming.
11.1 Python
We could implement a Charme interpreter using Scheme or any other universal
programming language, but we implement it using the programming language
Python. Python is a popular programming language initially designed by Guido
van Rossum in 1991.2 Python is freely available from http://www.python.org.
We use Python instead of Scheme to implement our Charme interpreter for a few
reasons. The first reason is pedagogical: it is instructive to learn new languages.
As Dijkstra's quote at the beginning of this chapter observes, the languages we
use have a profound effect on how we think. This is true for natural languages,
but also true for programming languages. Different languages make different
styles of programming more convenient, and it is important for every program-
mer to be familiar with several different styles of programming. All of the major
concepts we have covered so far apply to Python nearly identically to how they
apply to Scheme, but seeing them in the context of a different language should
make it clearer what the fundamental concepts are and what are artifacts of a
particular programming language.
Another reason for using Python is that it provides some features that enhance
expressiveness that are not available in Scheme. These include built-in support
for objects and imperative control structures. Python is also well-supported by
most web servers (including Apache), and is widely used to develop dynamic
web applications.
The grammar for Python is quite different from the Scheme grammar, so Python
programs look very different from Scheme programs. The evaluation rules, how-
ever, are quite similar to the evaluation rules for Scheme. This chapter does
not describe the entire Python language, but introduces the grammar rules and
evaluation rules for the most important Python constructs as we use them to
implement the Charme interpreter.
Like Scheme, Python is a universal programming language. Both languages can
express all mechanical computations. For any computation we can express in
Scheme, there is a Python program that defines the same computation. Con-
versely, every Python program has an equivalent Scheme program.
One piece of evidence that every Scheme program has an equivalent Python
program is the interpreter we develop in this chapter. Since we can implement
an interpreter for a Scheme-like language in Python, we know we can express
every computation that can be expressed by a program in that language with an
equivalent Python program: the Charme interpreter with the Charme program
as its input.
Tokenizing. We introduce Python using one of the procedures in our inter-
preter implementation. We divide the job of parsing into two procedures that
are combined to solve the problem of transforming an input string into a list describing
the input program's structure. The first part is the tokenizer. It takes as
input a string representing a Charme program, and outputs a list of the tokens
in that string.
A token is an indivisible syntactic unit. For example, the Charme expression,
(define square (lambda (x) (* x x))), contains 15 tokens: (, define, square, (,
lambda, (, x, ), (, *, x, x, ), ), and ). Tokens are separated by whitespace (spaces,
tabs, and newlines). Punctuation marks such as the left and right parentheses
are tokens by themselves.
The tokenize procedure below takes as input a string s in the Charme target lan-
guage, and produces as output a list of the tokens in s. We describe the Python
language constructs it uses next.
Block :: Statement
Block :: <newline> indented(Statements)
Statements :: Statement <newline> MoreStatements
MoreStatements :: Statement <newline> MoreStatements
MoreStatements :: ε
Unlike in Scheme, whitespace (such as new lines) has meaning in Python. State-
ments cannot be separated into multiple lines, and only one statement may ap-
pear on a single line. Indentation within a line also matters. Instead of using
parentheses to provide code structure, Python uses the indentation to group
statements into blocks. The Python interpreter reports an error if the inden-
tation does not match the logical structure of the code.
Since whitespace matters in Python, we include newlines (<newline>) and in-
dentation in our grammar. We use indented(elements) to indicate that the ele-
ments are indented. For example, the rule for Block is a newline, followed by one
or more statements. The statements are all indented one level inside the block's
indentation. The block ends when the indenting returns to the outer level.
The evaluation rule for a procedure definition is similar to the rule for evaluating
a procedure definition in Scheme.
Python Procedure Definition. The procedure definition,
def Name (Parameters): Block
defines Name as a procedure that takes as inputs the Parameters and has the body Block.
The procedure definition, def tokenize(s): ..., defines a procedure named tokenize
that takes a single parameter, s.
Assignment. The body of the procedure uses several different types of Python
statements. Following Python's more imperative style, five of the statements in
tokenize are assignment statements including the first two statements. For ex-
ample, the assignment statement, tokens = [] assigns the value [] (the empty list)
to the name tokens.
The grammar for the assignment statement is:
Statement :: AssignmentStatement
AssignmentStatement :: Target = Expression
Target :: Name
For now, we use only a Name as the left side of an assignment, but since other
constructs can appear on the left side of an assignment statement, we introduce
the nonterminal Target for which additional rules can be defined to encompass
other possible assignees. Anything that can hold a value (such as an element of
a list) can be the target of an assignment.
PrimaryExpression :: Literal
PrimaryExpression :: Name
PrimaryExpression :: ( Expression )
The last rule allows expressions to be grouped explicitly using parentheses. For
example, (3 + 4) * 5 is parsed as the PrimaryExpression, (3 + 4), times 5, so evalu-
ates to 35; without the parentheses, 3 + 4 * 5 is parsed as 3 plus the MultExpres-
sion, 4 * 5, so evaluates to 23.
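The effect of grouping can be checked directly in Python. This is a minimal sketch; the variable names grouped and ungrouped are chosen here only for illustration.

```python
grouped = (3 + 4) * 5     # parentheses force the addition to happen first
ungrouped = 3 + 4 * 5     # multiplication binds more tightly than addition
print(grouped, ungrouped)
```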
A PrimaryExpression can be a Literal, such as a number. Numbers in Python are
similar (but not identical) to numbers in Scheme.
A PrimaryExpression can also be a name, similar to names in Scheme. The eval-
uation rule for a name in Python is similar to the stateful rule for evaluating a
name in Scheme.3
Exercise 11.1. Draw the parse tree for each of the following Python expressions
and provide the value of each expression.
a. 1 + 2 + 3 * 4
b. 3 > 2 + 2
c. 3 * 6 >= 15 == 12
d. (3 * 6 >= 15) == True
a = [1, 2, 3]
a[0]      # evaluates to 1
a[1+1]    # evaluates to 3
a[3]      # IndexError: list index out of range
3 There are some subtle differences and complexities (see Section 4.1 of the Python Reference
Subscript expressions with ranges evaluate to a list containing the elements be-
tween the low bound and the high bound. If the low bound is missing, the low
bound is the beginning of the list. If the high bound is missing, the high bound
is the end of the list. For example,
a = [1, 2, 3]
a[:1]       # evaluates to [1]
a[1:]       # evaluates to [2, 3]
a[4-2:3]    # evaluates to [3]
a[:]        # evaluates to [1, 2, 3]
Assignments using ranges as targets can add elements to the list as well as chang-
ing the values of existing elements:
a = [1, 2, 3]
a[0] = 7
a                   # evaluates to [7, 2, 3]
a[1:4] = [4, 5, 6]
a                   # evaluates to [7, 4, 5, 6]
a[1:] = [6]
a                   # evaluates to [7, 6]
strings which are mutable, the Python str datatype is immutable. Once a string
is created its value cannot change. This means all the string methods that seem
to change the string values actually return new strings (for example, capitalize()
returns a copy of the string with its first letter capitalized).
Strings can be enclosed in single quotes (e.g., 'hello'), double quotes (e.g., "hello"),
and triple-double quotes (e.g., """hello"""; a string inside triple quotes can span
multiple lines). In our example program, we use the assignment statement,
current = '' (two single quotes), to initialize the value of current to the empty
string. The input, s, is a string object.
The addition operator can be used to concatenate two strings. In tokenize, we
use current = current + c to update the value of current to include a new charac-
ter. Since strings are immutable there is no string method analogous to the list
append method. Instead, appending a character to a string involves creating a
new string object.
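A short sketch of these string properties follows. The variable names are illustrative; the attempted item assignment is included only to show that Python rejects it.

```python
s = 'hello'
t = s.capitalize()          # returns a new string; s itself is unchanged

current = ''
for c in 'abc':
    current = current + c   # each concatenation creates a new string object

try:
    s[0] = 'H'              # strings do not support item assignment
    mutated = True
except TypeError:
    mutated = False

print(s, t, current, mutated)
```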
Dictionaries. A dictionary is a lookup-table where values are associated with
keys. The keys can be any immutable type (strings and numbers are commonly
used as keys); the values can be of any type. We did not use the dictionary type
in tokenize, but it is very useful for implementing frames in the evaluator.
A dictionary is denoted using curly brackets. The empty dictionary is {}. We
add a key-value pair to the dictionary using an assignment where the left side
is a subscript expression that specifies the key and the right side is the value
assigned to that key. For example,
birthyear = {}
birthyear['Euclid'] = '300BC'
birthyear['Ada'] = 1815
birthyear['Alan Turing'] = 1912
birthyear['Alan Kay'] = 1940
defines birthyear as a dictionary containing four entries. The keys are all strings;
the values are numbers, except for Euclid's entry, which is a string.
We can obtain the value associated with a key in the dictionary using a sub-
script expression. For example, birthyear['Alan Turing'] evaluates to 1912. We can
replace the value associated with a key using the same syntax as adding a key-
value pair to the dictionary. The statement,
birthyear['Euclid'] = -300
replaces the value associated with the key 'Euclid' with the number -300.
The dictionary type lookup and update operations have approximately constant
running time: the time it takes to look up the value associated with a key does not
scale as the size of the dictionary increases. This is done by computing a number
based on the key that determines where the associated value would be stored (if
that key is in the dictionary). The number is used to index into a structure similar
to a Python list (so it has constant time to retrieve any element). Mapping keys
to appropriate numbers to avoid many keys mapping to the same location in
the list is a difficult problem, but one the Python dictionary object does well for
most sets of keys.
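The idea of computing a number from the key and using it as an index can be sketched with a toy table. The class ToyTable and its method names are hypothetical, and this is not how Python's dictionary is actually implemented; it only illustrates the principle, using the built-in hash function to compute the number.

```python
class ToyTable:
    """Hypothetical fixed-size lookup table illustrating hashing into a list."""
    def __init__(self, size=8):
        # each slot holds a small list of (key, value) pairs
        self._buckets = [[] for _ in range(size)]

    def _index(self, key):
        # compute a number from the key, then reduce it to a valid list index
        return hash(key) % len(self._buckets)

    def store(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)    # replace the existing entry
                return
        bucket.append((key, value))

    def lookup(self, key):
        # only the one bucket selected by the key's number is searched
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ToyTable()
table.store('Ada', 1815)
table.store('Alan Turing', 1912)
table.store('Alan Kay', 1940)
print(table.lookup('Alan Turing'))
```

When many keys land in the same slot, lookups in this toy version slow down, which is why the choice of the key-to-number mapping matters.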
PrimaryExpression :: CallExpression
CallExpression :: PrimaryExpression ( ArgumentList )
ArgumentList :: SomeArguments
ArgumentList :: ε
SomeArguments :: Expression
SomeArguments :: Expression , SomeArguments
In Python, nearly every data value (including lists and strings) is an object. This
means the way we manipulate data is to invoke methods on objects. To invoke a
method we use the same rules, but the PrimaryExpression of the CallExpression
specifies an object and method:
PrimaryExpression :: AttributeReference
AttributeReference :: PrimaryExpression . Name
The name AttributeReference is used since the same syntax is used for accessing
the internal state of objects as well.
The tokenize procedure includes five method applications, four of which are
tokens.append(current). The object reference is tokens, the list of tokens in the
input. The list append method takes one parameter and adds that value to the
end of the list.
The other method invocation is c.isspace() where c is a string consisting of one
character in the input. The isspace method for the string datatype returns true
if the input string is non-empty and all characters in the string are whitespace
(either spaces, tabs, or newlines).
The tokenize procedure also uses the built-in function len which takes as in-
put an object of a collection datatype such as a list or a string, and outputs the
number of elements in the collection. It is a procedure, not a method; the input
object is passed in as a parameter. In tokenize, we use len(current) to find the
number of characters in the current token.
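The difference between invoking a method on an object and calling a procedure can be seen in a few lines. This is a small sketch; the token values are chosen only for illustration.

```python
tokens = []
tokens.append('define')     # method invocation: the object, a dot, the method name
tokens.append('square')
count = len(tokens)         # len is a procedure; the list is passed as a parameter

# isspace is true only for non-empty strings made up entirely of whitespace
space_checks = [' '.isspace(), '\t\n'.isspace(), ''.isspace(), 'x'.isspace()]
print(tokens, count, space_checks)
```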
Statement :: IfStatement
IfStatement :: if ExpressionPredicate : Block Elifs OptElse
Elifs :: ε
Elifs :: elif ExpressionPredicate : Block Elifs
OptElse :: ε
OptElse :: else : Block
Unlike in Scheme, there is no need to have an alternate clause since the Python
if statement does not need to produce a value. The evaluation rule is similar to
Scheme's conditional expression:
The first if predicate tests if the current character is a space. If so, the end of the
current token has been reached. The consequent Block is itself an IfStatement:
if len(current) > 0:
    tokens.append(current)
    current = ''
If the current token has at least one character, it is appended to the list of tokens
in the input string and the current token is reset to the empty string. This IfS-
tatement has no elif or else clauses, so if the predicate is false, there is nothing
to do.
For statement. A for statement provides a way of iterating through a set of
values, carrying out a body block for each value.
Statement :: ForStatement
ForStatement :: for Target in Expression : Block
Other than the first two initializations, and the final two statements, the bulk
of the tokenize procedure is contained in a for statement. The header of the for
statement in tokenize is for c in s: .... The string s is the input string, a collection of
characters. So, the loop will repeat once for each character in s, and the value of
c is each character in the input string (represented as a singleton string), in turn.
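As a minimal illustration of iterating over a string, the loop below collects each value that c takes. The list name seen is an assumption of this sketch.

```python
seen = []
for c in '(+ 1)':
    seen.append(c)      # c is a one-character string on each iteration
print(seen)
```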
Return statement. In Scheme, the body of a procedure is an expression and the
value of that expression is the result of evaluating an application of the proce-
dure. In Python, the body of a procedure is a block of one or more statements.
Statements have no value, so there is no obvious way to decide what the result
of a procedure application should be. Python's solution is to use a return
statement.
The grammar for the return statement is:
Statement :: ReturnStatement
ReturnStatement :: return Expression
11.2 Parser
The parser takes as input a Charme program string, and produces as output a
nested list that encodes the structure of the input program. The first step is to
break the input string into tokens; this is done by the tokenize procedure defined
in the previous section.
The next step is to take the list of tokens and produce a data structure that en-
codes the structure of the input program. Since the Charme language is built
from simple parenthesized expressions, we can represent the parsed program
as a list. But, unlike the list returned by tokenize which is a flat list containing
the tokens in order, the list returned by parse is a structured list that may have
lists (and lists of lists, etc.) as elements.
Charme's syntax is very simple, so the parser can be implemented by just breaking
an expression into its components using the parentheses and whitespace.
The parser needs to balance the open and close parentheses that enclose ex-
pressions. For example, if the input string is
(define square (lambda (x) (* x x)))
the output of tokenize is the list:
['(', 'define', 'square', '(', 'lambda', '(', 'x', ')', '(', '*', 'x', 'x', ')', ')', ')']
The parser structures the tokens according to the program structure, producing
a parse tree that encodes the structure of the input program. The parentheses
provide the program structure, so they are removed from the parse tree. For the
example, the resulting parse tree is:
['define',
'square',
[ 'lambda',
['x'],
['*', 'x', 'x'] ] ]
if inner:
    error('Unmatched open paren: ' + s)
    return None
else:
    return res
The input to parse is a string in the target language. The output is a list of the
parenthesized expressions in the input. Here are some examples:
parse('150')              # evaluates to ['150']
parse('(+ 1 2)')          # evaluates to [['+', '1', '2']]
parse('(+ 1 (* 2 3))')    # evaluates to [['+', '1', ['*', '2', '3']]]
parse('(define square (lambda (x) (* x x)))')
    # evaluates to [['define', 'square', ['lambda', ['x'], ['*', 'x', 'x']]]]
parse('(+ 1 2) (+ 3 4)')  # evaluates to [['+', '1', '2'], ['+', '3', '4']]
The parentheses are no longer included as tokens in the result, but their pres-
ence in the input string determines the structure of the result.
The parse procedure implements a recursive descent parser. The main parse
procedure defines the parse_tokens helper procedure and returns the result of
calling it with inputs that are the result of tokenizing the input string and the
Boolean literal False: return parse_tokens(tokenize(s), False).
The parse_tokens procedure takes two inputs: tokens, a list of tokens (that results
from the tokenize procedure); and inner, a Boolean that indicates whether the
parser is inside a parenthesized expression. The value of inner is False for the
initial call since the parser starts outside a parenthesized expression. All of the
recursive calls result from encountering a '(', so the value passed as inner is True
for all the recursive calls.
The body of the parse_tokens procedure initializes res to an empty list that is
used to store the result. Then, the while statement iterates as long as the token
list contains at least one element.
The first statement of the while statement block assigns tokens.pop(0) to current.
The pop method of the list takes a parameter that selects an element from the
list. The selected element is returned as the result. The pop method also mutates
the list object by removing the selected element. So, tokens.pop(0) returns the
first element of the tokens list and removes that element from the list. This is
essential to the parser making progress: every time the tokens.pop(0) expression
is evaluated the number of elements in the token list is reduced by one.
If the current token is an open parenthesis, parse_tokens is called recursively to
parse the inner expression (that is, all the tokens until the matching close paren-
thesis). The result is a list of tokens, which is appended to the result. If the
current token is a close parenthesis, the behavior depends on whether or not
the parser is parsing an inner expression. If it is inside an expression (that is,
an open parenthesis has been encountered with no matching close parenthesis
yet), the close parenthesis closes the inner expression, and the result is returned.
If it is not in an inner expression, the close parenthesis has no matching open
parenthesis so a parse error is reported.
The else clause deals with all other tokens by appending them to the list.
The final if statement checks that the parser is not in an inner context when the
input is finished. This would mean there was an open parenthesis without a
corresponding close, so an error is reported. Otherwise, the list representing the
parse tree is returned.
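The description above can be assembled into a runnable sketch. Two parts are assumptions made for illustration: errors are reported by raising a hypothetical ParseError exception instead of calling an error procedure and returning None, and a compact stand-in tokenize is used so the example is self-contained.

```python
class ParseError(Exception):
    """Hypothetical exception type used by this sketch to report parse errors."""

def tokenize(s):
    # compact stand-in tokenizer: pad parentheses with spaces, then split on whitespace
    return s.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(s):
    def parse_tokens(tokens, inner):
        res = []
        while len(tokens) > 0:
            current = tokens.pop(0)     # remove and return the first token
            if current == '(':
                # parse everything up to the matching close parenthesis
                res.append(parse_tokens(tokens, True))
            elif current == ')':
                if inner:
                    return res          # closes the inner expression
                raise ParseError('Unmatched close paren: ' + s)
            else:
                res.append(current)
        if inner:
            # input ended while still inside a parenthesized expression
            raise ParseError('Unmatched open paren: ' + s)
        return res
    return parse_tokens(tokenize(s), False)

print(parse('(+ 1 (* 2 3))'))
```

Because every recursive call shares the same tokens list, the tokens consumed by an inner call are already gone when control returns to the outer call.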
11.3 Evaluator
The evaluator takes a list representing the parse tree of a Charme expression or
definition and an environment, and outputs the result of evaluating the expres-
sion in the input environment. The evaluator implements the evaluation rules
for the target language.
The core of the evaluator is the procedure meval:
def meval(expr, env):
    if is_primitive(expr): return eval_primitive(expr)
    elif is_if(expr): return eval_if(expr, env)
    elif is_definition(expr): eval_definition(expr, env)
    elif is_name(expr): return eval_name(expr, env)
    elif is_lambda(expr): return eval_lambda(expr, env)
    elif is_application(expr): return eval_application(expr, env)
    else: error('Unknown expression type: ' + str(expr))
The if statement matches the input expression with one of the expression types
(or the definition) in the Charme language, and returns the result of applying
the corresponding evaluation procedure (if the input is a definition, no value is
returned since definitions do not produce an output value). We next consider
each evaluation rule in turn.
11.3.1 Primitives
Charme supports two kinds of primitives: natural numbers and primitive procedures.
If the expression is a number, it is a string of digits. The is_number
procedure evaluates to True if and only if its input is a number:
def is_primitive(expr):
    return is_number(expr) or is_primitive_procedure(expr)
def is_number(expr):
    return isinstance(expr, str) and expr.isdigit()
Here, we use the built-in function isinstance to check if expr is of type str. The
and expression in Python evaluates similarly to the Scheme and special form:
the left operand is evaluated first; if it evaluates to a false value, the value of
the and expression is that false value. If it evaluates to a true value, the right
operand is evaluated, and the value of the and expression is the value of its right
operand. This evaluation rule means it is safe to use expr.isdigit() in the right
operand, since it is only evaluated if the left operand evaluated to a true value,
which means expr is a string.
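The protective effect of the and expression can be observed directly. In this sketch is_number is repeated so the example runs on its own, and the sample inputs are illustrative.

```python
def is_number(expr):
    # the right operand is evaluated only when expr is a string
    return isinstance(expr, str) and expr.isdigit()

checks = [is_number('150'), is_number('x'), is_number(['+', '1', '2']), is_number('')]

# Without the isinstance guard, calling isdigit on a list fails:
try:
    ['+', '1', '2'].isdigit()
    unguarded_ok = True
except AttributeError:
    unguarded_ok = False

print(checks, unguarded_ok)
```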
Primitive procedures are defined using Python procedures. We define the procedure
is_primitive_procedure using callable, a procedure that returns true only
for callable objects such as procedures and methods:
def is_primitive_procedure(expr):
    return callable(expr)
The else clause means that all other primitives (in Charme, this is only primi-
tive procedures and Boolean constants) self-evaluate: the value of evaluating a
primitive is itself.
For the primitive procedures, we need to define Python procedures that implement
the primitive procedure. For example, here is the primitive_plus procedure
that is associated with the + primitive procedure:
def primitive_plus(operands):
    if (len(operands) == 0): return 0
    else: return operands[0] + primitive_plus(operands[1:])
The input is a list of operands. Since a procedure is applied only after all subex-
pressions are evaluated, there is no need to evaluate the operands: they are al-
ready the evaluated values. For numbers, the values are Python integers, so we
can use the Python + operator to add them. To provide the same behavior as the
Scheme primitive + procedure, we define our Charme primitive + procedure to
evaluate to 0 when there are no operands, and otherwise to recursively add all
of the operand values.
The other primitive procedures are defined similarly:
def primitive_times(operands):
    if (len(operands) == 0): return 1
    else: return operands[0] * primitive_times(operands[1:])
11.3.2 If Expressions
Charme provides an if expression special form with a syntax and evaluation rule
identical to the Scheme if expression. The grammar rule for an if expression is:
We can use this to recognize different special forms by passing in different key-
words. We recognize an if expression by the if token at the beginning of the ex-
pression:
def is_if(expr):
    return is_special_form(expr, 'if')
addresses the situation where the name is not found and there is no parent en-
vironment (since we have already reached the global environment) by reporting
an evaluation error indicating an undefined name.
class Environment:
    def __init__(self, parent):
        self._parent = parent
        self._frame = {}
Using the Environment class, the evaluation rules for definitions and name ex-
pressions are straightforward.
def is_definition(expr): return is_special_form(expr, 'define')
def eval_definition(expr, env):
    name = expr[1]
    value = meval(expr[2], env)
    env.add_variable(name, value)
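A fuller version of the Environment class, with the operations the evaluator relies on, might look like the sketch below. The method name lookup_variable and the EvalError exception are assumptions of this sketch; only add_variable appears in the code above.

```python
class EvalError(Exception):
    """Hypothetical exception type for evaluation errors in this sketch."""

class Environment:
    def __init__(self, parent):
        self._parent = parent    # enclosing environment, or None for the global environment
        self._frame = {}         # dictionary mapping names to values

    def add_variable(self, name, value):
        self._frame[name] = value

    def lookup_variable(self, name):
        if name in self._frame:
            return self._frame[name]
        elif self._parent is not None:
            return self._parent.lookup_variable(name)   # keep searching outward
        else:
            # not found, and no parent environment left to search
            raise EvalError('Undefined name: %s' % name)

global_env = Environment(None)
global_env.add_variable('x', 3)
inner_env = Environment(global_env)
inner_env.add_variable('y', 4)
print(inner_env.lookup_variable('x'), inner_env.lookup_variable('y'))
```

A name defined in an inner environment shadows the same name in its parent, because the inner frame is searched first.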
11.3.4 Procedures
The result of evaluating a lambda expression is a procedure. Hence, to define the
evaluation rule for lambda expressions we need to define a class for representing
user-defined procedures. It needs to record the parameters, procedure body,
and defining environment:
class Procedure:
    def __init__(self, params, body, env):
        self._params = params
        self._body = body
        self._env = env
    def getParams(self): return self._params
    def getBody(self): return self._body
    def getEnvironment(self): return self._env
11.3.5 Application
Evaluation and application are defined recursively. To perform an application,
we need to evaluate all the subexpressions of the application expression, and
then apply the result of evaluating the first subexpression to the values of the
other subexpressions.
def is application(expr): # requires: all special forms checked first
return isinstance(expr, list)
The eval_application procedure uses the built-in map procedure, which is similar
to list-map from Chapter 5. The first parameter to map is a procedure constructed
using a lambda expression (similar in meaning, but not in syntax, to
Scheme's lambda expression); the second parameter is the list of subexpressions.
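The use of map with a lambda expression can be tried in isolation. This sketch assumes Python 3, where map returns an iterator, so the result is wrapped in list to obtain a list; the procedure applied here is a simple stand-in, not meval.

```python
subexprs = ['1', '2', '3']
# map applies the procedure to each element in turn; list(...) collects the results
values = list(map(lambda sexpr: int(sexpr) * 10, subexprs))
print(values)
```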
The mapply procedure implements the application rules. If the procedure is a
primitive, it just does it: it applies the primitive procedure to its operands.
To apply a constructed procedure (represented by a Procedure), follow the state-
ful application rule for applying constructed procedures:
Charme Application Rule 2: Constructed Procedures. To apply a con-
structed procedure:
1. Construct a new environment, whose parent is the environment
of the applied procedure.
2. For each procedure parameter, create a place in the frame of the
new environment with the name of the parameter. Evaluate each
operand expression in the environment of the application and initialize
the value in each place to the value of the corresponding
operand expression.
3. Evaluate the body of the procedure in the newly created environ-
ment. The resulting value is the value of the application.
The mapply procedure implements the application rules for primitive and con-
structed procedures:
def mapply(proc, operands):
    if (is_primitive_procedure(proc)): return proc(operands)
    elif isinstance(proc, Procedure):
        params = proc.getParams()
        newenv = Environment(proc.getEnvironment())
        if len(params) != len(operands):
            eval_error('Parameter length mismatch: %s given operands %s'
                       % (str(proc), str(operands)))
        for i in range(0, len(params)):
            newenv.add_variable(params[i], operands[i])
        return meval(proc.getBody(), newenv)
    else: eval_error('Application of nonprocedure: %s' % (proc))
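To see evaluation and application working together, the pieces can be combined into a much-reduced evaluator. This is a sketch under several assumptions: it handles only numbers, names, lambda expressions, and applications (no if or define); lookup_variable is an assumed method name; and Python's built-in NameError and TypeError stand in for the interpreter's error reporting.

```python
class Environment:
    def __init__(self, parent):
        self._parent = parent
        self._frame = {}
    def add_variable(self, name, value):
        self._frame[name] = value
    def lookup_variable(self, name):
        if name in self._frame: return self._frame[name]
        if self._parent is not None: return self._parent.lookup_variable(name)
        raise NameError('Undefined name: %s' % name)

class Procedure:
    def __init__(self, params, body, env):
        self._params, self._body, self._env = params, body, env
    def getParams(self): return self._params
    def getBody(self): return self._body
    def getEnvironment(self): return self._env

def primitive_plus(operands):
    return sum(operands)

def primitive_times(operands):
    result = 1
    for v in operands: result = result * v
    return result

def meval(expr, env):
    if isinstance(expr, str) and expr.isdigit():
        return int(expr)                               # number
    elif isinstance(expr, str):
        return env.lookup_variable(expr)               # name
    elif expr[0] == 'lambda':
        return Procedure(expr[1], expr[2], env)        # lambda expression
    else:
        values = [meval(sub, env) for sub in expr]     # application: evaluate all subexpressions
        return mapply(values[0], values[1:])

def mapply(proc, operands):
    if callable(proc):                                 # primitive procedure
        return proc(operands)
    elif isinstance(proc, Procedure):                  # constructed procedure
        params = proc.getParams()
        if len(params) != len(operands):
            raise TypeError('Parameter length mismatch')
        newenv = Environment(proc.getEnvironment())    # parent is the procedure's environment
        for name, value in zip(params, operands):
            newenv.add_variable(name, value)
        return meval(proc.getBody(), newenv)
    else:
        raise TypeError('Application of nonprocedure: %s' % proc)

global_env = Environment(None)
global_env.add_variable('+', primitive_plus)
global_env.add_variable('*', primitive_times)
square = ['lambda', ['x'], ['*', 'x', 'x']]
result = meval([square, ['+', '1', '2']], global_env)
print(result)
```

The example applies a squaring procedure to the value of (+ 1 2), so meval and mapply call each other in turn: evaluating the application leads to applying the procedure, which leads to evaluating its body in a new environment.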
programming languages they can express the same set of computations: all of
the procedures we define that take advantage of lazy evaluation could be de-
fined with eager evaluation (for example, by first defining a lazy interpreter as
we do here).
The rule is identical to the Stateful Application Rule except for the bolded part
of step 2. To implement lazy evaluation we modify the interpreter to implement
the lazy application rule. We start by defining a Python class for representing
thunks and then modify the interpreter to support lazy evaluation.
Making Thunks. A thunk keeps track of an expression whose evaluation is de-
layed until it is needed. Once the evaluation is performed, the resulting value
is saved so the expression does not need to be re-evaluated the next time the
value is needed. Thus, a thunk is in one of two possible states: unevaluated and
evaluated.
The Thunk class implements thunks:
class Thunk:
    def __init__(self, expr, env):
        self._expr = expr
A Thunk object keeps track of the expression in the expr instance variable. Since
the value of the expression may be needed when the evaluator is evaluating an
expression in some other environment, it also keeps track of the environment in
which the thunk expression should be evaluated in the env instance variable.
The evaluated instance variable is a Boolean that records whether or not the
thunk expression has been evaluated. Initially this value is False. After the ex-
pression is evaluated, evaluated is True and the value instance variable keeps
track of the resulting value.
The value method uses force_eval (defined later) to obtain the evaluated value of
the thunk expression and stores that result in value.
The is_thunk procedure returns True only when its parameter is a thunk:
def is_thunk(expr): return isinstance(expr, Thunk)
The force_eval procedure first uses meval to evaluate the expression normally. If
the result is a thunk, it uses the Thunk.value method to force evaluation of the
thunk expression. That method uses force_eval to find the value of the thunk
expression, so any thunks inside the expression will be recursively evaluated.
def force_eval(expr, env):
    val = meval(expr, env)
    if is_thunk(val): return val.value()
    else: return val
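The behavior described for thunks can be sketched end to end. This version fills in the instance variables and value method following the prose description; the stand-in meval, which treats an expression as a Python procedure of the environment and counts its calls, is purely an assumption made so the example is self-contained.

```python
def meval(expr, env):
    # Stand-in evaluator for this sketch: an "expression" is a Python procedure
    # that takes the environment. The call counter shows when evaluation happens.
    meval.calls += 1
    return expr(env)
meval.calls = 0

class Thunk:
    def __init__(self, expr, env):
        self._expr = expr
        self._env = env
        self._evaluated = False     # nothing has been computed yet
    def value(self):
        if not self._evaluated:
            self._value = force_eval(self._expr, self._env)
            self._evaluated = True  # remember the result for later uses
        return self._value

def is_thunk(expr): return isinstance(expr, Thunk)

def force_eval(expr, env):
    val = meval(expr, env)
    if is_thunk(val): return val.value()
    else: return val

t = Thunk(lambda env: env['x'] * env['x'], {'x': 7})
calls_before = meval.calls      # creating the thunk evaluates nothing
first = t.value()               # forces evaluation
second = t.value()              # reuses the saved value
calls_after = meval.calls
print(first, second, calls_after - calls_before)
```

The counter shows the two properties the text describes: evaluation is delayed until value is first requested, and the saved result means the expression is evaluated at most once.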
Next, we change the application rule to perform delayed evaluation and change
a few other places in the interpreter to use force_eval instead of meval to obtain
the actual values when they are needed.
We change eval_application to delay evaluation of the operands by creating Thunk
objects representing each operand:
232 11.4. Lazy Evaluation
Only the first subexpression must be evaluated to obtain the procedure to apply.
Hence, eval_application uses force_eval to obtain the value of the first subexpression,
but makes Thunk objects for the operand expressions.
To apply a primitive, we need the actual values of its operands, so must force
evaluation of any thunks in the operands. Hence, the definition for mapply
forces evaluation of the operands to a primitive procedure:
def mapply(proc, operands):
    def dethunk(expr):
        if is_thunk(expr): return expr.value()
        else: return expr
    if (is_primitive_procedure(proc)):
        ops = list(map(dethunk, operands))  # list(...) so len and indexing work on the result
        return proc(ops)
    elif isinstance(proc, Procedure):
        ... # same as in Charme interpreter
Modern methods of production have given us the possibility of ease and security
for all; we have chosen, instead, to have overwork for some and starvation for
others. Hitherto we have continued to be as energetic as we were before there
were machines; in this we have been foolish, but there is no reason to go on
being foolish forever.
Bertrand Russell, In Praise of Idleness, 1932
To evaluate an if expression, it is necessary to know the actual value of the
predicate expression. We change the eval_if procedure to use force_eval when
evaluating the predicate expression:
def eval_if(expr, env):
    if force_eval(expr[1], env) != False: return meval(expr[2], env)
    else: return meval(expr[3], env)
This forces the predicate to evaluate to a value so its actual value can be used
to determine how the rest of the if expression evaluates; the evaluations of the
consequent and alternate expressions are left as mevals since it is not necessary
to force them to be evaluated yet.
The final change to the interpreter is to force evaluation when the result is
displayed to the user in the evalLoop procedure by replacing the call to meval with
force_eval.
With eager evaluation, this would not work since all operands would be evalu-
ated; with lazy evaluation, only the operand that corresponds to the appropriate
consequent or alternate expression is evaluated.
Lazy evaluation also enables programs to deal with seemingly infinite data struc-
tures. This is possible since only those values of the apparently infinite data
structure that are used need to be created.
Suppose we define procedures similar to the Scheme procedures for manipulat-
ing pairs:
(define cons (lambda (a b) (lambda (p) (if p a b))))
(define car (lambda (p) (p true)))
(define cdr (lambda (p) (p false)))
(define null false)
(define null? (lambda (x) (= x false)))
These behave similarly to the corresponding Scheme procedures, except in
LazyCharme their operands are evaluated lazily. This means we can define an
infinite list:
(define ints-from (lambda (n) (cons n (ints-from (+ n 1)))))
With eager evaluation, (ints-from 1) would never finish evaluating; it has no
base case for stopping the recursive applications. In LazyCharme, however, the
operands to the cons application in the body of ints-from are not evaluated un-
til they are needed. Hence, (ints-from 1) terminates and produces a seemingly
infinite list, but only the evaluations that are needed are performed:
LazyCharme> (car (ints-from 1))
1
LazyCharme> (car (cdr (cdr (cdr (ints-from 1)))))
4
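The same idea can be sketched in Python, which evaluates operands eagerly. In this sketch (not part of LazyCharme) the delay is made explicit by wrapping each operand of cons in a zero-argument lambda; the names cons, car, cdr, and ints_from mirror the definitions above but are otherwise assumptions.

```python
def cons(head, tail):
    # head and tail are zero-argument functions, so neither is evaluated
    # until car or cdr asks for it.
    return lambda want_head: head() if want_head else tail()

def car(pair):
    return pair(True)

def cdr(pair):
    return pair(False)

def ints_from(n):
    # The recursive call sits inside a lambda, so it runs only when
    # cdr is applied to this pair.
    return cons(lambda: n, lambda: ints_from(n + 1))

first = car(ints_from(1))
fourth = car(cdr(cdr(cdr(ints_from(1)))))
```

Without the lambdas around the operands, the call ints_from(1) would recurse until Python's recursion limit is reached, which is the eager behavior described above.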
Some evaluations fail to terminate even with lazy evaluation. For example, as-
sume the standard definition of list-length:
(define list-length
(lambda (lst) (if (null? lst) 0 (+ 1 (list-length (cdr lst))))))
An evaluation of (list-length (ints-from 1)) never terminates. Every time an
application of list-length is evaluated, it applies cdr to the input list, which causes
ints-from to evaluate another cons, increasing the length of the list by one. The
actual length of the list is infinite, so the application of list-length does not
terminate.
Lists with delayed evaluation can be used in useful programs. Reconsider the
Fibonacci sequence from Chapter 7. Using lazy evaluation, we can define a list
that is the infinitely long Fibonacci sequence:5
(define fibo-gen (lambda (a b) (cons a (fibo-gen b (+ a b)))))
(define fibos (fibo-gen 0 1))
5 This example is based on Abelson and Sussman, Structure and Interpretation of Computer
Programs, Section 3.5.2, which also presents several other examples of interesting programs con-
structed using delayed evaluation.
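Python's generators provide a comparable form of delayed evaluation, so the infinite Fibonacci sequence can also be sketched outside LazyCharme. This is an illustration using a different mechanism (generators rather than thunks); the name first_ten is an assumption of the sketch.

```python
from itertools import islice

def fibo_gen(a, b):
    # Each value is computed only when the consumer asks for the next one.
    while True:
        yield a
        a, b = b, a + b

fibos = fibo_gen(0, 1)
first_ten = list(islice(fibos, 10))   # take a finite prefix of the infinite sequence
```

As with the lazy list, asking for the length of the whole sequence would never terminate; only a finite prefix can be consumed.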
Exercise 11.3. Define the sequence of factorials as an infinite list using delayed
evaluation.
Exercise 11.4. Describe the infinite list defined by each of the following defini-
tions. (Check your answers by evaluating the expressions in LazyCharme.)
a. (define p (cons 1 (merge-lists p p +)))
b. (define t (cons 1 (merge-lists t (merge-lists t t +) +)))
c. (define twos (cons 2 twos))
d. (define doubles (merge-lists (ints-from 1) twos *))
Exercise 11.5. [⋆⋆] A simple procedure known as the Sieve of Eratosthenes for
finding prime numbers was created by Eratosthenes, an ancient Greek math-
ematician and astronomer. The procedure imagines starting with an (infinite)
list of all the integers starting from 2. Then, it repeats the following two steps
forever:
1. Circle the first number that is not crossed off; it is prime.
2. Cross off all numbers that are multiples of the circled number.
To carry out the procedure in practice, of course, the initial list of numbers must
be finite, otherwise it would take forever to cross off all the multiples of 2. But,
Chapter 11. Interpreters 235
11.5 Summary
Languages are tools for thinking, as well as means to express executable pro-
grams. A programming language is defined by its grammar and evaluation rules.
To implement a language, we need to implement a parser that carries out the
grammar rules and an evaluator that implements the evaluation rules.
We can produce new languages by changing the evaluation rules of an inter-
preter. Changing the evaluation rules changes what programs mean, and en-
ables new approaches to solving problems.
12
Computability
In this chapter we consider the question of what problems can and cannot be
solved by mechanical computation. This is the question of computability: a
problem is computable if it can be solved by some algorithm; a problem that is
noncomputable cannot be solved by any algorithm.
Section 12.1 considers first the analogous question for declarative knowledge:
are there true statements that cannot be proven by any proof? Section 12.2
introduces the Halting Problem, a problem that cannot be solved by any
algorithm. Section 12.3 sketches Alan Turing's proof that the Halting Problem is
noncomputable. Section 12.4 discusses how to show other problems are
noncomputable.
and number theory. A proposition is a statement that is stated precisely enough
to be either true or false. Euclid's first proposition is: given any line, an
equilateral triangle can be constructed whose edges are the length of that line.
A proof of a proposition in an axiomatic system is a sequence of steps that ends
with the proposition. Each step must follow from the axioms using the
inference rules. Most of Euclid's proofs are constructive: propositions state that a
thing with a particular property exists, and proofs show steps for constructing
something with the stated property. The steps start from the postulates and
follow the inference rules to prove that the constructed thing resulting at the end
satisfies the requirements of the proposition.
A consistent axiomatic system is one that can never derive contradictory
statements by starting from the axioms and following the inference rules. If a system
can generate both A and not A for any proposition A, the system is
inconsistent. If the system cannot generate any contradictory pairs of statements it is
consistent.
A complete axiomatic system can derive all true statements by starting from the
axioms and following the inference rules. This means if a given proposition is
true, some proof for that proposition can be found in the system. Since we do
not have a clear definition of true (if we defined true as something that can be
derived in the system, all axiomatic systems would automatically be complete
by definition), we state this more clearly by saying that the system can decide
any proposition. This means, for any proposition P, a complete axiomatic
system would be able to derive either P or not P. A system that cannot decide all
statements in the system is incomplete. An ideal axiomatic system would be
complete and consistent: it would derive all true statements and no false
statements.
The completeness of a system depends on the set of possible propositions.
Euclid's system is consistent but not complete for the set of propositions about
geometry. There are statements that concern simple properties in geometry (a
famous example is any angle can be divided into three equal sub-angles) that
cannot be derived in the system; trisecting an angle requires more powerful tools
than the straightedge and compass provided by Euclid's postulates.
Figure 12.1 depicts two axiomatic systems. The one on the left is incomplete:
there are some propositions that can be stated in the system that are true for
which no valid proof exists in the system. The one on the right is inconsistent:
it is possible to construct valid proofs of both P and not P starting from the
axioms and following the inference rules. Once a single contradictory
proposition can be proven, the system becomes completely useless. The contradictory
propositions amount to a proof that true = false, so once a single pair of
contradictory propositions can be proven, every other false proposition can also be
proven in the system. Hence, only consistent systems are interesting and we
focus on whether it is possible for them to also be complete.
Russell's Paradox. Towards the end of the 19th century, many mathematicians
sought to systematize mathematics by developing a consistent axiomatic
system that is complete for some area of mathematics. One notable attempt was
Gottlob Frege's Grundgesetze der Arithmetik (1893), which attempted to develop
an axiomatic system for all of mathematics built from simple logic.
Chapter 12. Computability 239
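[Figure 12.1. Two axiomatic systems, each marking its region of provable propositions: the one on the left is incomplete, the one on the right is inconsistent.]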
Bertrand Russell discovered a problem with Frege's system, which is now known
as Russell's paradox. Suppose R is defined as the set containing all sets that do
not contain themselves as members. For example, the set of all prime numbers
does not contain itself as a member, so it is a member of R. On the other hand,
the set of all entities that are not prime numbers is not a member of R. This set
contains all sets, since a set is not a prime number, so it must contain itself.
The paradoxical question is: is the set R a member of R? There are two possible
answers to consider but neither makes sense:
Yes: R is a member of R
We defined the set R as the set of all sets that do not contain themselves as
members. Hence, R cannot be a member of itself, and the statement that R
is a member of R must be false.
No: R is not a member of R
If R is not a member of R, then R does not contain itself and, by definition,
must be a member of set R. This is a contradiction, so the statement that R
is not a member of R must be false.
The question is a perfectly clear and precise binary question, but neither the
"yes" nor the "no" answer makes any sense. Symbolically, we summarize the
paradox: for any set s, s ∈ R if and only if s ∉ s. Selecting s = R leads to the
contradiction: R ∈ R if and only if R ∉ R.
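The paradox can be mirrored in a program. In this sketch a set is modeled by its membership test, a function that returns True when its argument is a member; that representation, and the names R and contains_nothing, are assumptions made for illustration rather than part of the formal argument.

```python
# Model a set by its membership test: s(x) is True when x is a member of s.
def R(s):
    # s is a member of R exactly when s is not a member of itself.
    return not s(s)

def contains_nothing(x):
    # The empty set: nothing is a member, including the set itself.
    return False

empty_in_R = R(contains_nothing)   # the empty set does not contain itself

try:
    R(R)                           # is R a member of R?
    answer = "some value"
except RecursionError:
    # Evaluating R(R) requires the value of R(R), so no answer is ever produced.
    answer = "no answer"
```

The question R(R) is well formed, yet its evaluation only ever asks the same question again, which parallels the way neither "yes" nor "no" can be a consistent answer.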
Whitehead and Russell attempted to resolve this paradox by constructing their
system to make it impossible to define the set R. Their solution was to introduce
types. Each set has an associated type, and a set cannot contain members of its
own type. The set types are defined recursively:
A type zero set is a set that contains only non-set objects.
A type-n set can only contain sets of type n − 1 and below.
This definition avoids the paradox: the definition of R must now define R as a
type k set containing all sets of type k − 1 and below that do not contain
themselves as members. Since R is a type k set, it cannot contain itself, since it
cannot contain any type k sets.
240 12.1. Mechanizing Reasoning
In 1913, Whitehead and Russell published Principia Mathematica, a bold
attempt to mechanize mathematical reasoning that stretched to over 2000 pages.
Whitehead and Russell attempted to derive all true mathematical statements
about numbers and sets starting from a set of axioms and formal inference rules.
They employed the type restriction to eliminate the particular paradox caused
by set inclusion, but it does not eliminate all self-referential paradoxes.
For example, consider this paradox named for the Cretan philosopher Epimenides,
who was purported to have said "All Cretans are liars". If the statement is true,
then Epimenides, a Cretan, is a liar, so his statement that all Cretans are
liars must be false. Another version is the self-referential sentence: "this statement is
false". If the statement is true, then it is true that the statement is false (a
contradiction). If the statement is false, then it is a true statement (also a
contradiction). It was not clear until Gödel, however, if such statements could be stated
in the Principia Mathematica system.
For the proof to be valid, it is necessary to show that statement G can be ex-
pressed in the system.
To express G formally, we need to consider what it means for a statement to not
have any proof in the system. A proof of the statement G is a sequence of steps,
T0, T1, T2, ..., TN. Each step is the set of all statements that have been proven
Alan Turing proved that noncomputable problems exist. The way to show that
noncomputable problems exist is to find one, similarly to the way Gödel showed
unprovable true statements exist by finding an unprovable true statement.
The problem Turing found is known as the Halting Problem:2
Halting Problem
Input: A string representing a Python program.
Output: If evaluating the input program would ever finish, output True.
Otherwise, output False.
1 The terms decidable and undecidable are sometimes used to mean the same things as com-
one input. Of course, Turing did not define the problem using a Python program since Python had
not yet been invented when Turing proved the Halting Problem was noncomputable in 1936. In fact,
nothing resembling a programmable digital computer would emerge until several years later.
242 12.2. The Halting Problem
Suppose we had a procedure halts that solves the Halting Problem. The input to
halts is a Python program expressed as a string.
For example, halts('2 + 3') should evaluate to True, halts('while True: pass') should
evaluate to False (the Python pass statement does nothing, but is needed to
make the while loop syntactically correct), and
halts('''
def fibo(n):
    if n == 1 or n == 2: return 1
    else: return fibo(n - 1) + fibo(n - 2)
fibo(60)
''')
should evaluate to True. From the last example, it is clear that halts cannot be
implemented by evaluating the expression and outputting True if it terminates.
The problem is knowing when to give up and output False. As we analyzed in
Chapter 7, evaluating fibo(60) would take trillions of years; in theory, though, it
eventually finishes so halts should output True.
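The difficulty of knowing when to give up can be made concrete with a step budget. The sketch below is an illustration under stated assumptions: programs are modeled as Python generators that yield once per step, and finishes_within, countdown, and forever are hypothetical names. It reports True when the program finishes within the budget, and None (not False) when the budget runs out.

```python
def countdown(n):
    # A program that takes n steps and then finishes.
    while n > 0:
        n -= 1
        yield

def forever():
    # A program that never finishes.
    while True:
        yield

def finishes_within(program, budget):
    """Return True if program finishes within budget steps, else None."""
    steps = program()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration:
            return True
    return None   # gave up: the program may or may not halt

quick = finishes_within(lambda: countdown(5), 100)
slow = finishes_within(lambda: countdown(10**6), 100)
endless = finishes_within(forever, 100)
```

The slow and endless cases produce the same result, so no fixed budget lets this approach distinguish a program that halts late from one that never halts.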
This argument is not sufficient to prove that halts is noncomputable. It just
shows that one particular way of implementing halts would not work. To show
that halts is noncomputable, we need to show that it is impossible to implement
a halts procedure that would produce the correct output for all inputs in a finite
amount of time.
Here is another example that suggests (but does not prove) the impossibility of
halts (where sumOfTwoPrimes is defined as an algorithm that takes a number as
input and outputs True if the number is the sum of two prime numbers and False
otherwise):
halts('n = 4; while sumOfTwoPrimes(n): n = n + 2')
This program halts if there exists an even number greater than 2 that is not the
sum of two primes. We assume unbounded integers even though every actual
computer has a limit on the largest number it can represent. Our computing
model, though, uses an infinite tape, so there is no arbitrary limit on number
sizes.
Knowing whether or not the program halts would settle an open problem known
as Goldbach's Conjecture: every even integer greater than 2 can be written as the
sum of two primes. Christian Goldbach proposed a form of the conjecture in a
letter to Leonhard Euler in 1742. Euler refined it and believed it to be true, but
couldn't prove it.
With a halts algorithm, we could settle the conjecture using the expression above:
if the result is False, the conjecture is proven; if the result is True, the conjecture
is disproved. We could use a halts algorithm like this to resolve many other open
problems. This strongly suggests there is no halts algorithm, but does not prove
it cannot exist.
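Without a halts algorithm, the search can only be run up to some bound. The following sketch gives one possible definition of the sumOfTwoPrimes test (written here as sum_of_two_primes) and a bounded version of the loop; the helper names and the bound of 2000 are assumptions chosen for illustration.

```python
def is_prime(n):
    # Trial division; adequate for the small numbers used here.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def sum_of_two_primes(n):
    # True if n = p + q for some primes p and q.
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def first_counterexample(limit):
    """Smallest even n in [4, limit] that is not a sum of two primes, or None."""
    n = 4
    while n <= limit:
        if not sum_of_two_primes(n):
            return n
        n = n + 2
    return None

found = first_counterexample(2000)
```

A result of None says only that no counterexample exists below the bound; it says nothing about whether the unbounded loop would ever halt.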
Proving Noncomputability. Proving non-existence requires more than just
showing a hard problem could be solved if something exists. One way to prove
non-existence of an X is to show that if an X exists it leads to a contradiction.
def paradox():
    if halts('paradox()'):
        while True: pass
halts('paradox()') evaluates to False
If the predicate expression evaluates to False, the alternate block is evalu-
ated. It is empty, so evaluation terminates. Thus, the evaluation of paradox()
terminates, contradicting the result of halts('paradox()').
Either result for halts('paradox()') leads to a contradiction! The only sensible
thing halts could do for this input is to not produce a value. That means there
is no algorithm that solves the Halting Problem. Any procedure we define to
implement halts must sometimes either produce the wrong result or fail to pro-
duce a result at all (that is, run forever without producing a result). This means
the Halting Problem is noncomputable.
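The construction can be tried against any concrete candidate for halts. In this sketch, make_paradox and says_never_halts are hypothetical names, and the candidate receives a Python function object instead of a program string (an assumption made so the example is self-contained).

```python
def make_paradox(candidate_halts):
    # Build a procedure that does the opposite of what the candidate predicts.
    def paradox():
        if candidate_halts(paradox):
            while True:
                pass
    return paradox

def says_never_halts(procedure):
    # A candidate that always answers False ("does not halt").
    return False

paradox = make_paradox(says_never_halts)
prediction = says_never_halts(paradox)   # False: "paradox() runs forever"
paradox()                                # yet this call returns
actually_halted = True                   # reached only because paradox() returned
```

A candidate that answers True instead would make paradox() loop forever, so that case can be reasoned about but not executed. Either way the candidate is wrong about the procedure built from it.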
There is one important hole in our proof: we argued that because paradox does
not make sense, something in the definition of paradox must not exist and iden-
tified halts as the component that does not exist. This assumes that everything
else we used to define paradox does exist.
This seems reasonable enough: they are built in to Python, so they seem to
exist. But perhaps the reason paradox leads to a contradiction is that True
does not really exist or that it is not possible to implement an if expression
that strictly follows the Python evaluation rules. Although we have been using
these and they seem to always work fine, we have no formal model in which
to argue that evaluating True always terminates or that an if expression means
exactly what the evaluation rules say it does.
Our informal proof is also insufficient to prove the stronger claim that no
algorithm exists to solve the Halting Problem. All we have shown is that no Python
procedure exists that solves halts. Perhaps there is a procedure in some more
powerful programming language in which it is possible to implement a solution
to the Halting Problem. In fact, we will see that no more powerful programming
language exists.
A convincing proof requires a formal model of computing. This is why Alan Tur-
ing developed a model of computation.
12.3 Universality
Recall the Turing Machine model from Chapter 6: a Turing Machine consists
of an infinite tape divided into discrete squares into which symbols from a fixed
alphabet can be written, and a tape head that moves along the tape. On each
step, the tape head can read the symbol in the current square, write a symbol in
the current square, and move left or right one square or halt. The machine can
keep track of a finite number of possible states, and determines which action to
take based on a set of transition rules that specify the output symbol and head
action for a given current state and read symbol.
Turing argued that this simple model corresponds to our intuition about what
can be done using mechanical computation. Recall this was 1936, so the model
for mechanical computation was not what a mechanical computer can do, but
what a human computer can do. Turing argued that his model corresponded
to what a human computer could do by following a systematic procedure: the
infinite tape was as powerful as a two-dimensional sheet of paper or any other
recording medium, the set of symbols must be finite otherwise it would not be
possible to correctly distinguish all symbols, and the number of machine states
must be finite because there is a limited amount a human can keep in mind at
one time.
We can enumerate all possible Turing Machines. One way to see this is to de-
vise a notation for writing down any Turing Machine. A Turing Machine is com-
pletely described by its alphabet, states and transition rules. We could write
down any Turing Machine by numbering each state and listing each transition
rule as a tuple of the current state, alphabet symbol, next state, output symbol,
and tape direction. We can map each state and alphabet symbol to a number,
and use this encoding to write down a unique number for every possible Turing
Machine. Hence, we can enumerate all possible Turing Machines by just enu-
merating the positive integers. Most positive integers do not correspond to valid
Turing Machines, but if we go through all the numbers we will eventually reach
every possible Turing Machine.
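One possible encoding can be sketched as follows. It is only an illustration of the idea, not the notation Turing used: the rule list is written out as text and the bytes of that text are read as one large integer. The names encode_machine and decode_machine and the tuple layout are assumptions of the sketch.

```python
import ast

def encode_machine(rules):
    """Map a list of transition rules to a single positive integer."""
    text = repr(sorted(rules))
    return int.from_bytes(text.encode("ascii"), "big")

def decode_machine(number):
    """Return the rule list encoded by number, or None if it encodes none."""
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    try:
        rules = ast.literal_eval(raw.decode("ascii"))
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        return None   # most integers are not descriptions of machines
    if isinstance(rules, list) and all(
            isinstance(r, tuple) and len(r) == 5 for r in rules):
        return rules
    return None

# (current state, read symbol, next state, written symbol, direction)
machine = [(0, 0, 1, 1, "R"), (1, 0, 0, 1, "L")]
number = encode_machine(machine)
round_trip = decode_machine(number)
not_a_machine = decode_machine(12345)
```

Counting upward through the positive integers and keeping only those for which decode_machine returns a rule list would, in principle, visit every machine expressible in this notation.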
This is a step towards proving that some problems cannot be solved by any
algorithm. The number of Turing Machines is less than the number of real numbers.
Both numbers are infinite, but as explained in Section 1.2.2, Cantor's
diagonalization proof showed that the real numbers are not countable. Any attempt to
map the real numbers to the integers must fail to include all the real numbers.
Since there are fewer Turing Machines than there are real numbers, there
must be some real numbers that cannot be produced by any Turing Machine.
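The core move of a diagonalization argument can be sketched on a small scale. Here each listed item is an infinite binary sequence, modeled as a function from a position to a bit; that modeling, the four sample sequences, and the name diagonal are assumptions for illustration.

```python
def diagonal(sequences):
    """Return a bit sequence that differs from sequences[i] at position i."""
    return lambda i: 1 - sequences[i](i)

# Each sequence maps a position (0, 1, 2, ...) to a bit.
listed = [
    lambda i: 0,                    # 0 0 0 0 ...
    lambda i: 1,                    # 1 1 1 1 ...
    lambda i: i % 2,                # 0 1 0 1 ...
    lambda i: 1 if i < 2 else 0,    # 1 1 0 0 ...
]
missing = diagonal(listed)
differs_from_all = all(missing(i) != listed[i](i) for i in range(len(listed)))
prefix = [missing(i) for i in range(len(listed))]
```

Whatever list is supplied, the constructed sequence disagrees with the i-th entry at position i, so it cannot appear anywhere in the list.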
The next step is to define the machine depicted in Figure 12.2. A Universal Turing
Machine is a machine that takes as input a number that identifies a Turing
Machine and simulates the specified Turing Machine running on an initially empty
input tape.
The Universal Turing Machine can simulate any Turing Machine. In his proof,
Turing describes the transition rules for such a machine. It simulates the Turing
Machine encoded by the input number. One can imagine doing this by using
the tape to keep track of the state of the simulated machine. For each step, the
universal machine searches the description of the input machine to find the
appropriate rule. This is the rule for the current state of the simulated machine
on the current input symbol of the simulated machine. The universal machine
keeps track of the machine and tape state of the simulated machine, and
simulates each step. Thus, there is a single Turing Machine that can simulate every
Turing Machine.
[Figure 12.2. Universal Turing Machine: the input is a number N; the output is the result of running TM N on an empty input tape.]
Since a Universal Turing Machine can simulate every Turing Machine, and a
Turing Machine can perform any computation according to our intuitive notion of
computation, this means a Universal Turing Machine can perform all
computations. Using the universal machine and a diagonalization argument similar to
the one above for the real numbers, Turing reached a similar contradiction for a
problem analogous to the Halting Problem for Python programs but for Turing
Machines instead.
If we can simulate a Universal Turing Machine in a programming language, that
language is a universal programming language. There is some program that can
be written in that language to perform every possible computation.
To show that a programming language is universal, it is sufficient to show that
it can simulate any Turing Machine, since a Turing Machine can perform ev-
ery possible computation. To simulate a Universal Turing Machine, we need
some way to keep track of the state of the tape (for example, the list datatypes
in Scheme or Python would be adequate), a way to keep track of the internal
machine state (a number can do this), and a way to execute the transition rules
(we could define a procedure that does this using an if expression to make de-
cisions about which transition rule to follow for each step), and a way to keep
going (we can do this in Scheme with recursive applications). Thus, Scheme is a
universal programming language: one can write a Scheme program to simulate
a Universal Turing Machine, and thus, perform any mechanical computation.
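The ingredients listed above can be put together in a short Python sketch. It is one possible arrangement, not the book's implementation: the tape is a dictionary from positions to symbols, the transition rules are a dictionary lookup instead of nested if expressions, and a loop replaces recursive applications. The machine append_one is a hypothetical example.

```python
def run_machine(rules, tape, state, max_steps):
    """Simulate a Turing Machine; return (halted, steps, cells)."""
    cells = dict(enumerate(tape))    # position -> symbol; blank squares read as 0
    position = 0
    for step in range(1, max_steps + 1):
        symbol = cells.get(position, 0)
        write, move, next_state = rules[(state, symbol)]
        cells[position] = write
        position += 1 if move == "R" else -1
        if next_state == "halt":
            return True, step, cells
        state = next_state
    return False, max_steps, cells   # step budget exhausted without halting

# Hypothetical example machine: append a 1 to a block of 1s.
append_one = {
    ("scan", 1): (1, "R", "scan"),   # skip over the existing 1s
    ("scan", 0): (1, "R", "halt"),   # write a 1 on the first blank and stop
}
halted, steps, cells = run_machine(append_one, [1, 1, 1], "scan", 100)
ones = sum(cells.values())
```

The max_steps parameter is a practical safeguard for the sketch; a faithful simulation has no such bound, which is exactly why simulating a machine cannot by itself decide whether it halts.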
The printsThree application would evaluate to True if evaluating the Python
program specified by p would halt since that means the print(3) statement
appended to p would be evaluated. On the other hand, if evaluating p would not
halt, the added print statement is never evaluated. As long as the program
specified by p would never print 3, the application of printsThree should evaluate to
False. Hence, if a printsThree algorithm exists, we could use it to implement an
algorithm that solves the Halting Problem.
The one wrinkle is that the specified input program might print 3 itself. We can
avoid this problem by transforming the input program so it would never print
3 itself, without otherwise altering its behavior. One way to do this would be to
replace all occurrences of print (or any other built-in procedure that prints) in
the string with a new procedure, dontprint, that behaves like print but doesn't
actually print out anything. Suppose the replacePrints procedure is defined to
do this. Then, we could use printsThree to define halts:
def halts(p): return printsThree(replacePrints(p) + '; print(3)')
We know that the Halting Problem is noncomputable, so this means the Prints-
Three Problem must also be noncomputable.
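The program transformation itself is easy to sketch, even though printsThree cannot exist. The version below is deliberately naive and is only an illustration: replace_prints does a plain textual substitution, and the transformed program is run with exec so its output can be inspected. A newline is used in place of '; ' when appending print(3) so that multi-line programs stay valid.

```python
import contextlib
import io

def dontprint(*args, **kwargs):
    """Accept the same arguments as print but print nothing."""
    return None

def replace_prints(source):
    # Naive textual substitution (an assumption of this sketch): it would
    # also rewrite the text "print(" inside string literals or longer names.
    return source.replace("print(", "dontprint(")

program = "for i in range(5):\n    print(i)\n"
transformed = replace_prints(program) + "\nprint(3)"

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(transformed, {"dontprint": dontprint})
output = captured.getvalue()
```

The only thing the transformed program prints is the final 3, and it prints it exactly when the original program finishes.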
The Halting Problem and Prints-Three Problem are noncomputable, but do not seem
to be obviously important problems. It is useful to know if a procedure
application will terminate in a reasonable amount of time, but the Halting Problem
does not answer that question. It concerns the question of whether the
procedure application will terminate in any finite amount of time, no matter how long
it is. This example considers a problem for which it would be very useful to have
a solution if one existed.
A virus is a program that infects other programs. A virus spreads by copying its
own code into the code of other programs, so when those programs are executed
the virus will execute. In this manner, the virus spreads to infect more and more
programs. A typical virus also includes a malicious payload so when it executes
in addition to infecting other programs it also performs some damaging (cor-
rupting data files) or annoying (popping up messages) behavior. The Is-Virus
Problem is to determine if a procedure specification contains a virus:
Is-Virus
Input: A specification of a Python program.
Output: If the expression contains a virus (a code fragment that will infect
other files) output True. Otherwise, output False.
This works as long as the program specified by p does not exhibit the file-infecting
behavior. If it does, p could infect a file and never terminate, and halts would
produce the wrong output. To solve this we need to do something like we did in
the previous example to hide the printing behavior of the original program.
A rough definition of file-infecting behavior would be to consider any write to
an executable file to be an infection. To avoid any file infections in the
specified program, we replace all procedures that write to files with procedures that
write to shadow copies of these files. For example, we could do this by creating
a new temporary directory and prepending that path to all file names. We call this
(assumed) procedure sandBox, since it transforms the original program
specification into one that would execute in a protected sandbox.
def halts(p): return isVirus(sandBox(p) + '; infectFiles()')
Since we know there is no algorithm that solves the Halting Problem, this proves
that there is no algorithm that solves the Is-Virus problem.
Virus scanners such as Symantec's Norton AntiVirus attempt to solve the
Is-Virus Problem, but its non-computability means they are doomed to always fail.
Virus scanners detect known viruses by scanning files for strings that match
signatures in a database of known viruses. As long as the signature database is
frequently updated they may be able to detect currently spreading viruses, but
this approach cannot detect a new virus that will not match the signature of a
previously known virus.
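Signature matching can be sketched in a few lines. The signature names and byte patterns below are made up for illustration and do not correspond to any real scanner or virus; scan is a hypothetical helper.

```python
def scan(data, signatures):
    """Return the names of all known signatures that occur in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)

# Made-up signature database for illustration only.
signatures = {
    "ExampleVirus.A": b"\xde\xad\xbe\xef",
    "ExampleVirus.B": b"infect_all_files",
}
known = scan(b"header\xde\xad\xbe\xefpayload", signatures)
variant = scan(b"header\xde\xad\xbe\xeepayload", signatures)
```

Changing a single byte of the matched pattern is enough for the variant to pass unnoticed, which illustrates why a scanner of this kind detects only what is already in its database.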
248 12.4. Proving Non-Computability
"I am rather puzzled why you draw this distinction between proof finders and
proof checkers. It seems to me rather unimportant as one can always get a
proof finder from a proof checker, and the converse is almost true: the
converse false if for instance one allows the proof finder to go through a proof
in the ordinary way, and then, rejecting the steps, to write down the final
formula as a proof of itself. One can easily think up suitable restrictions
on the idea of proof which will make this converse true and which agree
well with our ideas of what a proof should be like. I am afraid this may be
more confusing to you than enlightening."
Alan Turing, letter to Max Newman, 1940
Exercise 12.1. Is the Launches-Missiles Problem described below computable?
Provide a convincing argument supporting your answer.
Launches-Missiles
Input: A specification of a procedure.
Output: If an application of the procedure would lead to the missiles
being launched, outputs True. Otherwise, outputs False.
You may assume that the only thing that causes the missiles to be launched is
an application of the launchMissiles procedure.
Exercise 12.2. Is the Same-Result Problem described below computable?
Provide a convincing argument supporting your answer.
Same-Result
Input: Specifications of two procedures, P and Q.
Output: If an application of P terminates and produces the same value
as applying Q, outputs True. If an application of P does not terminate,
and an application of Q also does not terminate, outputs True. Otherwise,
outputs False.
Exercise 12.3. Is the Check-Proof Problem described below computable?
Provide a convincing argument supporting your answer.
Check-Proof
Input: A specification of an axiomatic system, a statement (the theorem),
and a proof (a sequence of steps, each identifying the axiom that is
applied).
Output: Outputs True if the proof is a valid proof of the theorem in the
system, or False if it is not a valid proof.
Find-Finite-Proof
Input: A specification of an axiomatic system, a statement (the theorem),
and a maximum number of steps (max-steps).
Exercise 12.5. [⋆] Is the Find-Proof Problem described below computable?
Provide a convincing argument why it is or why it is not computable.
Find-Proof
Input: A specification of an axiomatic system, and a statement (the theo-
rem).
Output: If there is a proof in the axiomatic system of the theorem, outputs
True. Otherwise, outputs False.
Busy-Beaver
Input: A positive integer, n.
Output: A number representing the maximum number of steps a Turing
Machine with n states and a two-symbol tape alphabet can run starting
on an empty tape before halting.
We use 0 and 1 for the two tape symbols, where the blank squares on the tape are
interpreted as 0s (alternately, we could use blank and X as the symbols, but it is
more natural to describe machines where symbols are 0 and 1, so we can think
of the initially blank tape as containing all 0s).
For example, if the Busy Beaver input n is 1, the output should be 1. The best
we can do with only one state is to halt on the first step. If the transition rule
for a 0 input moves left, then it will reach another 0 square and continue forever
without halting; similarly if it moves right.
For n = 2, there are more options to consider. The machine in Figure 12.3 runs
for 6 steps before halting, and there is no two-state machine that runs for more
steps. One way to support this claim would be to try simulating all possible two-
state Turing Machines.
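Such a search can be sketched directly. The sketch below makes several assumptions: each transition writes a symbol, moves left or right, and enters state A, B, or a halting state H; the step that enters H is counted; and any machine still running after cap steps is simply set aside. The names steps_until_halt and longest_halting_run are hypothetical.

```python
from itertools import product

def steps_until_halt(rules, cap):
    """Steps taken before halting on a blank tape, or None if cap is reached."""
    tape, position, state = {}, 0, "A"
    for step in range(1, cap + 1):
        write, move, next_state = rules[(state, tape.get(position, 0))]
        tape[position] = write
        position += 1 if move == "R" else -1
        if next_state == "H":
            return step
        state = next_state
    return None

def longest_halting_run(cap=50):
    """Longest run among two-state machines that halt within cap steps."""
    keys = [(s, b) for s in "AB" for b in (0, 1)]
    actions = list(product((0, 1), "LR", "ABH"))   # (write, move, next state)
    best = 0
    for choice in product(actions, repeat=len(keys)):
        steps = steps_until_halt(dict(zip(keys, choice)), cap)
        if steps is not None and steps > best:
            best = steps
    return best

longest = longest_halting_run()
```

The search examines 12^4 = 20,736 rule tables. On its own it shows only that no machine halts after more than 6 and at most 50 steps; concluding that the machines set aside never halt requires a separate argument, which is where the difficulty of the Busy Beaver Problem lies.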
[Figure 12.3. Two-state Busy Beaver Machine. States: A, B, Halt. Transition labels: 0/1,R; 1/1,L; 0/1,L; 1/1,Halt.]
We can prove the Busy Beaver Problem is noncomputable by reducing the
Halting Problem to it. Suppose we had an algorithm, bb(n), that takes the number
of states as input and outputs the corresponding Busy Beaver value. Then, we could
solve the Halting Problem for a Turing Machine:
TM Halting Problem
Input: A string representing a Turing Machine.
Output: If executing the input Turing Machine starting with a blank tape
would ever finish, output True. Otherwise, output False.
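def haltsTM(m):
    states = numberOfStates(m)
    # upper bound on the running time of any halting machine of this size
    maxSteps = bb(states)
    state = 0
    tape = []
    for step in range(0, maxSteps):
        state, tape = simulateOneStep(m, state, tape)
        if halted(state):
            return True
    # still running after maxSteps steps, so m never halts
    return False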
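To watch the reduction run on concrete machines, the sketch below fills in the helper procedures with hypothetical definitions; none of them come from the text, and the machine encoding (a dictionary from (state, symbol) to (write, move, next state)) is an assumption made for this example. The bb function here is only a lookup table holding the two values given above (1 for n = 1 and 6 for n = 2). It stands in for the general algorithm, which the argument shows cannot exist.

```python
HALT = "Halt"

# Stand-in for the hypothetical bb(n): only the two values stated in the text.
KNOWN_BB = {1: 1, 2: 6}

def bb(n):
    if n not in KNOWN_BB:
        raise ValueError("no general algorithm for bb(n) can exist")
    return KNOWN_BB[n]

def number_of_states(m):
    """Count the distinct states that appear in the rule table."""
    return len({state for (state, _symbol) in m})

def simulate_one_step(m, state, pos, tape):
    """Apply one transition; the tape is a dict whose missing squares hold 0."""
    write, move, nxt = m[(state, tape.get(pos, 0))]
    tape[pos] = write
    if nxt != HALT:
        pos += 1 if move == "R" else -1
    return nxt, pos, tape

def halts_tm(m, start="A"):
    """Decide halting on a blank tape, given a bound from bb."""
    max_steps = bb(number_of_states(m))
    state, pos, tape = start, 0, {}
    for _ in range(max_steps):
        state, pos, tape = simulate_one_step(m, state, pos, tape)
        if state == HALT:
            return True
    # Outlasted every halting machine of this size, so it runs forever.
    return False

# Two-state machine, using an assumed reading of Figure 12.3.
busy = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", HALT),
}
# One-state machine that moves right forever.
runner = {("A", 0): (0, "R", "A"), ("A", 1): (0, "R", "A")}

print(halts_tm(busy), halts_tm(runner))   # prints True False
```

The simulation itself is ordinary code. The only ingredient that cannot be implemented for all n is bb, which is exactly what the reduction needs in order to conclude that bb is noncomputable.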
Exercise 12.6. Confirm that the machine shown in Figure 12.3 runs for 6 steps
before halting.
Exercise 12.7. Prove the Beaver Bound problem described below is also non-
computable:
Beaver-Bound
Input: A positive integer, n.
Output: A number that is greater than the maximum number of steps a
Turing Machine with n states and a two-symbol tape alphabet can run
starting on an empty tape before halting.
A valid solution to the Beaver-Bound problem can produce any result for n as
long as it is greater than the Busy Beaver value for n.
Exercise 12.8. [★★★] Find a 5-state Turing Machine that runs for more than
47,176,870 steps, or prove that no such machine exists.
12.5 Summary
Although today's computers can do amazing things, many of which could not
even have been imagined twenty years ago, there are problems that can never
be solved by computing. The Halting Problem is the most famous example:
it is impossible to define a mechanical procedure that always terminates and
correctly determines if the computation specified by its input would terminate.
Once we know the Halting Problem is noncomputable, we can show that other
problems are also noncomputable by illustrating how a solution to the other
problem could be used to solve the Halting Problem, which we know to be
impossible.
Noncomputable problems frequently arise in practice. For example, identifying
viruses, analyzing program paths, and constructing proofs are all
noncomputable problems.
Just because a problem is noncomputable does not mean we cannot produce
useful programs that address the problem. These programs provide approxi-
mate solutions, which are often useful in practice. They produce the correct re-
sults on many inputs, but on some inputs must either fail to produce any result
or produce an incorrect result.
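As a small illustration of such an approximate solution, the sketch below checks whether a computation finishes within a fixed step budget. Everything in it is assumed for the example: the name halts_within, the use of Python generators to model step-by-step computations, and the choice to answer None ("no result") rather than guess when the budget runs out.

```python
def halts_within(computation, budget):
    """Return True if the computation finishes within budget steps.

    Returns None (no answer) when the budget is exhausted, since the
    computation might halt later or might run forever.
    """
    for _ in range(budget):
        try:
            next(computation)      # advance the computation by one step
        except StopIteration:
            return True
    return None

def countdown(n):
    """A computation that halts after n steps."""
    while n > 0:
        n -= 1
        yield

def loop_forever():
    """A computation that never halts."""
    while True:
        yield

print(halts_within(countdown(10), 1000))     # prints True
print(halts_within(loop_forever(), 1000))    # prints None
print(halts_within(countdown(5000), 1000))   # prints None, although it would halt
```

The checker is never wrong when it answers True, but it fails to produce a result both for the computation that loops forever and for the one that simply needs more steps than the budget allows.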