Introduction to Functional Programming
Richard Bird
Programming Research Group,
Oxford University
Philip Wadler
Department of Computer Science,
University of Glasgow
PRENTICE HALL
Bibliography: p.
Includes index.
1. Functional programming (Computer science)
I. Wadler, Philip, 1956- II. Title.
QA76.6.B568 1988 005.1 87-36049
ISBN 0-13-484189-1
Contents
Preface
1 Fundamental Concepts
1.1 Functional programming
1.1.1 Sessions and scripts
1.2 Expressions and values
1.2.1 Reduction
1.3 Types
1.4 Functions and definitions
1.4.1 Type information
1.4.2 Forms of definition
1.4.3 Currying
1.5 Specifications and implementations
2.6.2 Operators
2.6.3 Inverse functions
2.6.4 Strict and non-strict functions
2.7 Type synonyms
2.8 Type inference
3 Lists
3.1 List notation
3.2 List comprehensions
3.3 Operations on lists
3.4 Map and filter
3.5 The fold operators
3.5.1 Laws
3.5.2 Fold over non-empty lists
3.5.3 Scan
3.6 List patterns
4 Examples
4.1 Converting numbers to words
4.2 Variable-length arithmetic
4.2.1 Comparison operations
4.2.2 Addition and subtraction
4.2.3 Multiplication
4.2.4 Quotient and remainder
4.3 Text processing
4.3.1 Texts as lines
4.3.2 Lines as words
4.3.3 Lines into paragraphs
4.3.4 The basic package
4.3.5 Filling paragraphs
4.3.6 Summary
4.4 Turtle graphics
4.5 Printing a calendar
4.5.1 Pictures
4.5.2 Picturing a calendar
4.5.3 Building a calendar
6 Efficiency
6.1 Asymptotic behaviour
6.2 Models of reduction
6.2.1 Termination
6.2.2 Graph reduction
6.2.3 Head normal form
6.2.4 Pattern matching
6.2.5 Models and implementations
6.3 Reduction order and space
6.3.1 Controlling reduction order
6.3.2 Strictness
6.3.3 Fold revisited
6.4 Divide and conquer
6.4.1 Sorting
6.4.2 Multiplication
6.4.3 Binary search
6.5 Search and enumeration
6.5.1 Eight queens
6.5.2 Search order
6.5.3 Instant insanity
9 Trees
9.1 Binary trees
9.1.1 Measures on trees
9.1.2 Map and fold over trees
9.1.3 Labelled binary trees
9.2 Huffman coding trees
9.3 Binary search trees
9.3.1 Tree deletion
9.4 Balanced trees
9.4.1 Analysis of depth
9.5 Arrays
9.6 General trees
9.6.1 Expression trees
9.6.2 Example: pattern matching
9.7 Game trees
9.7.1 The alpha-beta algorithm
Bibliography
Index
Preface
a separate chapter, we have decided to introduce the essential ideas gradually
throughout the text.
Secondly, the subject of recursion is treated rather later on in the book
(in Chapter 5) than an experienced reader might expect. However, there
are two good reasons for introducing recursion later rather than earlier in a
course on functional programming. First of all, we feel that the notion of
a recursive function should be discussed at the same time as the notion of
proof by mathematical induction. They are two sides of the same coin and
one can best be understood only by referring to the other. Second, one can
go a long way in solving problems by using a more-or-less fixed repertoire
of functions, including a number of useful functions that operate on lists.
By emphasising this collection of functions at the outset, we hope to foster a
programming style which routinely deploys these functions as building blocks
in the construction of larger ones.
Thirdly, we say very little about how functional programming languages
are implemented. The major reason for this decision is that there now exist
a number of excellent textbooks devoted primarily to the problem of
interpreting and compiling functional languages.¹ Also, we feel that in the past
too much emphasis has been given to this aspect of functional programming,
and not enough to developing an appropriate style for constructing functional
programs.
The fourth and final aspect of our presentation, and certainly one of the
most important, concerns the decision to use mathematical notation, symbols
and founts, rather than the concrete syntax of a particular programming
language. It is not our intention in this book to promulgate a particular
language, but only a particular style of programming. However, one does, of
course, have to present some consistent notational framework and the
knowledgeable reader will quickly recognise the similarity of the one we have chosen
to that suggested by David Turner, of the University of Kent, in a succession
of functional languages. These languages are SASL, KRC and, more recently,
Miranda.² The last of these, Miranda, is very close to the kind of notation
we are going to describe. The present book is not an introduction to
Miranda, for we have found it convenient to differ in a few details (particularly
in the names and precise definitions of the basic list processing functions),
and many features of Miranda are not covered. Nevertheless, the book can
be read with profit by someone who intends to use Miranda or, indeed, many
other functional languages.
We should also acknowledge our debt to a number of other languages
which propose some similar concepts and notations. These are ML (developed
by Robin Milner at Edinburgh), Hope (Rod Burstall, Dave Macqueen and
Don Sannella at Edinburgh), and Orwell (Philip Wadler at Oxford). The
¹ Including the recent Implementation of Functional Programming Languages by S.L.
Peyton Jones, Prentice Hall, Hemel Hempstead, 1987.
² Miranda is a trademark of Research Software Limited.
Detailed organisation
In the first three chapters, we study basic notations for numbers, truth-values,
tuples, functions and lists. Chapter 1 deals with fundamental concepts,
reviews the definition of a mathematical function, and introduces sufficient
notation to enable simple functions to be constructed. At the same time, we
briefly introduce the fundamental idea of a specification as a mathematical
description of the task a program is to perform. In Chapter 2 we introduce
notation for basic kinds of data, and also say more about functions. We also
discuss how one can achieve precise control over the layout of printed values.
Chapter 3 introduces lists, the most important data structure in functional
programming. The names and informal meanings of a number of
functions and operations on lists are presented, and some of the basic
algebraic laws are described. Simple examples are given to help the student
gain familiarity with these very useful tools for processing lists.
Chapter 4 deals with more substantial examples of list processing and is
organised rather differently from preceding chapters. Each example is
accompanied by exercises, projects, and various suggestions for possible
improvements. An instructor can easily adapt these examples for use as classroom
projects or student assignments. Some of the examples are not easy and
require a fair amount of study.
In Chapter 5, we finally meet formally the notion of a recursive function
and see the precise definitions of the operations discussed in previous
chapters. At the same time we introduce the notion of an inductive proof and
show how the algebraic laws and identities described in Chapter 3 can be
proved. If the reader prefers, this chapter can be studied immediately after
Chapter 3, or even in conjunction with it.
The emphasis in the first five chapters is on the expressive power of functional
notation. The computer stays in the background and its role as a
mechanism for evaluating expressions is touched upon only lightly. In
Chapter 6 we turn to the subject of efficiency; for this we need to understand a
little more about how a computer performs its task of evaluation. We
discuss simple models of evaluation, and relate function definitions to how they
utilise time and space resources when executed by a computer. We also
discuss some general techniques of algorithm design which are useful in deriving
efficient solutions to problems.
In Chapter 7 we introduce the notion of an infinite list, and show how such
lists can be used to provide alternative solutions to some old problems, as well
as being a useful framework in which to study new ones. In particular, we
describe how infinite lists can be used in constructing programs that interact
with the user.
In Chapters 8 and 9 we turn to new kinds of data structure and show
how they can be represented in our programming notation. In particular,
Chapter 9 is devoted to the study of trees and their applications. One of
the advantages of an expression-based notation for programming is that the
study of data structures can be presented in a direct and simple manner,
and one can go much further in describing and deriving algorithms that
manipulate general data structures than would be possible in a conventional
programming language.
We have used the material in this text as a basis for courses in functional
programming to first-year Mathematics and Computation undergraduates at
Oxford, to graduate students on an M.Sc. course, and on various industrial
courses. Drafts of the book have also been used to teach undergraduates and
graduates in the USA and The Netherlands. The sixteen lecture course which
is typical at Oxford means that only a selection of topics can be presented in
the time available. We have followed the order of the chapters, but
concentrated on Chapters 2, 3, 4, 7 and part of Chapter 9. Chapter 5 on recursion
and induction has usually been left to tutorial classes (another typical aspect
of the Oxford system). The material in Chapter 5 is not really difficult and
is probably better left to small classes or private study; too many induction
proofs carried out at the blackboard have a distinctly soporific effect. On the
other hand, Chapter 4, on examples, deserves a fair amount of attention. We
have tried to choose applications that interest and stimulate the student and
encourage them to try and find better solutions. Some of the examples have
been set as practical projects with considerable success.
We judge that the whole book could be taught in a two-term (or
two-semester) course. It can also be adapted for a course on Algorithm Design
(emphasising the material in Chapter 6), and for a course on Data Structures
(emphasising Chapter 9, in particular).
It is of course important that formal teaching should be supported by
laboratory and practical work. At Oxford we have used the language Orwell
as a vehicle for practical computing work, but Miranda is a suitable
alternative. In fact, any higher-order functional language with non-strict semantics
would do as well, particularly if based on an equational style of definition
with patterns on the left-hand side of definitions.
Acknowledgements
This book has been rather a long time in the making. It has benefited
enormously from the continued support, enthusiasm and constructive advice of
colleagues and students, both at Oxford and other universities. The
suggestions of colleagues in the Programming Research Group at Oxford have been
particularly relevant, since they have been personally responsible for teaching
the material to their tutorial students while a lecture course was in progress.
A special debt is owed to John Hughes, now at Glasgow, whose grasp of
functional programming was a major influence on this book. We also owe
a particular debt to David Turner who stimulated our interest in functional
programming and provided a simple yet powerful notation, in KRC, for
inspiring programmers to produce mathematical programs.
Several people have read earlier drafts of the book and identified numerous
errors and omissions; in particular, we should like to thank Martin Filby,
Simon Finn, Jeroen Fokker, Maarten Fokkinga, Iain Houston, Antony Simmins,
Gerard Huet, Ursula Martin, Lambert Meertens, Simon Peyton Jones, Mark
Ramaer, Hamilton Richards, Joe Stoy, Bernard Sufrin, and David Turner.
Finally, we should like to acknowledge the contribution of Tony Hoare who
encouraged us to write this book and provided, through his leadership of the
Programming Research Group, such a stimulating environment in which to
work.
RB (bird@uk.ac.ox.prg)
Chapter 1
Fundamental Concepts
?
at the beginning of a blank line. We can then type an expression, followed
by a newline character, and the computer will respond by printing the result
of evaluating the expression, followed by a new prompt ? on a new line,
indicating that the process can begin again with another expression.
One kind of expression we might type is a number:
? 42
42
Here, the computer's response is simply to redisplay the number we typed.
The decimal numeral 42 is an expression in its simplest possible form and no
further process of evaluation can be applied to it.
We might type a slightly more interesting kind of expression:
? 6 × 7
42
Here, the computer can simplify the expression by performing the
multiplication. In this book, we shall adopt common mathematical notations for
writing expressions. In particular, the multiplication operator will be denoted
by the sign ×. It may or may not be the case that a particular keyboard
contains this sign, but we shall not concern ourselves in the text with how to
represent mathematical symbols in a restricted character set.
We will not elaborate here on the possible forms of numerical and other
kinds of expression that can be submitted for evaluation. They will be dealt
with thoroughly in the following chapters. The important point to absorb for
the moment is that one can just type expressions and have them evaluated.
This sequence of interactions between user and computer is called a 'session'.
Now let us illustrate the second, and intellectually more challenging,
aspect of functional programming: building definitions. A list of definitions
will be called a 'script'. Here is a simple example of a script:
square x = x × x
min x y = x, if x ≤ y
        = y, if x > y
In this script, two functions, named square and min, have been defined.
The function square takes a value x as argument and returns the value of
x multiplied by itself as its result. The function min takes two numbers, x
and y, as arguments and returns the smaller value. For the present we will
not discuss the exact syntax used for making definitions. Notice, however,
that definitions are written as equations between certain kinds of expression;
these expressions can contain variables, here denoted by the symbols x and
y.
Having created a script, we can submit it to the computer and enter a
session. For example, the following session is now possible:
? square (3 + 4)
49
? min 3 4
3
? square (min 3 4)
9
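As a rough illustration, the script and session above can be transcribed into Haskell, a later language in the same family. This is a sketch under stated assumptions rather than the notation of this book: the guard syntax, the type Integer, and the need to hide the predefined min are all Haskell specifics.

```haskell
import Prelude hiding (min)

-- square multiplies its argument by itself.
square :: Integer -> Integer
square x = x * x

-- min returns the smaller of its two arguments; the guards mirror
-- the two clauses of the definition in the script.
min :: Integer -> Integer -> Integer
min x y
  | x <= y    = x
  | otherwise = y

-- The three expressions of the session, evaluated in order.
main :: IO ()
main = do
  print (square (3 + 4))    -- 49
  print (min 3 4)           -- 3
  print (square (min 3 4))  -- 9
```

Loading the file plays the role of submitting the script; each print corresponds to one expression typed at the prompt.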
In effect, the purpose of a definition is to introduce a binding associating
a given name with a given value. In the above script, the name square is
associated with the function which squares its argument, and the name min
is associated with the function which returns the smaller of its two arguments.
A set of bindings is called an environment or context. Expressions are always
evaluated within some context and can contain occurrences of the names
found in that context. The evaluator will use the definitions associated with
these names as rules for simplifying expressions.
Some expressions can be evaluated without the programmer having to
provide a context. A number of operations may be given as primitive in that
the rules of simplification are built into the evaluator. For example, we shall
suppose the basic operations of arithmetic are provided as primitive. Other
commonly useful operations may be provided in special libraries of predefined
functions.
At any stage a programmer can return to the script in order to add or
modify definitions. The new script can then be resubmitted to the computer
to provide a new context and another session started.
For example, suppose we return to the script and add the definitions:
side = 12
area = square side
These equations introduce two numerical constants, side and area. Notice
that the definition of area depends on the previously defined function square.
Having resubmitted the script, we can enter a session and type, for example:
? area
144
? min (area + 4) 150
148
To summarise the important points made so far:
1. Scripts are collections of definitions supplied by the programmer.
2. Definitions are expressed as equations between certain kinds of
expression and describe mathematical functions.
3. During a session, expressions are submitted for evaluation; these
expressions can contain references to the functions defined in the script.
Exercises
1.1.1 Using the function square, design a function quad which raises its
argument to the fourth power.
1.1.2 Define a function max which returns the greater of its two arguments.
1.1.3 Define a function for computing the area of a circle with given radius
r (use 22/7 as an approximation to π).
1.2.1 Reduction
In this sequence, the label (+) refers to a use of the built-in rule for addition,
(×) refers to a similar rule for multiplication, and (square) refers to a use of
the rule:
square x ⇒ x × x
which is associated with the definition of square supplied by the programmer.
The expression 49 cannot be further reduced, so that is the result printed by
the computer.
The above sequence of reduction steps is not the only way to simplify the
expression square (3 + 4). Indeed, another sequence is as follows:
In this reduction sequence the rule for square is applied first, but the final
result is the same. A fuller account of reduction, including a discussion of
different reduction strategies, will be given in Chapter 6. The point to grasp
here is that expressions can be evaluated by a basically simple process of
substitution and simplification, using both primitive rules and rules supplied
by the programmer in the form of definitions.
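To make the idea of alternative reduction sequences concrete, here is a small Haskell sketch of a reducer for expressions built from numerals, +, × and square. The names Expr, innermost, outermost, binary and trace are hypothetical, introduced only for this illustration; the two step functions are one plausible reading of "reduce the argument first" and "apply the rule for square first".

```haskell
-- A tiny expression language: numerals, sums, products and uses of square.
data Expr
  = Num Integer
  | Add Expr Expr
  | Mul Expr Expr
  | Square Expr
  deriving (Eq, Show)

-- One innermost step: reduce arguments before applying a rule to them.
innermost :: Expr -> Maybe Expr
innermost (Num _)               = Nothing
innermost (Add (Num a) (Num b)) = Just (Num (a + b))
innermost (Mul (Num a) (Num b)) = Just (Num (a * b))
innermost (Square (Num a))      = Just (Mul (Num a) (Num a))
innermost (Add a b)             = binary Add innermost a b
innermost (Mul a b)             = binary Mul innermost a b
innermost (Square a)            = Square <$> innermost a

-- One outermost step: apply the rule for square before reducing its argument.
outermost :: Expr -> Maybe Expr
outermost (Num _)               = Nothing
outermost (Square a)            = Just (Mul a a)
outermost (Add (Num a) (Num b)) = Just (Num (a + b))
outermost (Mul (Num a) (Num b)) = Just (Num (a * b))
outermost (Add a b)             = binary Add outermost a b
outermost (Mul a b)             = binary Mul outermost a b

-- Reduce the left operand if it can take a step, otherwise the right one.
binary :: (Expr -> Expr -> Expr) -> (Expr -> Maybe Expr)
       -> Expr -> Expr -> Maybe Expr
binary op step a b =
  case step a of
    Just a' -> Just (op a' b)
    Nothing -> op a <$> step b

-- The whole reduction sequence, ending with a canonical expression.
trace :: (Expr -> Maybe Expr) -> Expr -> [Expr]
trace step e = e : maybe [] (trace step) (step e)

main :: IO ()
main = do
  let e = Square (Add (Num 3) (Num 4))  -- square (3 + 4)
  mapM_ print (trace innermost e)
  putStrLn "--"
  mapM_ print (trace outermost e)
```

Both traces end in Num 49; the outermost one takes one more step because the sum 3 + 4 is duplicated and then reduced twice.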
It is important to be clear about the distinction between values and their
representations by expressions. The simplest equivalent form of an
expression, whatever that may be, is not a value but a representation of it.
Somewhere, in outer space perhaps, one can imagine a universe of abstract values,
but on earth they can only be recognised and manipulated by their
representations. There are many representations for one and the same value.
For example, the abstract number forty-nine can be represented by the
decimal numeral 49, the roman numeral XLIX, the expression 7 × 7, as well as
infinitely many others. Computers usually operate with the binary
representation of numbers in which forty-nine may be represented by the pattern
0000000000110001 of 16 bits.
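The point about representations can be checked with a short Haskell sketch. It relies on the library functions showIntAtBase (from Numeric) and intToDigit (from Data.Char); the helper name binary and the padding to a fixed width are assumptions made for this illustration.

```haskell
import Data.Char (intToDigit)
import Numeric (showIntAtBase)

-- The binary numeral for a non-negative number, padded on the left
-- with zeros up to the given width.
binary :: Int -> Integer -> String
binary width n = replicate (width - length digits) '0' ++ digits
  where
    digits = showIntAtBase 2 intToDigit n ""

main :: IO ()
main = do
  print (7 * 7 == (49 :: Integer))  -- two expressions denoting one value
  putStrLn (binary 16 49)           -- 0000000000110001
```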
We shall say an expression is canonical (or in normal form) if it cannot
be further reduced. A value is printed as its canonical representation. Notice
that the notion of a canonical expression is dependent both on the syntax
given for forming expressions and the precise definition of the permissible
reduction rules. Some values have no canonical representations, others have no
finite ones. For example, the number π has no finite decimal representation.
It is possible to get a computer to print out the decimal expansion of π digit
by digit, but the process will never terminate.
Some expressions cannot be reduced at all. In other words, they do not
denote well-defined values in the normal mathematical sense. For instance,
supposing the operator / denotes numerical division, the expression 1/0 does
not denote a well-defined number. A request to evaluate 1/0 may cause the
evaluator to respond with an error message, such as 'attempt to divide by
zero', or go into an infinitely long sequence of calculations without producing
any result. In order that we can say that, without exception, every
(well-formed) expression denotes a value, it is convenient to introduce a special
symbol ⊥, pronounced 'bottom', to stand for the undefined value. In
particular, the value of 1/0 is ⊥ and we can assert 1/0 = ⊥. The computer is not
expected to be able to produce the value ⊥. Confronted with an expression
whose value is ⊥, the computer may give an error message, or it may remain
perpetually silent. Thus, ⊥ is a special kind of value, rather like the special
value ∞ in mathematical calculus. Like special values in other branches of
mathematics, ⊥ can be admitted to the universe of values only if we state
precisely the properties it is required to have and its relationship with other
values. We shall not go into the properties of ⊥ for a while, but for now
merely note the reasons for its existence and its special status.
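One way to observe an undefined value in practice is sketched below in Haskell. It assumes GHC's Control.Exception library and uses integer division (div), because Haskell's floating-point 1/0 yields an infinity rather than an error; the helper name safeDiv is hypothetical.

```haskell
import Control.Exception (ArithException, evaluate, try)

-- Attempt an integer division, capturing an arithmetic error
-- instead of letting it abort the program.
safeDiv :: Integer -> Integer -> IO (Either ArithException Integer)
safeDiv x y = try (evaluate (x `div` y))

main :: IO ()
main = do
  r <- safeDiv 1 0
  case r of
    Left err -> putStrLn ("no value: " ++ show err)  -- the error-message case
    Right v  -> print v
```

The evaluator here takes the first of the two options described above: it reports an error instead of remaining silent.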
Exercises
three x = 3
In how many ways can the reduction rules be applied to this expression? Do
they all lead to the same final result? Prove that the process of reduction
must terminate for all given expressions. (Hint: Define an appropriate notion
of expression size, and show that reduction does indeed reduce size.)
Suppose an extra syntactic rule is added to the language: (iii) if e1 and e2
are expressions, then so is (add e1 e2). The corresponding reduction rules are:
Count the number of different ways the reduction rules can be applied to
the above expression. Do they always lead to the same final result? Prove
that the process of reduction must always terminate for any given initial
expression. (Hint: Extend the notion of expression size.)
1.2.4 Imagine a language of finite sequences of 0 and 1. The rules for
simplifying strings in this language are given by:
1.3 Types
In the notation we are going to describe, the universe of values is partitioned
into organised collections, called types. Types can be divided into two kinds.
Firstly, there are basic types whose values are given as primitive. For
example, numbers constitute a basic type (the type num), as do truth-values
(the type bool) and characters (the type char). Secondly, there are
compound (or derived) types, whose values are constructed from those of other
types. Examples of derived types include: (num, char), the type of pairs of
values, the first component of which is a number and the second a character;
(num → num), the type of functions from numbers to numbers; and [char],
the type of lists of characters. Each type has associated with it certain
operations which are not meaningful for other types. For instance, one cannot
sensibly add a number to a character or multiply two functions together.
It is an important principle of the notation we are going to describe that
every well-formed expression can be assigned a type that can be deduced from
the constituents of the expression alone. In other words, just as the value
of an expression depends only on the values of its component expressions, so
does its type. This principle is called strong-typing.
The major consequence of the discipline imposed by strong-typing is that
any expression which cannot be assigned a 'sensible' type is regarded as not
being well-formed and is rejected by the computer before evaluation. Such
expressions have no value: they are simply regarded as illegal.
Here is an example of a script which contains a definition that cannot be
assigned a sensible type:
ay x = 'A'
bee x = x + ay x
The expression 'A' in this script denotes the character A. For any x, the
value of ay x is 'A' and so has type char. Since + is reserved to denote the
operation of numerical addition, the right-hand side of the definition of bee
is not well-typed: one cannot add characters numerically. It follows that the
function bee does not possess a sensible type, and the script is rejected by
the computer. (On the other hand, the function ay does possess a sensible
type; we shall see what it is in the next section.)
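The same script can be tried in Haskell, whose type checker behaves in a comparable way. This is only a sketch: the definition of bee is left as a comment, on the assumption that a compiler such as GHC refuses the whole file when it is uncommented, since characters are not numbers.

```haskell
-- ay ignores its argument and always returns the character 'A'.
ay x = 'A'

-- Uncommenting the next line should make the type checker reject the
-- script: (+) is numerical addition and ay x is a character.
-- bee x = x + ay x

main :: IO ()
main = print (ay (3 :: Integer))
```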
There are two stages of analysis when an expression is submitted for
evaluation. The expression is first checked to see whether it conforms to
the correct syntax laid down for expressions. If it does not, the computer
signals a syntax error. This stage is called syntax-analysis. If it does, then
the expression is analysed to see if it possesses a sensible type. This stage is
called type-analysis. If the expression fails to pass this stage, the computer
signals a type error. Only if the expression passes both stages can the process
of evaluation begin. Similar remarks apply to definitions before a script is
accepted.
Strong typing is important because adherence to the discipline can help
in the design of clear and well-structured programs. What is more, a wide
range of logical errors can be trapped by any computer which enforces it.
f :: A → B
This formula asserts that the type of f is A → B. In other words, the type
expression A → B denotes a type whenever A and B do, and describes the
type of functions from A to B.
A function f :: A → B is said to take arguments in A and return results in
B. If x denotes an element of A, then we write f(x), or just f x, to denote the
result of applying the function f to x. This value is the unique element of B
associated with x by the rule of correspondence for f. The former notation,
f(x), is the one normally employed in mathematics to denote functional
application, but the brackets are not really necessary and we shall use the
second form, f x, instead. However, when formal expressions are mixed in
with running prose we shall often surround them with brackets to aid the
eye. For example, we write (f x) rather than f x.
We shall be careful never to confuse a function with its application to
an argument. In some mathematics texts one often finds the phrase 'the
function f(x)', when what is really meant is 'the function f'. In such texts,
functions are rarely considered as values which may themselves be used as
arguments to other functions and the usage causes no confusion. In functional
programming, however, functions are values with exactly the same status as
all other values; in particular, they can be passed as arguments to functions
and returned as results. Accordingly, we cannot afford to be casual about the
difference between a function and the result of applying it to an argument.
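A brief Haskell sketch may help to fix the distinction. The names inc and twice are hypothetical, chosen for this illustration: twice receives a function as its argument and returns another function as its result, whereas (twice inc 5) is the result of an application.

```haskell
-- inc adds one to its argument.
inc :: Integer -> Integer
inc x = x + 1

-- twice takes a function and returns the function that applies it two times.
twice :: (a -> a) -> (a -> a)
twice f = \x -> f (f x)

main :: IO ()
main = do
  let incTwice = twice inc  -- a function value; nothing has been applied to a number yet
  print (incTwice 5)        -- 7
```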
It is important to keep in mind the distinction between a function value
and a particular definition of it. There are many possible definitions for
one and the same function. For instance, we can define the function which
doubles its argument in the following two ways:
double x = x + x
double' x = 2 × x
The two definitions describe different procedures for obtaining the
correspondence, but double and double' denote the same function and we can assert
double = double' as a mathematical truth. Regarded as procedures for
evaluation, one definition may be more or less 'efficient' than the other, in the sense
that the evaluator may be able to reduce expressions of the form (double x)
more or less quickly than expressions of the form (double' x). However, the
notion of efficiency is not one which can be attached to function values
themselves. Indeed, it depends on the given form of the definition and the precise
characteristics of the mechanism that evaluates it.
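The agreement of the two definitions can be spot-checked in Haskell. The sketch below compares them on a finite range of integers only, so it is evidence rather than proof; the equation double = double' is established by algebra, not by testing.

```haskell
double :: Integer -> Integer
double x = x + x

double' :: Integer -> Integer
double' x = 2 * x

-- Compare the two definitions on every integer from -100 to 100.
main :: IO ()
main = print (all (\x -> double x == double' x) [-100 .. 100])  -- True
```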
1.4.1 Type information
and so on, to denote type variables. Like other kinds of variable, a type
variable can be instantiated to different types in different circumstances. For
instance, the expression (id 3) is well-formed and has type num because
num can be substituted for α in the type of id, yielding a (num → num)
version. Similarly, the expression (id square) is well-formed and has type
(num → num) because (num → num) (the type of the function square)
can be substituted for α. Finally, the expression (id id) is also well-formed
because the type (α → α) can itself be substituted for α. The type of (id id)
is therefore (α → α). And, of course, we have id id = id.
Here is another example of a valid definition whose associated type
contains variables. Recall the definition of the function ay from the previous
section:
ay x = 'A'
The type associated with ay is ay :: α → char. The source type of ay can
be any type at all.
We now have the beginnings of a language of expressions that denote
types. This language contains constant expressions, such as num or char,
variables, such as α and β, and operators, such as →. If such an expression
does contain variables, then we say that it denotes a polymorphic type. In
particular, the functions id and ay have polymorphic types.
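The instantiations described above can be reproduced in Haskell, where type variables are written as lower-case letters such as a in place of Greek letters. The sketch assumes that id is the identity function, as the equation id id = id suggests, and hides Haskell's predefined id so that it can be defined afresh.

```haskell
import Prelude hiding (id)

-- The identity function: its type mentions the type variable a.
id :: a -> a
id x = x

-- ay accepts an argument of any type at all.
ay :: a -> Char
ay _ = 'A'

square :: Integer -> Integer
square x = x * x

main :: IO ()
main = do
  print (id 3 :: Integer)    -- a instantiated to Integer
  print (id square 5)        -- a instantiated to Integer -> Integer
  print (id id 'x')          -- a instantiated to a function type
  print (ay True, ay "any")  -- two different source types for ay
```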
min x y = x, if x ≤ y
        = y, if x > y
Notice that the arguments to min are written without brackets and an
intervening comma. We can, if we like, add brackets and write:
min' (x, y) = x, if x ≤ y
            = y, otherwise
The two functions, min and min', are very closely related, but there is a
subtle difference: they have different types. The function min' takes a single
argument which is a structured value consisting of a pair of numbers; its type
is given by:
min' :: (num, num) → num
The function min, on the other hand, takes two arguments one at a time. Its
type is given by:
min :: num → (num → num)
In other words, min is a function which takes a number and returns a function
(from numbers to numbers). For each value of x the expression (min x)
denotes a function which takes an argument y and returns the minimum of
x and y.
Here is another example. The function add, defined by:
add x y = x + y
also has type num → (num → num). For each x, the function (add x) 'adds
x to things'. In particular, (add 1) is the successor function which increments
its argument by 1, and (add 0) is the identity function on numbers.
This simple device for replacing structured arguments by a sequence of
simple ones is known as 'currying', after the American logician H. B. Curry.
One advantage of currying is that it reduces the number of brackets which
have to be written in expressions (an aspect of the notation the reader will
quickly grow to appreciate). For currying to work properly in a consistent
manner, we require that the operation of functional application associates to
the left. That is, min x y means (min x) y and not min (x y). As an operator,
functional application has a very 'quiet' notation, being represented by just
a space. In formal expressions this quietness improves readability, and with
currying we can exploit quietness to the full.
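The difference between the curried and uncurried forms can be seen in a short Haskell sketch. The definitions follow the text; the name successor and the use of Haskell's library function uncurry are additions made for this illustration.

```haskell
import Prelude hiding (min)

-- Curried form: takes its two arguments one at a time.
min :: Integer -> Integer -> Integer
min x y
  | x <= y    = x
  | otherwise = y

-- Uncurried form: takes a single argument which is a pair.
min' :: (Integer, Integer) -> Integer
min' (x, y)
  | x <= y    = x
  | otherwise = y

add :: Integer -> Integer -> Integer
add x y = x + y

main :: IO ()
main = do
  print (min 3 4, min' (3, 4))               -- (3,3)
  let successor = add 1                      -- add applied to one argument only
  print (successor 41)                       -- 42
  print ((min 3) 4 == min 3 4)               -- application associates to the left
  print (uncurry min (3, 4) == min' (3, 4))  -- converting between the two forms
```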
We have now said enough about functions, types and definitions to enable
simple scripts to be written. Further material on functions will be found at
the end of the next chapter.
Exercises
1.4.3 Give a definition of a function sign :: num → num which returns 1 if
its argument is positive, −1 if its argument is negative, and 0 otherwise.
one x = 1
apply f x = f x
compose f g x = f (g x)
for all x ≥ 0. This just says that the result of increase should be greater
than the square of its argument, whenever the argument is greater than or
equal to zero.
Here is one possible implementation of increase:
increase x = square (x + 1)
This is a valid definition in our programming notation. The proof that this
definition of increase satisfies the specification is as follows: assuming x ≥ 0,
we have:
increase x = square (x + 1)       (increase)
           = (x + 1) × (x + 1)    (square)
           = x × x + 2 × x + 1    (algebra)
           > x × x                (assumption: x ≥ 0)
           = square x             (square)
Here we have invented a definition of increase first, and afterwards verified
that it meets the specification. Clearly, there are many other functions which
will satisfy the specification and, since this is the only requirement, all are
equally good.
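A specification of this kind can also be written down as an executable test. The Haskell sketch below checks the condition on a finite range of arguments; the name meetsSpec is hypothetical, and a finite check supplements the proof above without replacing it.

```haskell
square :: Integer -> Integer
square x = x * x

increase :: Integer -> Integer
increase x = square (x + 1)

-- The specification: increase x > square x whenever x >= 0.
-- Arguments below zero are outside its scope and pass trivially.
meetsSpec :: Integer -> Bool
meetsSpec x = x < 0 || increase x > square x

main :: IO ()
main = print (all meetsSpec [0 .. 1000])  -- True
```

Any other definition of increase could be substituted and checked against meetsSpec in the same way.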
One way of specifying a function is to state the rule of correspondence
explicitly. The functional notation we shall describe can be very expressive,
and it is often possible to write down a formal definition within the notation
which will actually serve as the specification. This specification can then
be executed directly. However, it may prove so grossly inefficient that the
possibility of execution will be of theoretical interest only. Having written
an executable specification, the programmer is not necessarily relieved of
the burden (or pleasure) of producing an equivalent but acceptably efficient
alternative.
The problem of showing that formal definitions meet their specifications
can be tackled in a number of ways. One approach, illustrated above, is to
design the definition first and afterwards verify that the necessary conditions
are satisfied. Another approach, which can lead to clearer and simpler
programs, is to systematically develop (or synthesise) the definition from the
specification. For example, if we look at the specification of increase we may
argue that since x + 1 > x for all x, we have:
Exercises
Chapter 2
Basic Data Types
This chapter introduces the basic types of value out of which expressions
are constructed. They include numbers, booleans, characters and tuples. We
shall describe how the values of each type are represented and give some of the
primitive operations for manipulating them. Along the way we shall discuss
further features of our notation for functional programming, including: (i)
the relationship between functions and operators; (ii) how to exercise precise
control over the layout of results; and (iii) how to abbreviate the names of
types.
2.1 Numbers
The data type num consists of whole numbers (or integers) and fractional
numbers (also called floating-point numbers). A whole number is a number
whose fractional part is zero. Numeric constants are represented in decimal
notation, as the following examples show:
Although there are infinitely many numbers, computers have finite
capacities and can only store a limited range. Even within a finite range there
are infinitely many fractional numbers, so not all numbers can be stored
exactly. It is wise to be aware that a limitation exists, especially since it can
cause what appears to be a mathematically correct program to fail or return
unexpected results. However, precise details of number representation and
accuracy will vary from implementation to implementation and we shall not
go into details.
We will use the operations of Table 2.1 for processing elements of num.
Each of these is used as a binary infix operator; for example, we write x + y.
The minus sign (-) can also be used as a unary prefix operator, that is, we
can write -x for the negation of x.
Table 2.1
+      addition
-      subtraction
×      multiplication
/      division
^      exponentiation
div    integer division (see later)
mod    integer remainder (see later)
? 3 - 7 - 2
-6
? 3 × 7 + 4.1
25.1
? 3 × (7 + 4.1)
33.3
? square 3 × 4
36
? 3^4 × 2
162
It is clear from these examples that more than one operator may appear
in an expression and that different operators have different binding powers.
Moreover, when the same operator occurs twice in succession, as in the case
(3 - 7 - 2), a certain order of association is assumed. We deal with these
matters, precedence and order of association, separately.
2.1.1 Precedence
^                  exponentiation
× / div mod        the 'multiplying' operators
+ -                the 'addition' operators
3^4 × 5 means (3^4) × 5
3 × 7 + 4.1 means (3 × 7) + 4.1
square 3 × 4 means (square 3) × 4
for all values x, y and z of the appropriate type. For example, + and × are
associative operators, but ^ is not. For an associative operator, the choice of
the order of association has no effect on meaning.
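The same precedence rules can be observed in Haskell, which is used here only as an illustration and is not the notation of this book. The names e1 to e5 are invented for the sketch. Note that Haskell itself treats ^ as associating to the right, which is a fact about Haskell and not a claim about the notation of the text.

```haskell
square :: Integer -> Integer
square x = x * x

-- Function application binds more tightly than any operator.
e1 :: Integer
e1 = square 3 * 4      -- parsed as (square 3) * 4

-- Exponentiation binds more tightly than multiplication.
e2 :: Integer
e2 = 3 ^ 4 * 2         -- parsed as (3 ^ 4) * 2

-- Subtraction associates to the left.
e3 :: Integer
e3 = 3 - 7 - 2         -- parsed as (3 - 7) - 2

-- Exponentiation is not associative: the bracketing changes the value.
e4, e5 :: Integer
e4 = (2 ^ 3) ^ 2
e5 = 2 ^ (3 ^ 2)
```

The values of e4 and e5 differ, which is exactly what it means for an operator not to be associative.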
The operators div and mod perform integer division and remainder
respectively. If x is an arbitrary integer and y is a positive integer, then (x div y)
and (x mod y) are defined to be the unique integers q and r satisfying the
condition:
x = q × y + r and 0 ≤ r < y
In this book, we shall only use div and mod under the stated conditions on
x and y. Here are some simple examples:
? 7 div 3
2
? (-7) div 3
-3
? 7 mod 3
1
? (-7) mod 3
2
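Haskell's div and mod happen to obey the same condition for positive y, so the defining property can be checked mechanically. The predicate divModSpec below is a name invented for this sketch, and Haskell is an assumption of the example.

```haskell
-- For positive y, q = x `div` y and r = x `mod` y should satisfy
--   x = q * y + r   and   0 <= r < y.
divModSpec :: Integer -> Integer -> Bool
divModSpec x y = x == q * y + r && 0 <= r && r < y
  where
    q = x `div` y
    r = x `mod` y
```

The check covers negative x as well, which is where the condition 0 ≤ r < y matters: it forces (-7) div 3 to be -3 and not -2.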
So far we have not said what the types of the arithmetic operators are.
Unary negation has type num → num, while the binary operators all have
type num → num → num. Thus we can write:
(+) :: num → num → num
(×) :: num → num → num
and so on. Notice that the operators in the above type declarations are
enclosed in brackets. A bracketed operator is called a section. Enclosing an
operator in brackets converts it to an ordinary prefix function which can be
applied to its arguments like any other function. For example, we have:
(+) x y = x + y
(×) x y = x × y
These equations explain why the type assigned to each binary operator is
that of a curried function which takes its arguments one at a time. Like any
other name, a bracketed operator can be used in expressions and passed as
an argument to a function. To give a brief illustration, if we define:
both f x = f x x
then we have:
both (+) 3 = (+) 3 3 = 3 + 3 = 6
Note also that:
double = both (+)
where double is the function that doubles a number.
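The illustration carries over to Haskell almost symbol for symbol. Haskell is an assumption of this sketch, and the type signatures are additions that the text does not give.

```haskell
-- both applies a two-argument function to the same value twice.
both :: (a -> a -> b) -> a -> b
both f x = f x x

-- A bracketed operator is passed like any other function.
double :: Integer -> Integer
double = both (+)
```

Here both (+) 3 reduces to (+) 3 3 and hence to 6, following the calculation in the text.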
The notational device of enclosing a binary operator in brackets to convert
it into a normal prefix function can be extended: an argument can also be
enclosed along with the operator. If ⊕ denotes an arbitrary binary operator
(not necessarily a numerical one), then (⊕ x) and (x ⊕) are functions with the
definitions:
(x ⊕) y = x ⊕ y
(⊕ x) y = y ⊕ x
These forms are also called sections. For example, we have:
(× 2) is the 'doubling' function,
(1 /) is the 'reciprocal' function,
(/ 2) is the 'halving' function,
(+ 1) is the 'successor' function.
There is one exception to the rule for forming sections: (- x) is interpreted
as the unary operation of negation applied to x. If we want a function which
subtracts x from things, then we have to define it explicitly:
subtract x y = y - x
Having defined subtract as a curried function, we can apply it to only one
argument and obtain the function (subtract x) which subtracts x from its
argument.
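Sections exist in Haskell in the same form, including the exception for the minus sign. The sketch below assumes Haskell and uses invented names; subtractFrom plays the role of the explicitly defined subtraction function (the Haskell Prelude also provides one, called subtract).

```haskell
doubling, reciprocal, halving :: Double -> Double
doubling   = (* 2)
reciprocal = (1 /)
halving    = (/ 2)

successor :: Integer -> Integer
successor = (+ 1)

-- (- x) denotes negation in Haskell as well, so subtraction of a fixed
-- amount is written as an ordinary curried function.
subtractFrom :: Integer -> Integer -> Integer
subtractFrom x y = y - x
```

Applying subtractFrom to one argument, as in subtractFrom 3, gives the function that subtracts 3 from its argument.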
y0 = 2
y1 = (2 + 2/2)/2 = 1.5
y2 = (1.5 + 2/1.5)/2 = 1.4167
y3 = (1.4167 + 2/1.4167)/2 = 1.4142157
improve x y = (y + x/y)/2
Thus, until takes a function p :: α → bool, a function f :: α → α, and a value
x :: α as arguments, and returns a value of type α. The function until is an
example of a recursive function. If p x = False, then the value of (until p f x)
is defined in terms of another value of until. Recursive definitions will be
studied in detail in Chapter 5.
Putting these functions together, we have:
Since the functions improve and satis are specific to square roots, an
alternative way of writing the above definition is:
In this version, the functions satis and improve are made local to the
definition of sqrt. One advantage of local functions is that they can refer to the
arguments of the main function directly. Thus, as local functions, satis and
improve do not have to name x as an explicit argument.
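A version with local functions might look as follows in Haskell. This is a sketch under stated assumptions and not the definition from the text: the tolerance eps, the exact form of the test in satis, and the primed names until' and sqrt' (chosen to avoid the Prelude's own until and sqrt) are all inventions of this example, and the argument is assumed to be non-negative.

```haskell
-- until' p f x applies f repeatedly, starting from x, until p holds.
until' :: (a -> Bool) -> (a -> a) -> a -> a
until' p f x
  | p x       = x
  | otherwise = until' p f (f x)

-- Newton's method for square roots; assumes x >= 0.
-- satis and improve are local, so they refer to x without naming it
-- as an argument.
sqrt' :: Double -> Double
sqrt' x = until' satis improve x
  where
    satis y   = abs (y * y - x) < eps
    improve y = (y + x / y) / 2
    eps       = 1.0e-6   -- assumed tolerance
```

Starting from x = 2, the approximations are 2, 1.5, 1.4167 and so on, as in the calculation of y0, y1, y2 above.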
The definition of sqrt is assembled from three component functions and is
an example of a modular style of programming. In this style, definitions are
constructed out of combinations of simpler functions. Such definitions are
easy to understand and easy to modify. To illustrate this, let us formulate a
Exercises
2.1.1 The operators × and div have the same binding power and associate
to the left. What, therefore, is the value of the following expressions?
3 div 1 × 3
3 × 7 div 4
6 div 2 × 8 div 4
2.1.2 Using the definition of mod given in Section 2.1.3, show that for all
positive x, y and z:
(x + y) mod z = (x mod z + y mod z) mod z
x × (y mod z) = (x × y) mod (x × z)
2.1.3 Assuming all the integers x, y and z are in range, prove that:
x div 2 + (x + 1) div 2 = x
(x × y) div y = x
(x div y) div z = x div (y × z)
2.2 Booleans
Life would be fairly dull if num was the only available data type and the only
operations were those described in the previous section. At the very least
we would like to compare numbers and test whether two numerical
expressions are equal. For this we need the truth-values. There are two canonical
expressions for denoting truth-values, namely True and False. These two
expressions constitute the data type bool of boolean values (named after the
nineteenth century logician G. Boole). A function that returns boolean values
is called a predicate.
Booleans are important because they are the results returned by the
comparison operators, which are given as follows:
=    equals
≠    not equals
<    less than
>    greater than
≤    less than or equals
≥    greater than or equals
Here are two simple examples of the use of the comparison operators:
? 2 = 3
False
? 2 < 1 + 3
True
All six comparison operators have the same level of precedence and, as the
second example suggests, this is lower than that of the arithmetic operators.
The comparison operators are not confined to numbers only, but can take
expressions of arbitrary type as arguments. The only restriction is that the
two arguments must have the same type. If they do not, then the comparison
causes a type violation and is rejected by the evaluator. Each comparison
operator is therefore a polymorphic function with type:
α → α → bool
For example, we can evaluate:
? False = True
False
? False < True
True
Comparisons on boolean values are defined so that False is less than True.
2.2.1 Equality
double x = x + x
square x = x × x
Then we have:
(double = square) = ⊥
Note the crucial distinction between the equals sign = as used in its
normal mathematical (or denotational) sense and its use as a computable
test for equality. In mathematics, the assertion double = square is a false
statement. In computation, the result of evaluating the test double = square
is ⊥. Similarly, the assertion ⊥ = ⊥ is a true statement of mathematics
(since anything equals itself), but evaluating ⊥ = ⊥ results in the value ⊥.
This is not to say that the evaluator is an unmathematical machine, just that
its behaviour is described by a different set of mathematical rules, rules that
are chosen to be executable mechanically.
Boolean values may also be combined using the following logical operators:
? 1 < 2 ∧ 2 < 3
True
? ¬(1 < 2)
False
? 3 < 2 ∧ (2 < 3 ∨ 1 = 2)
False
2.2.3 Examples
Let us now give some examples involving boolean values. First, suppose we
want a function to determine whether a year is a leap year or not. In the
Gregorian calendar, a leap year is a year that is divisible by 4, except that if
it is divisible by 100, then it must also be divisible by 400. We can express
this in a number of equivalent ways. One is to define:
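The rule just stated can be sketched in Haskell as below. Both the use of Haskell and the name leap are assumptions of this example; it is one of the several equivalent formulations the text mentions, not necessarily the one the authors chose.

```haskell
-- A year divisible by 100 must also be divisible by 400;
-- any other year need only be divisible by 4.
leap :: Integer -> Bool
leap y
  | y `mod` 100 == 0 = y `mod` 400 == 0
  | otherwise        = y `mod` 4 == 0
```

Under this definition 2000 is a leap year and 1900 is not.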
bad x = 1,     if x > 1
      = False, otherwise
Exercises
2.2.1 For each of the following expressions, say whether or not it is
well-formed. If the expression is well-formed, then give its value; otherwise, say
whether the error is a syntax-error, type-error, or some other kind:
(3 = - - 3) ∧ True
1 ∧ 1 = 2
(1 < x ∧ x < 100) ∨ x = True ∨ x = False
False = (1 < 3)
2.2.2 Define a function sumsqrs which takes three numbers and returns the
sum of the squares of the larger two.
? 'a'
'a'
? '7'
'7'
? ' '
' '
It is important to understand that the character '7' is quite a different entity
from the decimal number 7: the former is a character and is a member of
the type char, while the latter denotes a number and is a member of the
type num. As with the case of decimals, these primitive expressions cannot
be further evaluated and are simply redisplayed by the evaluator. The third
example shows one way of denoting the space character. For the purposes
of this book, it is convenient to introduce special symbols for denoting the
two most important non-visible control characters: space and newline. The
newline character will be denoted by the sign '↵' and, whenever it is desirable
for reasons of legibility, the space character will be denoted by the sign '␣'.
Two primitive functions are provided for processing characters, code and
decode. The function code :: char → num converts a character to the integer
corresponding to its ASCII code number, and decode :: num → char does
the reverse. For example:
True
In ASCII, upper-case letters have a lower code number than lower-case letters,
so:
? 'A' < 'a'
True
Using this information we can define simple functions on characters. For
instance, here are three functions for determining whether a character is a
digit, a lower-case letter, or an upper-case letter:
isdigit x = '0' ≤ x ∧ x ≤ '9'
isupper x = 'A' ≤ x ∧ x ≤ 'Z'
islower x = 'a' ≤ x ∧ x ≤ 'z'
Next, we can define a function for converting lower-case letters to upper-case:
capitalise :: char → char
capitalise x = decode (offset + code x), if islower x
             = x,                        otherwise
               where offset = code 'A' - code 'a'
This definition uses the fact that the lower- and upper-case letters have codes
which are in numerical sequence, but does not depend on their actual values.
In particular, we can calculate:
capitalise 'a' = decode (offset + code 'a')
               = decode ((code 'A' - code 'a') + code 'a')
               = decode (code 'A')
               = 'A'
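These character functions can be tried out in Haskell, where the library functions ord and chr from Data.Char play the roles of code and decode. That correspondence, and the use of Haskell itself, are assumptions of this sketch.

```haskell
import Data.Char (chr, ord)

-- code and decode expressed with Haskell's ord and chr.
code :: Char -> Int
code = ord

decode :: Int -> Char
decode = chr

isdigit, isupper, islower :: Char -> Bool
isdigit x = '0' <= x && x <= '9'
isupper x = 'A' <= x && x <= 'Z'
islower x = 'a' <= x && x <= 'z'

-- Shift a lower-case letter by the distance between 'a' and 'A'.
capitalise :: Char -> Char
capitalise x
  | islower x = decode (offset + code x)
  | otherwise = x
  where
    offset = code 'A' - code 'a'
```

As in the text, capitalise depends only on the two alphabets being in numerical sequence, not on the particular code numbers.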
2.3.2 Layout
Similarly, (show b) for a boolean b returns the string which prints as the
representation of b, so:
The function show can also be applied to characters and strings. We have
for instance that:
show 'a' = "'a'"
show "hello" = ""hello""
If the result of an evaluation is not a string, then the evaluator automatically
applies the function show. If it is a string, then it is printed literally. Hence
we have:
? "me how"
me how
? show "me how"
"me how"
? show ("me", "how") = "(me,how)"
False
? show ("me", "how")
("me", "how")
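Haskell's show behaves in the same way on these examples: the quotation marks become part of the resulting string. The sketch assumes Haskell, and the names shownChar, shownString and shownPair are invented for it.

```haskell
-- show keeps the quotation marks as characters of the result.
shownChar, shownString, shownPair :: String
shownChar   = show 'a'
shownString = show "me how"
shownPair   = show ("me", "how")
```

In particular shownPair contains the inner quotation marks, which is why the comparison with "(me,how)" in the session above yields False.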
where m  = width x
      lm = (n - m) div 2
      rm = (n - m) - lm
All three functions are partial, returning ⊥ if the string is too long to fit in
the given width.
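Three justification functions of this kind might be sketched in Haskell as follows. The names ljustify, rjustify and cjustify, the use of length for the width of a string, and the error message are assumptions of this example; raising an error stands in for the undefined result when the string does not fit.

```haskell
-- A run of n spaces; undefined (an error) when n is negative.
spaces :: Int -> String
spaces n
  | n >= 0    = replicate n ' '
  | otherwise = error "string too long for the given width"

ljustify, rjustify, cjustify :: Int -> String -> String
ljustify n x = x ++ spaces (n - length x)
rjustify n x = spaces (n - length x) ++ x
cjustify n x = spaces lm ++ x ++ spaces rm
  where
    m  = length x
    lm = (n - m) `div` 2      -- left margin
    rm = (n - m) - lm         -- right margin takes any odd space
```

When the spare width is odd, the extra space goes on the right, because lm is rounded down.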
Exercises
2.3.1 Define a function nextlet which takes a letter of the alphabet and
returns the letter coming immediately after it. Assume that letter A follows
Z.
2.3.2 Define a function digitval which converts a digit character to its
corresponding numerical value.
2.3.3 Put the following strings in ascending order: "McMillan", "Macmillan",
and "MacMillan".
2.3.4 What are the values of the following expressions?
show (show 42)
show 42 ++ show 42
show ")"
2.4 Tuples
One way of combining types to form new ones is by pairing them. For
example, the type (num, char) consists of all pairs of values for which the first
component is a number and the second a character. In particular, (3, 'a') and
(17.3, '+') are both values of type (num, char). The type (α, β) corresponds
to the cartesian product operation of set theory, where the notation α × β is
more often seen.
We can evaluate expressions involving pairs of values:
? (4 + 2, 'a')
(6, 'a')
? (3, 4) = (4, 3)
False
? (3, 6) < (4, 2)
True
? (3, ("a", False)) < (3, ("a", True))
True
fst :: (α, β) → α
fst (x, y) = x
snd :: (α, β) → β
snd (x, y) = y
Both fst and snd are polymorphic functions; they select the first and second
component of the pair respectively. Neither function works on any other
tuple-type. If we want to have selection functions for other kinds of tuple,
then they have to be defined separately for each case. For example, we can
define:
fst3 (x, y, z) = x
snd3 (x, y, z) = y
thd3 (x, y, z) = z
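In Haskell the selectors for triples are written in the same way, and the type signatures (an addition of this sketch) make their polymorphism explicit. Haskell is an assumption of the example.

```haskell
-- Selectors for triples; fst and snd from the Prelude work on pairs only.
fst3 :: (a, b, c) -> a
fst3 (x, _, _) = x

snd3 :: (a, b, c) -> b
snd3 (_, y, _) = y

thd3 :: (a, b, c) -> c
thd3 (_, _, z) = z
```

Each selector ignores the components it does not return, so those are written with the wildcard pattern.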
Here is a function which returns the quotient and remainder of one
number by another:
compare op (x, y) (u, v) = op (x × v) (y × u)
then we can define comparison operations:
requals = compare (=)
rless = compare (<)
rgreater = compare (>)
and so on.
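The comparison operations on pairs can be sketched in Haskell as below. The sketch assumes that a pair (x, y) stands for the fraction x/y with y positive, uses the invented type name Rat, and renames compare to compareWith because the Haskell Prelude already defines a function called compare.

```haskell
-- (x, y) stands for the fraction x / y; y is assumed to be positive.
type Rat = (Integer, Integer)

-- Cross-multiplying reduces a comparison of fractions to one of integers.
compareWith :: (Integer -> Integer -> Bool) -> Rat -> Rat -> Bool
compareWith op (x, y) (u, v) = op (x * v) (y * u)

requals, rless, rgreater :: Rat -> Rat -> Bool
requals  = compareWith (==)
rless    = compareWith (<)
rgreater = compareWith (>)
```

Because the comparison operator is passed as an argument, each new operation is a one-line partial application.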
Finally, we give a simple function for printing fractions. The definition
below uses the function show introduced in the previous section.
showrat (x, y) = show u,                  if v = 1
               = show u ++ "/" ++ show v, otherwise
                 where (u, v) = norm (x, y)
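A runnable Haskell counterpart is sketched below. The definition of norm used here is an assumption of the example: it reduces a fraction to lowest terms with a positive denominator, which is what showrat relies on, but it need not match the definition the text has in mind.

```haskell
-- Reduce a fraction to lowest terms with a positive denominator.
-- This norm is an assumed definition for the purposes of the sketch.
norm :: (Integer, Integer) -> (Integer, Integer)
norm (x, y)
  | y == 0    = error "zero denominator"
  | otherwise = (signum y * x `div` d, abs y `div` d)
  where
    d = gcd x y

-- Print whole numbers without a denominator.
showrat :: (Integer, Integer) -> String
showrat (x, y)
  | v == 1    = show u
  | otherwise = show u ++ "/" ++ show v
  where
    (u, v) = norm (x, y)
```

For instance showrat (6, 4) gives the string 3/2, while showrat (4, 2) gives just 2.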
Exercises
cond True x y = x
cond False x y = y
These equations can be written in any order since the two patterns True and
False are distinct and cover all possible boolean values. The given definition
of cond is equivalent to:
cond p x y = x, if p = True
           = y, if p = False
As another example, we can define the logical connectives using patterns.
For instance:
True ∧ x = x
False ∧ x = False
True ∨ x = True
False ∨ x = x
These equations define the operators (∧) and (∨) using patterns for the first
argument. It is also possible to define versions using patterns for the second
argument, or indeed for both arguments simultaneously.
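The same definitions by patterns can be written in Haskell. The operator names /\ and \/ are inventions of this sketch (Haskell's own connectives are && and ||), and Haskell is an assumption of the example.

```haskell
cond :: Bool -> a -> a -> a
cond True  x _ = x
cond False _ y = y

-- Conjunction, by patterns on the first argument.
(/\) :: Bool -> Bool -> Bool
True  /\ x = x
False /\ _ = False

-- Disjunction, by patterns on the first argument.
(\/) :: Bool -> Bool -> Bool
True  \/ _ = True
False \/ x = x
```

Since only the first argument is matched against a pattern, the second argument is not examined when the first already decides the answer.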
We can also define functions over the natural numbers (that is, the
non-negative integers) using patterns. A simple example is:
permute 0 = 1
permute 1 = 2
permute 2 = 0
These equations define a function permute which returns a well-defined result
only if its argument is one of the numbers 0, 1 or 2. The definition of permute
is equivalent to the following one:
permute n = 1, if n = 0
          = 2, if n = 1
          = 0, if n = 2
We can also use patterns containing variables. For example, the following
definition describes a version of the predecessor function:
pred 0 = 0
pred (n + 1) = n
In this definition, the pattern (n + 1) can only be 'matched' by a value if n
matches a natural number. Hence pred is defined for natural numbers only.
Notice that the two patterns 0 and (n + 1) are exhaustive in that they cover
all natural numbers, and disjoint in that no natural number matches more
than one pattern (since n itself must match a natural number). It therefore
does not matter in which order we write the two equations defining pred.
Here is another example of a function defined by pattern matching:
count 0 = 0
count 1 = 1
count (n + 2) = 2
Like pred, the function count is defined for natural numbers only. It returns
the values 0, 1 or 2, depending on whether its argument is 0, 1 or greater
than 1. Since n must match a natural number, the three patterns 0, 1 and
(n + 2) are disjoint and exhaustive, and the equations can be written in any
order.
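Haskell 2010 no longer accepts patterns of the form (n + 1), so a Haskell sketch of these functions has to use guards in their place. The use of Haskell and the primed name pred' (chosen to avoid the Prelude's pred) are assumptions of this example.

```haskell
permute :: Integer -> Integer
permute 0 = 1
permute 1 = 2
permute 2 = 0

-- The guard n > 0 stands in for the pattern (n + 1).
pred' :: Integer -> Integer
pred' 0 = 0
pred' n | n > 0 = n - 1

-- The guard n >= 2 stands in for the pattern (n + 2).
count :: Integer -> Integer
count 0 = 0
count 1 = 1
count n | n >= 2 = 2
```

As in the text, all three functions are partial: applying them outside the natural numbers (or, for permute, outside 0, 1 and 2) matches no equation and gives no well-defined result.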
Pattern matching is one of the cornerstones of an equational style of
definition; more often than not it leads to a cleaner and more readily
understandable definition than a style based on conditional equations. As we shall
see, it also simplifies the process of reasoning formally about functions.
Exercises
2.5.1 Define versions of the functions (∧) and (∨) using patterns for the
second argument. Define versions which use patterns for both arguments.
Draw up a table showing the values of ∧ and ∨ for each version.
2.5.2 Is the definition of pred given in the text equivalent to the following
one?
pred n = 0,     if n = 0
       = n - 1, if n > 0
(f · g) x = f (g x)
The type of (·) is given by:
(f · g) · h = f · (g · h)
for all functions f, g and h. Accordingly, there is no need to put in brackets
when writing sequences of compositions.
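Associativity of composition can be observed on sample arguments in Haskell, where composition is written with a full stop. Haskell is an assumption of this sketch, and the names pipeline, leftNested and rightNested are invented for it; agreement on test values illustrates the law but does not prove it.

```haskell
-- No brackets are needed in a sequence of compositions.
pipeline :: Integer -> Integer
pipeline = (+ 1) . (* 2) . subtract 3

-- The two possible bracketings denote the same function.
leftNested, rightNested :: Integer -> Integer
leftNested  = ((+ 1) . (* 2)) . subtract 3
rightNested = (+ 1) . ((* 2) . subtract 3)
```

Each version subtracts 3, doubles the result and then adds 1.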
One advantage of functional composition is that some definitions can be
written more concisely. For example, rather than defining a function by a
scheme of the form:
2.6.2 Operators
As we have seen, a binary operator is just like a function, the only difference
being that it is written between its two arguments rather than before them.
We can also section operators to convert them to prefix form and pass them
as arguments to other functions.
We can also define new operators. For example, consider the operations
of rational arithmetic discussed in Section 2.4. Instead of defining:
radd (x, y) (u, v) =
we can also define radd as an operator:
(x, y) ⊕ (u, v) =
Similarly for the other functions. In this book, we shall denote all operators
by special symbols, such as ⊕, ⊗, ++, and so on, or by writing their names
in bold fount. (At a terminal with a restricted set of founts, operators can
be denoted by some special convention, such as prefixing a name with the
character $.) To avoid fuss, we shall suppose all non-primitive operators have
the same binding power and associate to the right; they can also be sectioned
to convert them to prefix form (see Section 2.1).
One good test of whether to define a function as an operator is to see if the
operation is associative. Since the rational addition operator ⊕ is associative,
it is more pleasant to be able to write:
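Rational addition can be defined as an operator in Haskell along the following lines. Everything specific here is an assumption of the sketch: the operator name |+|, its fixity, the type name Rat, and the definition of norm, which presumes positive denominators.

```haskell
-- (x, y) stands for the fraction x / y; y is assumed to be positive.
type Rat = (Integer, Integer)

-- Reduce to lowest terms (assumed definition).
norm :: Rat -> Rat
norm (x, y) = (x `div` d, y `div` d)
  where
    d = gcd x y

-- A hypothetical operator for rational addition, associating to the right.
infixr 6 |+|
(|+|) :: Rat -> Rat -> Rat
(x, y) |+| (u, v) = norm (x * v + u * y, y * v)
```

Because the operation is associative, a sum such as (1, 2) |+| (1, 3) |+| (1, 6) can be written without brackets and has the same value under either bracketing.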
2.6.3 Inverse functions
f x = f y if and only if x = y
f⁻¹ (f x) = x
for all x in A. For example, the function:
f :: num → (num, num)
f x = (sign x, abs x)
is injective and has inverse:
sqrt (square x) = x
for all x ≥ 0. The function square is injective on the non-negative numbers.
The use of inverse functions as a method of specification gives no hint as
to how an executable (or constructive) definition might be formulated. The
task of a programmer is to synthesise a constructive definition which meets
the specification. In later chapters we shall meet a number of instances of this
idea. It will not in general be the case that an injective function f :: A → B
has an inverse f⁻¹ that satisfies the further equation
f (f⁻¹ y) = y
for all y in B. This only happens when f has the additional property that
for every y in B there exists some x in A such that f x = y. A function
satisfying this condition is said to be surjective. If f is both injective and
surjective, then f is said to be bijective.
If f :: A → B is surjective, but not necessarily bijective, then it is still
possible to specify g by the requirement that:
f (g x) = x
for all x in B. In general, though, there will be more than one function g
which satisfies the equation. For instance, taking f (x, y) = x × y, we have
that each of:
g1 x = (sign x, abs x)
g2 x = (x, 1)
g3 x = (2 × x, 1/2)
is a possible definition of g. In the absence of any further constraints, each
definition is perfectly satisfactory. We shall also meet examples of this kind
of specification later on.
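The two kinds of specification can be checked on sample values in Haskell. The sketch assumes Haskell and integer arguments, and the names f, fInv and mul are chosen for the example; signum is Haskell's name for sign.

```haskell
-- An injective function and a left inverse for it.
f :: Integer -> (Integer, Integer)
f x = (signum x, abs x)

fInv :: (Integer, Integer) -> Integer
fInv (s, a) = s * a

-- A surjective function with more than one right inverse.
mul :: (Integer, Integer) -> Integer
mul (x, y) = x * y

g1, g2 :: Integer -> (Integer, Integer)
g1 x = (signum x, abs x)
g2 x = (x, 1)
```

Here fInv (f x) = x for every x, but f (fInv (2, 3)) is (1, 6) and not (2, 3), since f is not surjective. In the other direction, mul (g1 x) = x and mul (g2 x) = x for every x, although g1 and g2 are different functions.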
2.6.4 Strict and non-strict functions
So far we have seen some examples of polymorphic functions, but other kinds
of value can be polymorphic too. In particular, the special value ⊥ is
polymorphic. In other words, ⊥ is a value of every type. This means that any
function f may, conceptually at least, be applied to ⊥. If f ⊥ = ⊥, then f
is said to be a strict function; otherwise it is non-strict. Most of the
common functions of arithmetic are strict functions and only return well-defined
results for well-defined arguments.
The idea of a non-strict function may seem strange at first sight, since
it seems to entail getting something for nothing. However, such functions
are perfectly possible and can be very useful. Consider, for instance, the
following definition:
three :: num → num
three x = 3
Suppose we submit the above definition to the evaluator, enter a session and
type three (1/0). This is what would happen:
? three (1/0)
3
What has happened is that the evaluator does not require the value of the
argument to three in order to determine the result, so it does not attempt
to evaluate it. The value of 1/0 is ⊥, so three ⊥ ≠ ⊥ and the function is
non-strict.
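Haskell also has a non-strict semantics, so the same experiment can be repeated there. The sketch assumes Haskell and uses integer division by zero as the undefined argument, because floating-point division by zero in Haskell yields an infinity and not an error; the name undefinedArg is invented for the example.

```haskell
-- three ignores its argument altogether.
three :: Integer -> Integer
three _ = 3

-- Evaluating this value on its own raises a divide-by-zero error.
undefinedArg :: Integer
undefinedArg = 1 `div` 0
```

The application three undefinedArg nevertheless returns 3, because the argument is never evaluated.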
For a number of reasons, a non-strict semantics is preferable to a strict
one. First, it makes reasoning about equality easier. For example, with a
non-strict semantics we can reason that
2 + three x = 5
for all x, by straightforward substitution of the definition of three into the
left-hand side. No qualification about x possessing a well-defined value is
necessary. A non-strict semantics leads to a simpler and more uniform
treatment of substitution, and hence to a simpler logical basis for reasoning about
the correctness of functional programs.
There are other advantages too: we can define new control structures by
defining new functions. To illustrate, we can define a function cond, with
type (bool → α → α → α), which exactly matches the control structure
provided by conditional definitions:
cond p x y = x, if p
           = y, otherwise
For example, instead of defining:
recip x = 0,   if x = 0
        = 1/x, otherwise
we can write:
recip x = cond (x = 0) 0 (1/x)
Under a non-strict semantics, the evaluation of (cond True x y) does not
require the value of y so it is not computed; similarly, the evaluation of
(cond False x y) does not require the value of x. The function cond is, of
course, strict in its first argument since this value is needed to determine the
answer. In particular, we have:
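The behaviour of cond under a non-strict semantics can be observed in Haskell. The sketch assumes Haskell, uses the type Rational so that 1/0 really is an error, and renames recip to recip' because the Prelude already defines recip.

```haskell
-- cond as an ordinary function: a user-defined control structure.
cond :: Bool -> a -> a -> a
cond p x y = if p then x else y

-- Division by zero on Rational raises an error, so this definition
-- works only because the unused branch is never evaluated.
recip' :: Rational -> Rational
recip' x = cond (x == 0) 0 (1 / x)
```

Under a strict semantics recip' 0 would fail, since all three arguments of cond would be evaluated before the call.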
Exercises
2.6.2 Write down a definition of a function with type (num → num) which
returns no well-defined values.
2.6.3 Consider the function halve = (div 2). Is it possible to specify a
function f by the requirement:
f (halve x) = x
for all natural numbers x? Give one function f that satisfies the equation:
halve (f x) = x
string == [char]
which formalises the terminology (as we shall see in the next chapter, the
type of lists of values of type α is denoted by [α]).
Type synonyms cannot be recursive. Every synonym must be expressible
in terms of existing types, and this would not be possible if synonyms were
allowed to depend on one another.
Type synonyms can also be generic in that they can be parameterised by
type variables. For example:
pairs a
automorph a
are all valid synonym declarations.
(·) f g x = f (g x)
f :: t1
g :: t2
x :: t3
f (g x) :: t4
(ii) (Equality rule) If both the types x :: t and x :: t' can be deduced for a
variable x, then t = t';
(iii) (Function rule) If t → u = t' → u', then t = t' and u = u'.
Using the application rule on f (g x) :: t4, we deduce that:
g x :: t5
f :: t5 → t4
for some new type t5. Similarly, from g x :: t5 we deduce that:
x :: t6
g :: t6 → t5
for some new type t6.
Using the equality rule, we can now obtain the following identities:
t1 = t5 → t4
t2 = t6 → t5
t3 = t6
This completes the type analysis for the present example. The type we can
deduce for (·) is:
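Substituting the three identities into t1 → t2 → t3 → t4 gives (t5 → t4) → (t6 → t5) → t6 → t4. A Haskell compiler carries out the same kind of analysis, as the sketch below suggests; Haskell is an assumption here, and the names compose, showSucc and lengthOfShow are invented for the example.

```haskell
-- No type signature is given, so the compiler has to infer one.
compose f g x = f (g x)

-- Each use instantiates the inferred type
--   (b -> c) -> (a -> b) -> a -> c
-- differently, and type-checks only if that type was indeed inferred.
showSucc :: Integer -> String
showSucc = compose show (+ 1)

lengthOfShow :: Bool -> Int
lengthOfShow = compose length show
```

Renaming t6, t5 and t4 to a, b and c turns the type derived by hand into the one in the comment.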
(fst2 y) :: t4
(+) (fst1 x) :: t4 → t3
Further application of the rule gives:
y :: t5
fst2 :: t5 → t4
(fst1 x) :: t6
(+) :: t6 → t4 → t3
x :: t7
fst1 :: t7 → t6
t1 = t7
t2 = t5 = (v1, v2)
t3 = t4 = t6 = v1 = u1 = num
t7 = (u1, u2)
fix f = f (fix f)
To deduce a type for fix, we proceed as before and introduce the types:
f :: t1
f (fix f) :: t2
so that fix :: t1 → t2.
Analysing the expression f (fix f) by the application rule, we obtain:
f :: t3 → t2
fix f :: t3
fix :: t4 → t3
f :: t4
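Solving these constraints by the equality and function rules gives t1 = t4 = t3 → t2 and t2 = t3, so that fix has type (t2 → t2) → t2. The Haskell sketch below (Haskell being an assumption of the example) declares that type and uses fix in two ways; the name factorial is invented for it, and a non-strict semantics is what makes the definition usable.

```haskell
-- fix f is a value x satisfying x = f x.
fix :: (a -> a) -> a
fix f = f (fix f)

-- Recursion expressed through fix instead of a recursive definition.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n <= 1 then 1 else n * rec (n - 1))
```

For example, fix (1 :) is the infinite list of ones, of which take 3 returns the first three elements.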
Exercises
2.8.1 Suppose the functions const, subst and fix are defined by the equations:
const x y = x
subst f g x = f x (g x)
fix f x = f (fix f) x
Deduce their types.
2.8.2 Show that the identity function id is equal to (subst const const), where
subst and const are as defined in the previous question. The function compose
can also be expressed in terms of these functions. How?
2.8.3 Define the function apply which applies its first argument to its second.
What is its type? What is the relationship between apply and id?
2.8.4 Suppose the function query is defined by:
query f x g = g f (f x g)
Is there a sensible type which can be assigned to this function? If not, explain
why.
Chapter 3
Lists
The next two chapters are about lists, their operations and their uses. Lists
are as important in functional programming as sets are in many branches of
mathematics. To enumerate a set is to produce a list of its members in some
order, so the two concepts are closely related. Unlike a set, a particular value
may occur more than once in a given list, and the order of the elements is
significant.
There are many useful operations on lists. Lists can be taken apart, or
rearranged and combined with other lists to form new lists; lists of numbers
can be summed and multiplied; and lists of lists can be concatenated together
to form one long list. Most of the present chapter will be concerned with
introducing the more important list operations and giving simple examples
of their use. The next chapter will be devoted to more substantial examples.
We shall also mention some of the algebraic laws which govern the operations.
For the moment, however, formal definitions and proofs will not be given.
This will be done in Chapter 5 when we discuss recursive definitions and the
formal basis for conducting reasoning about functions.
3.1 List notation
A finite list is denoted using square brackets and commas. For example,
[1, 2, 3] is a list of three numbers and ["hallo", "goodbye"] is a list of two
strings. The empty list is written as [ ] and a singleton list, containing just
one element a, is written as [a]. In particular, [[ ]] is a singleton list, containing
the empty list as its only member.
If the elements of a list all have type α, then the list itself will be assigned
the type [α] (read as 'list of α'). For example:
[1, 2, 3] :: [num]
['h', 'a', 'l', 'l', 'o'] :: [char]
[[1, 2], [3]] :: [[num]]
[(+), (×)] :: [num → num → num]
On the other hand, [1, "fine day"] is not a well-formed list because its
elements have different types.
Strings, introduced in the previous chapter, are just special kinds of lists,
namely lists of characters. Thus, "hallo" is just a convenient built-in
shorthand for the list ['h', 'a', 'l', 'l', 'o'] and has type [char]. Hence:
? ['h', 'a', 'l', 'l', 'o']
hallo
Every generic operation on lists is therefore also applicable to strings.
The empty list [ ] is empty of all conceivable values, so it is assigned
the polymorphic type [α]. In any particular expression, [ ] may have a
more refined type. For instance, in the expression [[ ], [1]] the type of [ ]
is [num] since the second element [1] has this type; similarly, in the
expression ["an", [ ], "list"] the type of [ ] is [char] since that is the type of the other
elements.
Unlike a set, a list may contain the same value more than once. For
example, [1, 1] is a list of two elements, both of which happen to be 1, and
is distinct from the list [1] which contains only one element. Two lists are
equal if and only if they contain the same values in the same order. Hence
we have:
? [1, 1] = [1]
False
? [1, 2, 3] = [3, 2, 1]
False
The special form [a..b] is provided to denote the list of numbers in
increasing order from a to b inclusive, going up in steps of 1. If a > b, then
[a..b] = [ ]. For example:
? [1..5]
[1, 2, 3, 4, 5]
A second special form [a, b..c] is also provided; this denotes the arithmetic
progression a, a + d, a + 2 × d, ..., and so on, where d = b - a. For example:
? [2, 4..10]
[2, 4, 6, 8, 10]
? [1..5] = [1, 2..5]
True
? [4, 3..1]
[4, 3, 2, 1]
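Haskell provides both special forms with the same meaning on these examples, so they can be tried directly. Haskell is an assumption of this sketch, and the names upTo, evens, down and empty are invented for it.

```haskell
upTo, evens, down, empty :: [Integer]
upTo  = [1 .. 5]        -- steps of 1
evens = [2, 4 .. 10]    -- step d = 4 - 2
down  = [4, 3 .. 1]     -- step d = 3 - 4, a decreasing progression
empty = [5 .. 1]        -- a > b gives the empty list
```

The list down shows that the step d = b - a may be negative.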
Exercises
and so on.
The best way of explaining what list comprehensions do is to give some
examples. A simple starting point is:
? [x × x | x ← [1..10]; even x]
[4, 16, 36, 64, 100]
This list comprehension reads: "the list of values x × x, where x is drawn
from the list [1..10] and x is even". Notice that the order of the elements in
the result is determined by the order of the elements in the generator. Notice
also that x is a local or 'dummy' variable which can be replaced by any other
variable name, and whose scope is confined to the list comprehension.
We can have more than one generator in a comprehension, in which case
later generators vary more quickly than their predecessors. For example:
? [(a, b) | a ← [1..3]; b ← [1..2]]
[(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
? [(a, b) | b ← [1..2]; a ← [1..3]]
[(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]
Furthermore, later generators can depend on the variables introduced by
earlier ones:
? [(i, j) | i ← [1..4]; j ← [i + 1..4]]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
We can freely intersperse generators with boolean-valued expressions:
? [(i, j) | i ← [1..4]; even i; j ← [i + 1..4]; odd j]
[(2, 3)]
To illustrate one of the other forms for generators, suppose we define
pairs = [(i, j) | i ← [1..2]; j ← [1..3]]
Then we have:
? [i + j | (i, j) ← pairs]
[2, 3, 4, 3, 4, 5]
Finally, the main expression of a comprehension does not have to make
use of the variable introduced by a generator:
? [3 | j ← [1..4]]
[3, 3, 3, 3]
? [x | x ← [1..3]; y ← [1, 2]]
[1, 1, 2, 2, 3, 3]
We can use list comprehensions in definitions. For example:
spaces n = ['␣' | j ← [1 .. n]]
The function spaces returns a list of n space characters.
Here is a function to list the divisors of a positive integer:
divisors n = [d | d ← [1 .. n]; n mod d = 0]
The function divisors can be used to define the greatest common divisor
of two positive integers:
gcd a b = max [d | d ← divisors a; b mod d = 0]
Here, the function max takes a non-empty list of numbers and returns the
largest number in the list. We shall see its definition presently. It is left as
an exercise to modify the definition of gcd to allow zero arguments .
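The two definitions above carry over almost directly to Haskell. The following is a minimal sketch, assuming Int arguments and using the library function maximum in place of max on lists; the name gcdList is hypothetical and is chosen only to avoid a clash with the Prelude's gcd.

```haskell
-- Divisors of a positive integer, found by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- Greatest common divisor of two positive integers:
-- the largest divisor of a that also divides b.
gcdList :: Int -> Int -> Int
gcdList a b = maximum [d | d <- divisors a, b `mod` d == 0]
```

For instance, divisors 12 yields [1, 2, 3, 4, 6, 12] and gcdList 12 18 yields 6. Like the definition in the text, the sketch does not handle zero arguments.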
Using divisors we can determine whether a number is prime:
prime n = (divisors n = [1, n])
A number n > 1 is prime if the only divisors d in the range 1 ≤ d ≤ n are
1 and n itself; hence the above definition. Note that divisors 1 = [1], and
since the lists [1] and [1, 1] are not the same, the above definition correctly
determines that 1 is not a prime.
The given definition of prime is not particularly efficient , but it is less bad
than one might suppose. Testing two lists for equality does not necessarily
involve generating all the members of each list . As soon as two elements in
corresponding positions are found not to be equal, the test can return False
without generating any further elements. Nevertheless, in the case of primes
there is a simple optimisation. If a number n has a proper divisor (i.e. a
divisor d in the range 1 < d < n), then it must also have one in the range
2 ≤ d ≤ √n. The proof of this fact is left as a simple exercise. It means we
can define:
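One way such a definition might look in Haskell is sketched below. The helper isqrt and the exact formulation are assumptions made for illustration; they restrict the search for divisors to the range 2 ≤ d ≤ √n, as the argument above permits.

```haskell
-- Largest integer r with r * r <= n, for n >= 1.
-- (Hypothetical helper; it avoids floating-point square roots.)
isqrt :: Int -> Int
isqrt n = last (takeWhile (\r -> r * r <= n) [1 ..])

-- n is prime if it exceeds 1 and has no divisor d with 2 <= d <= sqrt n.
-- The (&&) short-circuits, so isqrt is only applied when n > 1.
prime :: Int -> Bool
prime n = n > 1 && null [d | d <- [2 .. isqrt n], n `mod` d == 0]
```

Because null needs to inspect only the first element of the list, the test stops as soon as one divisor is found.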
triads n = [(x, y, z) | x ← [1 .. n]; y ← [1 .. n]; z ← [1 .. n];
            x^2 + y^2 = z^2]
For example:
? triads 5
[(3, 4, 5), (4, 3, 5)]
In the above definition of triads , we have not taken steps to ensure that
each essentially distinct triad is printed only once. We can remedy this by
defining:
triads n = [(x, y, z) | x ← [1 .. n]; y ← [x .. n]; z ← [y .. n];
            x^2 + y^2 = z^2]
In the new definition, the value of y is restricted to the range x ≤ y ≤ n,
and the value of z to the range y ≤ z ≤ n. Since z must be at least as big
as the larger of x and y, all triads will still be found.
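The restricted definition can be tried out in Haskell with only notational changes. This is a sketch assuming Int components; commas replace semicolons between qualifiers and == replaces the equality test.

```haskell
-- Essentially distinct Pythagorean triads with components in 1 .. n.
-- The ranges enforce x <= y <= z, so each triad appears only once.
triads :: Int -> [(Int, Int, Int)]
triads n =
  [ (x, y, z)
  | x <- [1 .. n], y <- [x .. n], z <- [y .. n]
  , x ^ 2 + y ^ 2 == z ^ 2
  ]
```

With this version, triads 5 gives [(3, 4, 5)] rather than listing (4, 3, 5) as well.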
Exercises
hold?
3.2.3 Using a list comprehension, define a function for counting the number
of negative numbers in a list.
3.2.5 Write a program to find all quadruples (a, b, c, d) in the range 0 <
a, b, c, d ≤ n such that a² + b² = c² + d².
divisors n = [d | d ← [1 .. n]; n mod d = 0]
3.2.10 Show that if n has a divisor in the range 1 < d < n, then it has one
in the range 1 < d ≤ √n.
Concatenation takes two lists, both of the same type, and produces a third
list , again of the same type. Hence the above type assignment .
Note that ++ is an associative operation and has identity element [ ]. In
other words, we have:
(xs ++ ys) ++ zs = xs ++ (ys ++ zs)
[ ] ++ xs = xs ++ [ ] = xs
The nature of the list elements is irrelevant when computing the length of a
list ; hence the type assignment.
A simple relationship between # and ++ is given by the equation:
#(xs ++ ys) = #xs + #ys
for all finite lists xs and ys. We also have the law:
#[e | x ← xs] = #xs
The function dropwhile is similar, except that it 'drops' the longest initial segment
whose elements satisfy p. The type assigned to takewhile and dropwhile
is:
(α → bool) → [α] → [α]
The first argument is a predicate (of type α → bool), the second argument is
a list of type [α], and the result is another list of type [α].
Reverse. The function reverse reverses the order of elements in a finite list.
For example:
last = hd · reverse
init = reverse · tl · reverse
Zip. The function zip takes a pair of lists and returns a list of pairs of
corresponding elements. Its type is given by:
For example:
? zip ([0 .. 4], "hallo")
[(0, 'h'), (1, 'a'), (2, 'l'), (3, 'l'), (4, 'o')]
? zip ([0 .. 1], "hallo")
[(0, 'h'), (1, 'a')]
As the second example shows, if the two lists do not have the same length,
then the length of the zipped list is the shorter of the lengths of the two
arguments.
The function zip has many uses. Here are some representative examples.
1. Scalar product. The scalar product of two vectors x and y is defined by:
In this definition, sum is a function which sums the elements of a list of numbers;
we shall meet its definition presently. Notice the form of the generator
in this list comprehension: a pair of values is drawn from a list of pairs.
An alternative definition of the function sp is given by:
The function and takes a list of boolean values and returns True if all of the
elements of the list are True , and False otherwise.
We would like to have nondec [ ] = True. The above definition gives:
and it is not immediately clear what the value of this expression is. In fact,
zip ([ ], ⊥) = [ ], so the above expression reduces to and [ ] and thus to the
required value True. This property of zip will be discussed in more detail
when we come to the formal definition of zip in Chapter 5.
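The behaviour described here can be observed in Haskell. The sketch below is an assumed formulation of nondec (pairing each element with its successor and testing with and); it relies on Haskell's curried zip returning [] as soon as its first argument is empty, without ever evaluating the second.

```haskell
-- True if the list is in non-decreasing order.
-- For xs = [], tail xs is undefined, but zip never inspects it,
-- so the comprehension is empty and `and []` is True.
nondec :: Ord a => [a] -> Bool
nondec xs = and [x <= y | (x, y) <- zip xs (tail xs)]
```

So nondec applied to the empty list is True, nondec [1, 2, 2, 3] is True, and nondec [3, 1] is False.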
position xs x = hd (positions xs x ++ [-1])
xs ! n = hd [y | (i, y) ← zip ([0 .. #xs - 1], xs); i = n]
(xs ++ ys) -- xs = ys
permutation xs ys = (xs -- ys = [ ]) ∧ (ys -- xs = [ ])
Exercises
( drop n xs) ! m = xs ! ( n + m)
The first law says that (map f) distributes through concatenation. The second
is a generalisation of the first: it says that applying f to each element of a
concatenated list of lists is the same as applying (map f) to each component
list and concatenating the results.
Equalities such as the above are important for reasoning about the properties
of functions. We shall see how to prove them in Chapter 5.
Filter. The second function filter takes a predicate p and a list xs and
returns the sublist of xs whose elements satisfy p. Like map it can be defined
by a list comprehension:
Observe in the last example how the phrase "the sums of the squares of
the even numbers in the range 1 to 10" is translated directly into a formal
expression in our programming notation. Observe also that the type of filter
is the same as takewhile and dropwhile, two of the functions we introduced
in a previous section.
Like map, there are a number of useful identities concerning filter. We
have, in particular, that:
filter p (xs ++ ys) = filter p xs ++ filter p ys
filter p · concat = concat · map (filter p)
filter p · filter q = filter q · filter p
The first law says that filter distributes through concatenation. The second
law generalises this distributive property to lists of lists. Finally, the third
law says that filters can be applied in any order. The third law is only valid
if p x ≠ ⊥ and q x ≠ ⊥ for x ≠ ⊥.
Translating comprehensions. There is a close relationship between the
functions map and filter and the notation of list comprehensions. We have
defined map and filter in terms of list comprehensions, but it is also possible
to go in the other direction and translate list comprehensions into combinations
of map, filter and the function concat (which was also defined earlier
as a comprehension). There are just four basic rules for carrying out the
conversion:
(1) [x | x ← xs] = xs
(2) [f x | x ← xs] = map f xs
(3) [e | x ← xs; p x; ...] = [e | x ← filter p xs; ...]
(4) [e | x ← xs; y ← ys; ...] = concat [[e | y ← ys; ...] | x ← xs]
This set of rules is sufficient to translate any comprehension provided it
begins with a generator. Note that Rule (1) is a special case of Rule (2),
namely when f = id. In order to apply Rules (2) and (3) it may be necessary
to introduce subsidiary functions. For example, in order to translate the
comprehension:
[1 | x ← xs]
it is necessary to introduce a function const, defined by:
const k x = k
Now we have:
[1 | x ← xs] = [const 1 x | x ← xs]   (const.1)
             = map (const 1) xs       (Rule 2)
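The translation can be checked experimentally in Haskell by writing both forms and comparing their results. The following sketch uses hypothetical names (onesC, onesM, squaresC, squaresM); the second pair applies Rule (3) and then Rule (2) to a comprehension with a filter.

```haskell
-- Comprehension form and its translation using const and Rule (2).
onesC, onesM :: [a] -> [Int]
onesC xs = [1 | _ <- xs]
onesM xs = map (const 1) xs

-- A comprehension with a boolean qualifier, and its translation:
-- Rule (3) moves the test into a filter, Rule (2) turns the rest into a map.
squaresC, squaresM :: [Int] -> [Int]
squaresC xs = [x * x | x <- xs, even x]
squaresM xs = map (\x -> x * x) (filter even xs)
```

Both members of each pair return the same list for every finite argument; for example, squaresC [1 .. 10] and squaresM [1 .. 10] both give [4, 16, 36, 64, 100].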
Similarly, in order to translate:
[x | x ← xs; x = min xs]
we introduce the function:
Exercises
3.4.1 The function filter can be defined in terms of concat and map:
filter p = concat · map box
           where box x = ...
Give the definition of box.
3.4.2 What is the type of (map map)?
3.4.3 Using the rules given in Section 3.4, convert the following expressions
into combinations of map, filter and concat:
[x | xs ← xss; x ← xs; odd x]
[(x, y) | x ← xs; p x; y ← ys]
In particular, we have:
foldl (⊕) a [ ] = a
foldl (⊕) a [x1] = a ⊕ x1
foldl (⊕) a [x1, x2] = (a ⊕ x1) ⊕ x2
foldl (⊕) a [x1, x2, x3] = ((a ⊕ x1) ⊕ x2) ⊕ x3
and so on. Here the brackets group to the left, so foldl stands for "fold left".
The type of foldl is:
foldl :: (β → α → β) → β → [α] → β
This is almost identical to the type of foldr, except that in foldr the first
argument has type (α → β → β). When ⊕ is associative, both α and β are
instantiated to the same type, and so foldr and foldl have the same type in
such a case.
Clearly, foldr and foldl are closely related functions differing only in the
way the operations are grouped. One example of the use of foldl is given by:
pack xs = foldl (⊕) 0 xs
          where n ⊕ x = 10 × n + x
This codes a sequence of digits as a single number, assuming the most significant
digit comes first. Thus we have:
$\mathit{pack}\,[x_{n-1}, x_{n-2}, \ldots, x_0] = \sum_{k=0}^{n-1} x_k 10^k$
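The definition of pack transcribes directly into Haskell, whose foldl has the same argument order. A minimal sketch, assuming Int digits and a named local function op in place of the operator ⊕:

```haskell
-- Code a list of decimal digits, most significant first, as one number.
pack :: [Int] -> Int
pack = foldl op 0
  where
    op n x = 10 * n + x
```

Unfolding the left fold on [1, 2, 3] gives ((0 × 10 + 1) × 10 + 2) × 10 + 3, that is, 123.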
3.5.1 Laws
There are a number of important laws concerning foldr and foldl. The first
three are called duality theorems.
The first duality theorem states that:
foldr (⊕) a xs = foldl (⊕) a xs
whenever ⊕ and a form a monoid and xs is a finite list. Thus, foldr and
foldl define the same function over monoids. However, as we shall see in
Chapter 6, it is sometimes more efficient to define a function using foldr and
sometimes more efficient to use foldl. For example, we could have defined
sum and product using foldl instead of foldr, and we shall see that using foldl
is indeed more efficient. On the other hand, we shall also see that concat,
and , and or are better defined using foldr.
The second duality theorem is a generalisation of the first. Suppose ⊕ and ⊗
and a are such that for all x, y, and z we have:
x ⊕ (y ⊗ z) = (x ⊕ y) ⊗ z
x ⊕ a = a ⊗ x
In other words, ⊕ and ⊗ associate with each other, and a on the right of ⊕
is equivalent to a on the left of ⊗. Under these conditions we have:
foldr (⊕) a xs = foldl (⊗) a xs
for any finite list xs.
It follows from the second duality theorem that these definitions are equivalent
to the previous ones that used foldr. Later on, we shall see that these
new definitions are in fact more efficient. The reader should verify that
oneplus, plusone and 0 meet the conditions of the second duality theorem,
as do postfix, prefix, and [ ].
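As an illustration, the two pairs of operators named above can be written out in Haskell and the resulting folds compared. The definitions of oneplus, plusone, postfix and prefix given here are assumptions reconstructed from their names and from the conditions of the theorem; the function names lengthR, lengthL, reverseR and reverseL are hypothetical.

```haskell
-- Length: oneplus is used with foldr, plusone with foldl.
oneplus :: a -> Int -> Int
oneplus _ n = 1 + n

plusone :: Int -> a -> Int
plusone n _ = n + 1

-- Reverse: postfix is used with foldr, prefix with foldl.
postfix :: a -> [a] -> [a]
postfix x xs = xs ++ [x]

prefix :: [a] -> a -> [a]
prefix xs x = [x] ++ xs

lengthR, lengthL :: [a] -> Int
lengthR = foldr oneplus 0
lengthL = foldl plusone 0

reverseR, reverseL :: [a] -> [a]
reverseR = foldr postfix []
reverseL = foldl prefix []
```

Checking the conditions for the first pair: oneplus x (plusone y z) and plusone (oneplus x y) z both equal y + 2, and oneplus x 0 and plusone 0 x both equal 1.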
The third duality theorem states that:
xs = foldr cons [ ] xs
for all lists xs . Now, from the third duality theorem we have:
and so:
xs = reverse (reverse xs)
for any finite list xs, just as we would expect.
There are many other useful identities concerning foldr and foldl. For example,
if ⊕ and a form a monoid, then:
for all lists xs and ys. A similar identity holds for foldl.
We also have, for arbitrary f and a, that :
Say we wish to find the maximum element of a list. We would like to do this
by defining:
max = foldl ( max ) a
where (max) is a binary operator that returns the greater of its two arguments.
But what should we choose as the value of a? Since (max) is
associative, we would have a monoid if a were chosen to be the identity
element for ( max ) ; that is:
x max a = x = a max x
for all x. If we know that the list contains only non-negative numbers, then
choosing a to be 0 gives the desired property, and we can define:
maxnaturals = foldl (max) 0
Unfortunately there is no value of a with the desired property in the general
case of arbitrary numbers.
We will solve this problem by introducing two new functions foldl1 and
foldr1 which are variants of foldl and foldr. Informally, foldl1 and foldr1 are
defined by:
foldl1 (⊕) [x1, x2, ..., xn] = (...((x1 ⊕ x2) ⊕ x3) ...) ⊕ xn
foldr1 (⊕) [x1, x2, ..., xn] = (x1 ⊕ (x2 ⊕ ... (xn-1 ⊕ xn) ...))
In particular, we have:
foldl1 (⊕) [x1] = x1
foldl1 (⊕) [x1, x2] = x1 ⊕ x2
foldl1 (⊕) [x1, x2, x3] = (x1 ⊕ x2) ⊕ x3
and so on. The difference between foldl1 and foldr1 is only apparent for lists
of length 3 or more. We have:
foldr1 (⊕) [x1, x2, x3] = x1 ⊕ (x2 ⊕ x3)
in which the brackets group the other way. Both functions are undefined on
the empty list. The type of foldl1 and foldr1 is:
foldl1, foldr1 :: (α → α → α) → [α] → α
The reader should compare this with the types of foldl and foldr; here, there
is only one type variable (α) instead of two (α and β).
Now we can solve our problem by defining:
max = foldl1 (max)
We could, of course, have used foldr1 instead. Further, it is easy to define
foldl1 in terms of foldl:
foldl1 (⊕) xs = foldl (⊕) (hd xs) (tl xs)
The definition of foldr1 is left as an exercise for the reader.
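The definition of foldl1 in terms of foldl can be written in Haskell as follows. The names foldlOne and maxList are hypothetical, chosen to avoid clashing with the Prelude's own foldl1 and max; like the definition in the text, the sketch is undefined on the empty list.

```haskell
-- Left fold over a non-empty list: the first element is the starting value.
foldlOne :: (a -> a -> a) -> [a] -> a
foldlOne op xs = foldl op (head xs) (tail xs)

-- Maximum of a non-empty list; no identity element for max is needed.
maxList :: Ord a => [a] -> a
maxList = foldlOne max
```

For example, maxList [-3, -1, -2] is -1, a case that maxnaturals would get wrong, and foldlOne (-) [10, 2, 3] is (10 - 2) - 3 = 5.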
3.5.3 Scan
In particular:
scan (⊕) a [x1, x2, x3]
  = [a, (a ⊕ x1), ((a ⊕ x1) ⊕ x2), (((a ⊕ x1) ⊕ x2) ⊕ x3)]
It follows that the last element of the list (scan (⊕) a xs) is just the value of
(foldl (⊕) a xs). Hence:
foldl (⊕) a = last · scan (⊕) a
Notice that each element in a scan can be computed in terms of the preceding
element using just one extra ⊕ operation. More precisely, if x ⊕ y can be
computed in a constant number of steps, and the list xs has length n, then
(scan (⊕) a xs) can be computed in a number of steps proportional to n.
For example, scan can be used to compute running sums or running
products:
scan (+) 0 [1, 2, 3, 4, 5] = [0, 1, 3, 6, 10, 15]
scan (×) 1 [1, 2, 3, 4, 5] = [1, 1, 2, 6, 24, 120]
The last expression is equal to map fact [0 .. 5], where:
fact n = product [1 .. n]
is a definition of the factorial function. However, this second definition of
the list of factorial numbers is less efficient since each term is computed
independently. In fact, (map fact [0 .. n]) requires about n² multiplications.
As a related example, we have:
scan (/) 1 [1 .. n] = [1/0!, 1/1!, ..., 1/n!]
where n! is the conventional notation for (fact n).
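In Haskell the function described here as scan is available as the library function scanl. The sketch below simply gives it the text's name and reproduces the running-product example; the names factorials and fact are used for illustration.

```haskell
-- scan as described in the text corresponds to Haskell's scanl.
scan :: (b -> a -> b) -> b -> [a] -> [b]
scan = scanl

-- The factorials 0!, 1!, ..., n!, using one multiplication per element.
factorials :: Integer -> [Integer]
factorials n = scan (*) 1 [1 .. n]

-- The less efficient alternative computes each term independently.
fact :: Integer -> Integer
fact n = product [1 .. n]
```

Here factorials 5 and map fact [0 .. 5] both give [1, 1, 2, 6, 24, 120], and last (scan (-) 10 [1, 2]) agrees with foldl (-) 10 [1, 2].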
Exercises
3.5.1 Consider the function all which takes a predicate p and a list xs and
returns True if all elements of xs satisfy p, and False otherwise. Give a
formal definition of all which uses foldr.
3.5.2 Which, if any, of the following equations are true?
foldl (-) x xs = x - sum xs
foldr (-) x xs = x - sum xs
? 1 : [ ]
[1]
? 1 : 2 : [3, 4]
[1, 2, 3, 4]
? 'h' : 'e' : 'l' : 'l' : 'o' : [ ]
hello
By convention, ( :) associates to the right , so 1 : 2 : [3 , 4] means 1 : (2 : [3 , 4]) .
Every list can be constructed by inserting its elements one by one into the
empty list (hence the reason for the name 'cons', which is an abbreviation
for the word 'construct'). In fact, we can regard an enumerated list:
x1 : x2 : ... : xn : [ ]
x : xs = [x] ++ xs
Here, the pattern [x] is used as an abbreviation for x : [ ]. Note that the
three cases described by the above patterns are exhaustive and disjoint.
Exercises
[ ] : xs = xs
[ ] : xs = [[ ], xs]
xs : [ ] = xs
xs : [ ] = [xs]
x : y = [x, y]
(x : xs) ++ ys = x : (xs ++ ys)
3.6.3 Using pattern matching with (:), define a function rev2 that reverses
all lists of length 2, but leaves others unchanged. Ensure that the patterns
are exhaustive and disjoint.
3.6.4 Consider the function insert of Exercise 3.5.4. Another way to define
insert is in terms of a function swap:
Examples
The examples of list processing dealt with in the present chapter come from
a variety of sources and cover both numeric and symbolic applications. In
particular, we shall build a simple package for doing arithmetic with arbitrary
precision integers, design some useful functions for handling text , and show
how to construct pictures of different kinds . Each application is accompanied
by a number of exercises in which the reader is invited to check relationships,
explore possible extensions and suggest improvements.
We begin with a simple problem, involving both numeric and symbolic
aspects, in which the operation of list indexing plays a central role.
Notice the dash in the phrases "twenty-seven" and "sixty-nine" , and the
connecting word "and" which appears: (i) after the word "hundred" if the
tens part is non-zero; and (ii) after the word "thousand" if the hundreds part
is zero (but the tens part is not ) .
A good way to tackle such problems is to consider a simpler problem
first . There is, of course, no guarantee that solutions obtained for simpler
problems can be used directly in the problem which inspired them; they may
only serve to familiarise the solver with some of the features and difficulties
involved. Even so, the work is not wasted; familiarity with a problem is one
of our most important tools for solving it . And often we will be able to use
the solution directly or by adapting it .
An obvious place to begin is to suppose that the number n belongs to a
smaller interval, say 0 < n < 100 . In this case n has one or two digits . These
digits are going to be needed, so we start with the definition:
In order to define the function combine2 , we shall need the English names
for the simplest numbers. These can be given as lists of strings:
The definition of combine2 uses these lists by indexing into them at the
appropriate places. The definition is as follows:
Recall that list indexing with the operator ( ! ) begins at 0 , not 1 . The pat
terns on the left are mutually disjoint , so the order of the equations is not
important . However, no value is specified for the pattern (0, 0 ) .
The case 0 < n < 100 yielded easily enough, so now let us try the range
0 < n < 1000 when n can have up to three digits. Taking account of the
structure of our first solution, we begin with:
This step is the crucial one as far as the design of the overall algorithm is
concerned. We split n into digits in two stages: first into a hundreds part
h and a part t less than a hundred; and then, in the definition of convert2,
split t into a tens part and a part less than ten.
Now we are ready to tackle the next and final step in which n lies in the
range 0 < n < 1000000 and so can have up to 6 digits. In a similar spirit to
before, we split n into two numbers m and h, where m is the thousands part
and h is the part less than a thousand. We can therefore write:
The required function convert is just the function convert6 , so we are done.
As well as being a good illustration of the use of (++) and (!), this example
also demonstrates the advantages of pattern matching over conditional
equations. Each case is expressed clearly and concisely, and it is easier to
check that all cases are covered.
Exercises
0 ≤ x < b
$x = \sum_{k=0}^{n-1} x_k b^k$
The major criterion which influences the choice of base is that we require:
b^2 ≤ maxint
where maxint denotes the maximum integer that can be handled by the built-in
operations. This condition ensures that 'bigit-by-bigit' multiplications can
b = 10000
then it is assumed that all numbers up to 10^8 lie within the permitted range of
the primitive operations. To illustrate this particular choice of b, the number
123456789 can be represented by the list [1, 2345, 6789], and 100020003 can be
represented by [1, 2, 3]. The number 0 can be represented by the list [0]. These
representations are not unique since an arbitrary number of leading zeros can
be added to an integer without changing its value. We can 'standardize' a
representation by applying the function strep to remove non-significant zeros.
The definition is:
strep xs = [0], if ys = [ ]
         = ys, otherwise
           where ys = dropwhile (= 0) xs
The definition of strep ensures that 0 will have the standard representation
[0].
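In Haskell the conditional equation for strep becomes a pair of guards. A minimal sketch, assuming bigits are represented as Int:

```haskell
-- Standard representation: strip leading zeros, but keep [0] for zero.
strep :: [Int] -> [Int]
strep xs
  | null ys   = [0]
  | otherwise = ys
  where
    ys = dropWhile (== 0) xs
```

Thus strep [0, 0, 1, 2] gives [1, 2], while strep [0, 0] and strep [] both give [0].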
Thanks to our choice of representation, which has the most significant bigit
first, the comparison operations on vint can be based on the primitive lexicographic
ordering on lists. We have, of course, to align the two numbers
before performing the comparison, so we define:
copy x n = [x | j ← [1 .. n]]
If we now define:
then we have:
veq = vcompare (=)
vleq = vcompare (≤)
vless = vcompare (<)
and so on.
The functions vadd and vsub for doing variable-length addition and subtraction
are easily defined. We align the two numbers, do the operation bigit
by bigit, and then 'normalise' the result. Normalisation involves reducing
the result of each operation to a bigit in the required range. For example,
suppose we want to add the numbers:
[7, 3, 7]
[4, 6, 9]
where we suppose that b = 10. The bigit-by-bigit addition of these numbers
is:
[11, 9, 16]
and the normalised result, namely:
[1, 2, 0, 6]
is obtained by reducing each bigit modulo b after adding in the carry from
the previous normalisation step. More precisely, suppose we define:
and then convert the result to standard form. The term [0] ensures that the
process is started with an initial carry of 0. The expression above is just:
foldr carry [0] [x1, x2, ..., xn]
The addition and subtraction operations can now be defined by the equations:
vadd xs ys = norm (zipwith (+) (align xs ys))
vsub xs ys = norm (zipwith (-) (align xs ys))
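The scheme can be sketched in Haskell as follows. The definitions of carry and align here are assumptions consistent with the description above (carry keeps the running carry at the head of the list; align pads the shorter number with leading zeros), and b is fixed at 10 to match the worked example. Haskell's zipWith is curried, so the aligned pair is unpacked before use.

```haskell
-- Base used in the worked example.
b :: Int
b = 10

-- Remove leading zeros, keeping [0] for zero.
strep :: [Int] -> [Int]
strep xs = if null ys then [0] else ys
  where ys = dropWhile (== 0) xs

-- Add the incoming carry c to bigit x; emit the new carry and the reduced bigit.
carry :: Int -> [Int] -> [Int]
carry x (c : xs) = (x + c) `div` b : (x + c) `mod` b : xs
carry _ []       = error "carry: empty accumulator"

-- Normalise from the right, starting with an initial carry of 0.
norm :: [Int] -> [Int]
norm = strep . foldr carry [0]

-- Pad the shorter list with leading zeros so both have equal length.
align :: [Int] -> [Int] -> ([Int], [Int])
align xs ys
  | n > 0     = (replicate n 0 ++ xs, ys)
  | otherwise = (xs, replicate (negate n) 0 ++ ys)
  where n = length ys - length xs

vadd, vsub :: [Int] -> [Int] -> [Int]
vadd xs ys = norm (uncurry (zipWith (+)) (align xs ys))
vsub xs ys = norm (uncurry (zipWith (-)) (align xs ys))
```

With these definitions norm [11, 9, 16] and vadd [7, 3, 7] [4, 6, 9] both give [1, 2, 0, 6], and vsub [1, 0, 6] [3, 7] gives [6, 9]. The subtraction relies on div and mod rounding towards negative infinity, so that negative intermediate bigits produce a borrow.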
The interesting point about the function vsub is what happens when the
second argument is greater than the first , so the answer is a negative number.
For example, again supposing b = 10, we have:
This pattern of computation can be expressed with the function foldl1 of the
previous chapter. Hence we can define:
vmul xs ys = foldl1 (⊕) (psums xs ys)
4.2.4 Quotient and remainder
Finally, we turn to the problem of division. As every schoolchild knows, division
is the hardest of the arithmetic operations to get right since it involves
a certain amount of guesswork. The framework of the conventional division
algorithm for finding the quotient and remainder consists of a repeated sequence
of steps. At each step, a single bigit of the quotient is computed and
the remainder is calculated for the next step. The result is a sequence of
pairs:
[(q0, rs0), (q1, rs1), ..., (qn, rsn)]
where q0 q1 ... qn is the final quotient and rsn the final remainder. A pair
(qk, rsk) is determined, by a function dstep say, from the previous pair, the
given divisor, and the kth digit of the dividend.
We can implement this scheme using the function scan introduced in
Section 3.5.3:
divalg xs ys = scan (dstep ys) (0 , take m xs) (drop m xs)
where m = #ys - 1
The starting value (0, take m xs) of scan is the quotient and remainder for the
first step of the division algorithm. The process begins with the remaining
bigits (drop m xs) of the dividend. The value returned by divalg is a list of
pairs, the first components of which are the bigits forming the quotient. The
second component of the last element of the result of divalg is the required
remainder. To illustrate:
divalg [1 , 7 , 8, 4] [6 , 2] = [(0 , [1]) , (0, [1 , 7]) , (2, [5, 4]), (8, [4 , 8])]
The quotient bigits are therefore [0 , 0, 2, 8] and the remainder is [4, 8].
To define dstep , we need to distinguish three cases: the dividend xs for
the current step has length less than, equal to, or greater than the length
of the divisor ys. In fact, since the remainder rs from the previous step has
length at most #ys and xs = rs ++ [x] for some bigit x, we always have
#xs ≤ #ys + 1. We therefore define
dstep ys (q, rs) x = astep xs ys, if #xs < #ys
                   = bstep xs ys, if #xs = #ys
                   = cstep xs ys, if #xs = #ys + 1
                     where xs = rs ++ [x]
q̂ - 2 ≤ q ≤ q̂
where q = x div y is the true quotient. In other words, if y1 is sufficiently
large, then the guess q̂ overestimates the true quotient q by at most 2.
This guess is used in the definition of cstep. If it turns out to be too
big, then the result is corrected by further subtractions as necessary. The
definition of cstep is
cstep xs ys = (q, rs0), if vless rs0 ys
            = (q + 1, rs1), if vless rs1 ys
            = (q + 2, rs2), otherwise
              where rs0 = vsub xs (bmul ys q)
Now we are in a position to define the function vqrm which returns the
quotient and remainder on dividing one number by another. To ensure that
The remaining task is to define the function bdiv for dividing a number
by a single bigit. This is an important special case which arises frequently
in practical calculations, so it merits individual attention. Suppose x =
x1 x2 ... xn is the dividend and d the single bigit divisor. The elements of the
q_i = r_i div d
bqrm (x : xs) d = (strep qs, (last rs) mod d)
                  where qs = map (div d) rs
                        rs = scan (⊕) x xs
                        r ⊕ x = b × (r mod d) + x
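A Haskell rendering of bqrm is sketched below, with scanl playing the role of scan and the section (`div` d) dividing each partial dividend by d. The base b = 10, the local name op for the operator ⊕, and the equation for the empty list are assumptions added for illustration.

```haskell
-- Base assumed for the example.
b :: Int
b = 10

-- Remove leading zeros, keeping [0] for zero.
strep :: [Int] -> [Int]
strep xs = if null ys then [0] else ys
  where ys = dropWhile (== 0) xs

-- Quotient and remainder on dividing a number by a single bigit d.
-- rs holds the partial dividends: each is b times the previous
-- remainder plus the next bigit of the dividend.
bqrm :: [Int] -> Int -> ([Int], Int)
bqrm (x : xs) d = (strep qs, last rs `mod` d)
  where
    qs     = map (`div` d) rs
    rs     = scanl op x xs
    op r y = b * (r `mod` d) + y
bqrm [] _ = ([0], 0) -- assumed convention for an empty dividend
```

Dividing 1784 by 6 in this way gives the partial dividends [1, 17, 58, 44], hence the quotient bigits [0, 2, 9, 7] and the result ([2, 9, 7], 2).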
Exercises
4.2.1 Define a function absint so that if number x is represented by the list
of bigits xs, then:
x = absint xs
Check that
absint ([0] ++ xs) = absint xs
Hence justify the equation:
absint (strep xs) = absint xs
where strep returns the standard representation of a number.
4.2.2 Justify the equation:
vless xs ys = (absint xs < absint ys)
4.2.12 Modify the definition of vqrm so that it works for arbitrary numbers,
negative as well as positive.
4.2.13 Define functions vdiv and vmod which return the quotient and re
mainder on division.
4.2.14 The definition of vqrm given in the text can be tuned in a number of
ways. In particular, it is possible to improve the guess for q ( see Knuth [2])
and a number of length calculations can be avoided by using versions of
addition and subtraction which produce (n + 1)-bigit results from n-bigit
arguments. Show how to make vqrm more efficient.
Now let us turn to something completely different. In this section we shall investigate
the mathematics of an interesting non-numerical application which
deals with the general problem of processing text.
A text can be viewed in many different ways. The 'atomic' view is that
a text is just a list of characters, so we introduce the type synonym:
text == [char]
However, for certain problems it may be more convenient to view a text as
a sequence of words, or perhaps as a sequence of lines, or even as a sequence
of paragraphs. In this section we shall develop functions for converting from
one view of texts to another.
line == [char]
Let the required function be called lines. Its type is given by:
(In fact , suitably rewritten, these equations constitute the formal recursive
definitions of these functions, as we shall see in the next chapter.) For the
moment , each equation can be justified by appeal to the informal definitions
of foldr1 and foldr. We can use the equations to derive the following facts
about lines and unlines:
unlines [xs] = xs                          (unlines.1)
unlines ([xs] ++ xss) = xs ⊕ unlines xss   (unlines.2)
lines [ ] = a                              (lines.1)
lines ([x] ++ xs) = x ⊗ lines xs           (lines.2)
The first two equations follow from the equations for foldr1, and the second
two from the equations for foldr and the putative form for lines.
Now, to calculate a, we reason as follows:
a = lines [ ]                (lines.1)
  = lines (unlines [[ ]])    (unlines.1, with xs = [ ])
  = [[ ]]                    (spec.)
The last equality uses the specification of lines as the inverse of unlines. So
we have succeeded, fairly quickly, in calculating a.
Next, let us tackle ⊗. We have:
x ⊗ xss = x ⊗ lines (unlines xss)     (spec.)
        = lines ([x] ++ unlines xss)  (lines.2)
word == [char]
In a similar spirit to before, we can seek a constructive definition of a function
words for breaking a line into words. The type of words is therefore:
words : : line -+ [word]
For example:
? words "This␣␣␣is␣a␣line"
["This" , "is" , "a" , "line"]
? words "line"
["line"]
The function unwords defined by:
This time, the inverse function unparas takes a sequence of paragraphs and
converts it to a sequence of lines by inserting a single empty line between
adjacent paragraphs and concatenating the result. It can be defined by:
Just as in the case of words, we can construct the inverse of unparas and
then filter out the empty sequences to obtain the definition of paras .
Let us now summarise the above results by writing out the complete definitions
of the functions we have described (and choosing more suitable names
for the operators). Our basic text processing package is contained in the
following script:
These six functions have a variety of uses. We give just two. The number
of lines, words and paragraphs in a text can be counted by:
countlines = (#) · lines
countwords = (#) · concat · map words · lines
countparas = (#) · paras · lines
To parse a text here means to break it into lines, paragraphs and words.
enough to accommodate the longest possible word. It can be shown that the
greedy algorithm minimises the number of lines.
We can define the greedy algorithm in the following way:
In this algorithm, the value of (greedy m ws) is the length of the longest initial
segment of ws which will fit on a line of given column width m. One way of
defining greedy is to write:
If this list is truncated by applying the function takewhile (≤ m), then the
length of the result will be one greater (because of the starting value -1) than
the length of the longest initial segment us of ws for which #unwords us ≤ m.
But this is just the definition of (greedy m ws). The new definition is more
efficient because not all initial segments of ws need to be examined, and also
because lengthy recalculations of unwords are avoided.
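The more efficient formulation can be sketched in Haskell using scanl. The running total is an assumption reconstructed from the description: it starts at -1 and grows by the length of each word plus one separating space, so that after k words it equals the length of those k words joined by single spaces.

```haskell
-- Number of words in the longest initial segment of ws that fits
-- on a line of width m when joined by single spaces.
greedy :: Int -> [String] -> Int
greedy m ws = length (takeWhile (<= m) (scanl op (-1) ws)) - 1
  where
    -- Line length after appending word w to a line of length n.
    op n w = n + length w + 1
```

For the words "the", "quick", "brown", "fox" and m = 10, the scan produces [-1, 3, 9, 15, 19]; three elements are at most 10, so greedy returns 2.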
Using fill we can left-justify a text within a specified column width m by
the function filltext defined as follows:
4.3.6 Summary
We have covered a good deal of ground with this example and it is worthwhile
to identify the main landmarks. First, just as in the treatment of variable-length
arithmetic, there is the consistent use of higher-order functions to
express common patterns of computation in a concise manner. Our basic
text processing package is only eight lines long, but it is remarkably powerful.
Second, we see the technique of specifying functions of interest as the
inverses of functions which we know how to compute. These specifications
are not executable but they capture in the clearest possible way just what
we want to achieve.
Third, there is the systematic use of equational reasoning to derive constructive
definitions of some functions from their specifications. In essence,
we have derived these programs by a process of manipulating formulae: a
process very much in the spirit of normal mathematical traditions.
Exercises
4.3.1 Justify the claim that the operator ⊕ used in the definition of unlines
is associative but does not possess an identity element.
4.3.2 Verify the following equations:
unlines [xs] = xs                          (unlines.1)
unlines ([xs] ++ xss) = xs ⊕ unlines xss   (unlines.2)
lines [ ] = a                              (lines.1)
lines ([x] ++ xs) = x ⊗ lines xs           (lines.2)
4.3.7 Extend the problem of filling text by developing an algorithm for inserting
sufficient spaces between words so that the text is justified to the
right boundary.
The next example is fun since it involves drawing pictures. It will also demand
more hard work on the part of the reader, since much of the material is
presented in the way of exercises.
Consider a device (called a 'turtle') which can move around a potentially
infinite rectangular grid. At any one moment, the turtle is at some grid
point and is oriented in one of four directions: North, East, South or West.
The turtle is equipped with a pen which can be either in the 'up' or 'down'
position. When the pen is down, each grid point that the turtle passes over
is marked; when the pen is up the turtle moves without leaving a trail.
The commands which can be issued to the turtle are of three kinds:
2. To move one step along the grid in the direction currently indicated.
Given an initial state and a sequence of commands, the turtle will trace
out a certain pattern over the grid. The object of this section is to define
functions for moving the turtle and drawing the resulting trail.
A turtle state consists of a current direction, a pen position and a point on
the grid. A turtle command is a function from states to states. The following
type synonyms can therefore be introduced:
move :: command
move (0, p, (x, y)) = (0, p, (x - 1, y))
move (1, p, (x, y)) = (1, p, (x, y + 1))
move (2, p, (x, y)) = (2, p, (x + 1, y))
move (3, p, (x, y)) = (3, p, (x, y - 1))
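The four equations for move transcribe almost directly into Haskell. The following is only a sketch: the type synonyms are an assumed reconstruction (a direction numbered 0 to 3, a pen that is True when down, and a grid point), and run is a hypothetical helper that does not appear in the text.

```haskell
-- Assumed representation of a turtle state: (direction, pen, point).
type Direction = Int          -- 0 .. 3, as in the equations for move
type Pen       = Bool         -- True when the pen is down (an assumption)
type Point     = (Int, Int)
type State     = (Direction, Pen, Point)
type Command   = State -> State

-- One step in the current direction, mirroring the four equations for move.
-- As in the text, no equation covers a direction outside 0 .. 3.
move :: Command
move (0, p, (x, y)) = (0, p, (x - 1, y))
move (1, p, (x, y)) = (1, p, (x, y + 1))
move (2, p, (x, y)) = (2, p, (x + 1, y))
move (3, p, (x, y)) = (3, p, (x, y - 1))

-- Hypothetical helper: apply a list of commands, left to right.
run :: [Command] -> State -> State
run cmds s = foldl (\st c -> c st) s cmds
```

For instance, run [move, move] (1, False, (0, 0)) yields (1, False, (0, 2)).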
In the worst case, the test for list membership requires n steps, where n is
the length of xs.
Since a turtle trail may mark the same point many times, one sensible
optimisation is to remove duplicates from the list of points. In fact, if we
first sort the points, then duplicate values will be brought together and it is
only necessary to remove adjacent duplicates. Assuming the existence of a
function sort which will sort points in increasing order of x-coordinate (and
for equal x-coordinates, in increasing order of y-coordinate), we have:
sortpoints = remdups · sort
where remdups is a function for removing adjacent duplicates. Its definition
is left as an exercise.
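As a hedged illustration, sortpoints can be sketched in Haskell as follows. The library function sort from Data.List orders pairs lexicographically, which is the ordering required here. The definition of remdups shown is just one possible answer to the exercise, not necessarily the intended one.

```haskell
import Data.List (sort)

type Point = (Int, Int)

-- Remove adjacent duplicates: keep the first element, then every element
-- that differs from its immediate predecessor.
remdups :: Eq a => [a] -> [a]
remdups [] = []
remdups xs = head xs : [y | (x, y) <- zip xs (tail xs), x /= y]

-- Sort first, so that equal points become adjacent, then drop the repeats.
sortpoints :: [Point] -> [Point]
sortpoints = remdups . sort
```

Note that remdups on its own only removes neighbouring repeats, so remdups [1, 1, 2, 1] is [1, 2, 1]; it is the preceding sort that makes it remove all duplicates.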
The above method for computing bitmap is inefficient. A faster method can
be based on the assumption that the points are sorted in increasing order,
first on x-coordinate and then on y-coordinate. First, divide the list of points
ps into sublists, one for each value x in xran. The sublist corresponding to
x is the (possibly empty) list of original points whose first coordinate is x.
Each sublist will be sorted in increasing order of y-coordinate. Next, divide
each of these sublists into sub-sublists, one for each value of y in yran. The
sub-sublist corresponding to y will be those points of the sublist whose second
coordinate is y. The sub-sublist (for y), of the sublist (for x), will either be
empty, in the case (x, y) is not a marked point, or consist of just a single
element (x, y). In this way it is possible to determine the correct "bit" value
of (x, y). The corresponding definition is:
fastbitmap ps = [[ps2 ≠ [ ] | ps2 ← splitwith snd yran ps1]
                | ps1 ← splitwith fst xran ps]
                where xran = range (map fst ps)
                      yran = range (map snd ps)
The function splitwith is given by the equations:
splitwith f xs ys = split (map (equals f) xs) ys
equals f x y = (f y = x)
split :: [α → bool] → [α] → [[α]]
split [ ] xs = [ ]
split (p : ps) xs = [takewhile p xs] ++ split ps (dropwhile p xs)
The function split takes a sequence of predicates [p1, p2, ..., pn] and a list xs,
and partitions xs into a sequence of lists [xs1, xs2, ..., xsn], where xs1 is the
longest initial segment of xs all of whose elements satisfy p1, and xs2 is the
longest initial segment of the remaining list, all of whose elements satisfy p2,
and so on. The function splitwith is an optimised version of:
splitwith f xs ys = [[y | y ← ys; f y = x] | x ← xs]
under the assumption that xs is strictly increasing, and (map f ys) is a non-decreasing
sequence of elements from xs.
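The definitions of split, splitwith and fastbitmap can be tried out with the following Haskell sketch. It assumes that range xs denotes the interval from the smallest to the largest element of xs (its definition is not shown in this part of the text), and splitwithSpec is a hypothetical name for the unoptimised version.

```haskell
-- One sublist per predicate: each takes the longest matching initial segment
-- of whatever the previous predicates left over.
split :: [a -> Bool] -> [a] -> [[a]]
split [] _ = []
split (p : ps) xs = takeWhile p xs : split ps (dropWhile p xs)

equals :: Eq b => (a -> b) -> b -> a -> Bool
equals f x y = f y == x

splitwith :: Eq b => (a -> b) -> [b] -> [a] -> [[a]]
splitwith f xs ys = split (map (equals f) xs) ys

-- The unoptimised specification (hypothetical name).
splitwithSpec :: Eq b => (a -> b) -> [b] -> [a] -> [[a]]
splitwithSpec f xs ys = [[y | y <- ys, f y == x] | x <- xs]

-- Assumed meaning of range; undefined on an empty list.
range :: [Int] -> [Int]
range xs = [minimum xs .. maximum xs]

-- Expects a non-empty list of points sorted on x and then on y.
fastbitmap :: [(Int, Int)] -> [[Bool]]
fastbitmap ps =
  [ [not (null ps2) | ps2 <- splitwith snd yran ps1]
  | ps1 <- splitwith fst xran ps ]
  where
    xran = range (map fst ps)
    yran = range (map snd ps)
```

With the sorted points [(0, 0), (0, 1), (1, 1)], fastbitmap returns [[True, True], [False, True]]: one row per x value, one bit per y value.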
Exercises
4.4.1 Define the commands left, for making a left-turn, and up and down
for putting the pen up and down.
4.4.2 Define a function (block k) which causes the turtle to trace a solid
square of side k.
4.4.3 Define trail and layout.
4.4.4 Assuming there are n points, representing a continuous trail, estimate
the worst case time complexity of bitmap as a function of n (i.e. say whether
the number of steps required is proportional to n, n^2, n^3, or whatever). (Hint:
Think of a simple trail for which the algorithm is at its worst and hence put
bounds on the lengths of xran and yran.)
4.4.5 Define a function boolstr which converts each truth-value to a suitable
string of characters, and hence define symbolise.
4.4.6 Bearing in mind that duplicate values in a sorted list are adjacent,
construct a definition of remdups for removing duplicate elements. (Hint:
Use a list comprehension in conjunction with the standard function zip.) For
what kind of turtle trails would it be sensible to apply remdups before, as
well as after, sorting?
4.4.7 Estimate the increase in efficiency by using fastbitmap rather than
bitmap.
4.4.8 It is possible to derive fastbitmap (using the unoptimised definition of
splitwith) from the original definition of bitmap. The derivation uses only
equational reasoning and a small number of laws about list comprehensions.
As a challenging exercise, find the rules and produce the derivation.
4.4.9 Define some interesting turtle trails and draw them using the functions
introduced above.
     OCTOBER 1988
Sun     2  9 16 23 30
Mon     3 10 17 24 31
Tue     4 11 18 25
Wed     5 12 19 26
Thu     6 13 20 27
Fri     7 14 21 28
Sat  1  8 15 22 29
4.5.1 Pictures
Let us consider the printing phase first. Essentially, this means we have to
build a picture of the calendar. However, unlike the pictures of turtle trails in
the previous section (which were generated from lists of points in the plane),
we now have to build pictures out of smaller pictures by means of appropriate
combinators.
A picture can be represented by a list of lists of characters in which each
element list has the same length. The height of a picture is the number of
element lists, and the width is their common length. Thus:
height p = #p
width p = #(hd p)
This defines empty (h, w) to be a list (of length h > 0) of lists (each of length
w > 0) of space characters.
The function (block n) defined by:
takes a list of pictures, all of which must have the same height and width,
assembles them into groups of n, turns each group into a spread, and finally
stacks the results above one another. The subsidiary function (group n) takes
a list of length m × n and returns m lists of length n. One definition of
(group n) is:
p1  p2  p3
p4  p5  p6
p7  p8  p9
p10 p11 p12

p1 p4 p7 p10
p2 p5 p8 p11
p3 p6 p9 p12
Thus (blockT n) flips the result of applying (block n) about the left-right
downwards diagonal. We shall use these two functions below.
We can turn a picture into a larger one by framing it. For example,
suppose we want to frame a picture p in the top left-hand corner of a larger
picture of height m and width n. This can be done by the function lframe :
lframe (m, n) p = (p beside empty (h, n - w)) above empty (m - h, n)
                  where h = height p
                        w = width p
It is left as an exercise for the reader to modify the definition of lframe so that
it also works in the case m = height p or n = width p. In a similar fashion we
can centre a picture in a larger one, or place it in the top right-hand corner.
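The framing operation can be sketched in Haskell as below. The definitions of empty, above and beside are assumptions made for illustration (a block of spaces, vertical stacking of equal-width pictures, and row-wise joining of equal-height pictures); only height, width and lframe follow equations given in the text.

```haskell
type Picture = [[Char]]

height :: Picture -> Int
height p = length p

width :: Picture -> Int
width p = length (head p)

-- Assumed: an h-by-w block of space characters.
empty :: (Int, Int) -> Picture
empty (h, w) = replicate h (replicate w ' ')

-- Assumed: p stacked on top of q; both should have the same width.
above :: Picture -> Picture -> Picture
above p q = p ++ q

-- Assumed: p to the left of q; both should have the same height.
beside :: Picture -> Picture -> Picture
beside p q = zipWith (++) p q

-- Frame p in the top left-hand corner of an m-by-n picture.
lframe :: (Int, Int) -> Picture -> Picture
lframe (m, n) p = (p `beside` empty (h, n - w)) `above` empty (m - h, n)
  where
    h = height p
    w = width p
```

For example, lframe (3, 4) ["ab", "cd"] gives ["ab  ", "cd  ", "    "]. Under these particular assumptions the boundary cases m = height p and n = width p happen to work, because replicate 0 yields an empty list; that need not hold for the definitions used in the text.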
Finally, we can display a picture by the function display where:
display = unlines
This is just a renaming of the function unlines considered in Section 4.3.
Given the above functions, the printing phase of the calendar problem is
straightforward:
This frames the title in a 2 × 25 picture, leaving a blank line between the
heading and the table.
In a similar way, we can define the table:
This places the 7 × 3 picture of the days beside the entries for a month and
converts the result into an 8 × 25 picture. One blank line is left at the bottom
of the picture to separate it from the month below.
In order to define entries we need to assign numbers to the days of the
week. It is convenient to say Sunday is day 0, Monday is day 1, and so on up
to Saturday, which is day 6. Suppose first we can define a table of consecutive
numbers (reading downwards) arranged so that the first day of the month
occupies its rightful place. For example, with fd = 6 we get:
-5  2  9 16 23 30
-4  3 10 17 24 31
-3  4 11 18 25 32
-2  5 12 19 26 33
-1  6 13 20 27 34
 0  7 14 21 28 35
 1  8 15 22 29 36
The definition of (leap yr) is based on the well-known formula for determining
whether the year is a leap year or not:
This computes the accumulated sums of the month lengths (using scan),
starting at January 1, reduces them modulo 7 to find the day of the week
for the first days of each month, and finally takes just the initial 12 values to
give the required answer.
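The computation just described can be sketched in Haskell. This is a reconstruction under stated assumptions, not the definitions of the text: the names mlengths, jan1 and fstdays are hypothetical, leap uses the standard Gregorian rule, and jan1 uses the usual closed formula for the weekday of 1 January, with Sunday as day 0.

```haskell
-- Gregorian leap-year rule.
leap :: Int -> Bool
leap yr
  | yr `mod` 100 == 0 = yr `mod` 400 == 0
  | otherwise         = yr `mod` 4 == 0

-- Lengths of the twelve months of a year (hypothetical name).
mlengths :: Int -> [Int]
mlengths yr = [31, feb, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
  where
    feb = if leap yr then 29 else 28

-- Day of the week of 1 January (hypothetical name); Sunday is 0.
jan1 :: Int -> Int
jan1 yr =
  (yr + (yr - 1) `div` 4 - (yr - 1) `div` 100 + (yr - 1) `div` 400) `mod` 7

-- Accumulate the month lengths starting from 1 January, reduce modulo 7,
-- and keep the first twelve values: the weekday of the first of each month.
fstdays :: Int -> [Int]
fstdays yr = take 12 (map (`mod` 7) (scanl (+) (jan1 yr) (mlengths yr)))
```

As a check against the calendar shown earlier, fstdays 1988 !! 9 is 6, so 1 October 1988 falls on a Saturday.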
This completes our description of the calendar problem.
Exercises
4.5.1 Show that:
lframe (m, n) p = ⊥
4.5.3 Define a function which given a year and a month prints out the calendar
for the given month.
4.5.4 Define a version of the calendar problem which prints a month with
the days across the top of the table rather than down the left-hand side.
4.5.5 Define a version of the calendar problem which prints a month with
the days of the week beginning with Monday rather than Sunday.
4.5.6 Define a function which takes a date and returns the day of the week
on which the date falls.
4.5.7 Define the function zip4 in terms of zip.
Chapter 5
x ^ 0 = 1
x ^ (n + 1) = x × (x ^ n)
This definition uses pattern matching with the natural numbers, introduced
in Section 2.5.
The above definition can be rewritten in a form that does not employ
pattern matching:
x ^ n = 1, if n = 0
= x × x ^ (n - 1), if n > 0
5.1 OVER NATURAL NUMBERS 105
5 ^ 3 = 5 × (5 ^ 2) (^.2)
= 5 × (5 × (5 ^ 1)) (^.2)
= 5 × (5 × (5 × (5 ^ 0))) (^.2)
= 5 × (5 × (5 × 1)) (^.1)
= 125
The comment "(^.2)" at the end of a line means that the equality on that
line is justified by the second equation defining (^), and similarly for "(^.1)".
In general, we will refer to the ith equation of the definition of a function f
by writing (f.i).
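For readers who wish to experiment, here is a minimal Haskell sketch of the same recursion. Haskell 2010 has no (n + 1) patterns, so the sketch follows the guarded form of the definition. The name power and the explicit error for negative exponents are assumptions of the sketch.

```haskell
-- x raised to the natural number n, by recursion on n.
power :: Integer -> Integer -> Integer
power _ 0 = 1
power x n
  | n > 0     = x * power x (n - 1)
  | otherwise = error "power: exponent must be a natural number"
```

Evaluating power 5 3 unfolds exactly as in the calculation above and gives 125.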
Notice that the pattern (n + 1) matches the argument 3 by binding n
to 2. The pattern (n + 1) cannot match 0, because n must be bound to a
natural number. Thus, for every natural number n the term x ^ n will match
either the first equation or the second equation defining (^), but never both
equations. If n is not a natural number, as in x ^ (-3) or x ^ 2.5 or x ^ ⊥,
then neither equation matches, so x ^ n = ⊥. (Here we are referring to the
recursive definition above. Of course, x ^ y is actually well-defined for any
real number y, but we are ignoring that for now.)
It may seem that there is something magical about defining x ^ (n + 1)
in terms of x ^ n. Is such a step valid? It is easy to convince ourselves that
it is. Clearly, x ^ 0 has a value, because this is given by equation (^.1). By
equation (^.2), we know that if x ^ n has a value then so will x ^ (n + 1). So
since x ^ 0 has a value, so does x ^ 1; and then since x ^ 1 has a value, so does
x ^ 2; and so on. So x ^ n has a value for every natural number n, as required.
In practice, this reasoning may also be applied backwards. (Indeed, the
word 'recurse' comes from the Latin for 'to go back'.) Given a value for x
and n the value of x ^ n is found by finding the value of x ^ (n - 1), and this
is found by finding the value of x ^ (n - 2), and so on, until eventually x ^ 0
must be reached.
This argument, whether you think of it forward or backwards, is an example
of a proof by mathematical induction. In general, to prove by induction
that a proposition P(n) holds for any natural number n one must show two
things:
Case 0. That P(0) holds; and
Case (n + 1). That if P(n) holds, then P(n + 1) holds also.
This is valid for exactly the same reason that recursive definitions are valid.
We know by the first case that P(O) holds; and so we know by the second
106 RECURSION AND INDUCTION
case that P(1) holds also; and so we know again by the second case that P(2)
holds also; and so on. So P( n) must hold for every natural number n.
As an example, let us prove the well-known law:
This example shows the style we will use for inductive proofs, laying out
each case separately and using a "□" to mark the end.
As a second example of a recursive definition, consider the Fibonacci
numbers. In a mathematics textbook, these might be defined by the following
recurrence relationship:
F₀ = 0
F₁ = 1
Fₖ₊₂ = Fₖ + Fₖ₊₁
Thus, F₀ through F₉ are:
0, 1 , 1 , 2, 3, 5, 8, 13, 21 , 34
where each number in the sequence is the sum of the two preceding it .
In our notation we will write Fk as fib k, and the above definition becomes:
fib 0 = 0
fib 1 = 1
fib (k + 2) = fib k + fib (k + 1)
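A direct Haskell rendering of this definition is sketched below. Since Haskell 2010 lacks (k + 2) patterns, the third equation is written with a guard; the error for negative arguments is an assumption of the sketch.

```haskell
-- The kth Fibonacci number, following the recurrence literally.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib k
  | k >= 2    = fib (k - 2) + fib (k - 1)
  | otherwise = error "fib: argument must be a natural number"
```

Here map fib [0 .. 9] reproduces the sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 listed above.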
φ = (1 + √5)/2 and φ̂ = (1 - √5)/2
So φ^2 = φ + 1 and φ̂^2 = φ̂ + 1. Then we have:
Fₖ = c × (φ^k - φ̂^k), where c = 1/√5,
for all natural numbers k. (Here we are using traditional mathematical notation,
rather than the programming notation. The two notations may be
mixed freely, as convenient, so long as no confusion arises.)
= Fₖ + Fₖ₊₁ (F.3)
= c(φ^k - φ̂^k) + c(φ^(k+1) - φ̂^(k+1)) (hypothesis)
= c(φ^k (1 + φ) - φ̂^k (1 + φ̂)) (arithmetic)
= c(φ^(k+2) - φ̂^(k+2)) (φ^2 = φ + 1, φ̂^2 = φ̂ + 1)
The induction principle used this time is a little different from the one used
previously. Here, in order to show P(n) we show three things:
Exercises
5.1.1 Using the recursive definitions of addition and multiplication of natural
numbers given in Section 5.1, prove all or some of the following familiar
properties of arithmetic:
0 + n = n = n + 0 (+ has identity 0)
1 × n = n = n × 1 (× has identity 1)
m + n = n + m (+ commutative)
k + (m + n) = (k + m) + n (+ associative)
m × n = n × m (× commutative)
k × (m × n) = (k × m) × n (× associative)
k × (m + n) = (k × m) + (k × n) (× distributes through +)
for all natural numbers k, m, and n.
5.1.3 The binomial coefficient (n choose k) denotes the number of ways of choosing k
objects from a collection of n objects. Here is a table of the values of (n choose k) for
0 ≤ n, k ≤ 4:
n\ k 0 1 2 3 4
0 1 0 0 0 0
1 1 1 0 0 0
2 1 2 1 0 0
3 1 3 3 1 0
4 1 4 6 4 1
Observe that this table is essentially Pascal's Triangle shifted to the left and
padded with zeros. Each element in the table (except for the edges) is the
sum of the element immediately above it and the element above it and to the
left. In the programming notation, we will write binom n k for (n choose k).
a. Give a recursive definition of binom.
b. Prove that if k > n then (n choose k) = 0.
c. Rewrite the equation:
Σ (n choose k) = 2^n, where the sum is over 0 ≤ k ≤ n,
in our programming notation. Prove that the equation is true for all
natural numbers n.
#[ ] = 0
#(x : xs) = 1 + (#xs)
We can use these two equations to compute the length of the list [1, 2, 3]
as follows:
#[1, 2, 3] = 1 + (#[2, 3]) (#.2)
= 1 + (1 + (#[3])) (#.2)
= 1 + (1 + (1 + (#[ ]))) (#.2)
= 1 + (1 + (1 + 0)) (#.1)
= 3
As a second example, here is the recursive definition of concatenation:
[ ] ++ ys = ys
(x : xs) ++ ys = x : (xs ++ ys)
For example:
[1, 2] ++ [3, 4] = 1 : ([2] ++ [3, 4])
= 1 : (2 : ([ ] ++ [3, 4]))
= 1 : (2 : [3, 4])
= [1, 2, 3, 4]
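The two recursive definitions can be written in Haskell as follows. The names len and cat are chosen for this sketch only, to avoid clashing with the library functions length and (++).

```haskell
-- Length of a list: one equation per list constructor.
len :: [a] -> Int
len [] = 0
len (_ : xs) = 1 + len xs

-- Concatenation, by recursion on the first argument only.
cat :: [a] -> [a] -> [a]
cat [] ys = ys
cat (x : xs) ys = x : cat xs ys
```

Both calculations above can be replayed: len [1, 2, 3] is 3 and cat [1, 2] [3, 4] is [1, 2, 3, 4]. Note that cat never inspects its second argument, which is why proofs about concatenation usually proceed by induction on the first list.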
Induction adapts to lists as easily as recursion does. The principle of
induction over lists is as follows. To prove by induction that P(xs) holds for
any finite list xs one must show two things:
Case [ ]. That P([ ]) holds; and
Case (x : xs). That if P(xs) holds, then P(x : xs) holds for every x.
This is valid by an argument similar to the one we used for natural numbers.
We know by the first case that P([ ]) holds; and so we know by the second
case that P([x]) also holds for every x (since [x] is x : [ ]); and so we know
again by the second case that P([y, x]) also holds for every y and x (since
[y, x] is y : [x]); and so on. Thus P(xs) holds for every finite list xs. It is
easy to formalise this proof by inducting on the length of the list xs, so the
principle of induction over lists is actually just a consequence of the principle
of induction over natural numbers.
In Chapter 3 we stated a number of laws without giving any proofs. Let
us finally prove some of them.
The first law we saw states that concatenation is associative, that is:
xs ++ (ys ++ zs) = (xs ++ ys) ++ zs
for every finite list xs, ys, and zs.
The reader should consider why the induction in this proof is on xs, and not
ys or zs.
The next law we saw states that the identity element for concatenation
is [ ] , that is:
[ ] ++ xs = xs ++ [ ] = xs
for every finite list xs. The first half of this law is part of the recursive
definition of concatenation. The second half is an easy proof by induction on
xs, and the pleasure of this proof will be left to the reader.
To drive the idea of induction home, we will do one more proof. The next
law we saw relates length and concatenation, namely:
#(xs ++ ys) = (#xs) + (#ys)
for every finite list xs and ys.
Proof. The proof is by induction on xs.
Case [ ]. We have:
#([ ] ++ ys) = #ys (++.1)
= 0 + (#ys) (arithmetic)
= (#[ ]) + (#ys) (#.1)
which establishes the case.
Case (x : xs). We have:
#((x : xs) ++ ys) = #(x : (xs ++ ys)) (++.2)
= 1 + (#(xs ++ ys)) (#.2)
= 1 + (#xs) + (#ys) (hypothesis)
= (#(x : xs)) + (#ys) (#.2)
which establishes the case. □
In mathematics, one writes down a proof in the way that looks most elegant,
not the way one first discovers it. Both of the proofs above were discovered by
a technique that can reduce many proofs to a simple exercise in calculation.
Here is how the technique was applied to the (x : xs) case in the proof that
concatenation is associative. What we wish to show is that:
(x : xs) ++ (ys ++ zs) = ((x : xs) ++ ys) ++ zs
This can be done as follows. First, we start with the left-hand side, and
simplify it as much as possible:
(x : xs) ++ (ys ++ zs) = x : (xs ++ (ys ++ zs)) (++.2)
= x : ((xs ++ ys) ++ zs) (hypothesis)
Here "simplify" means that whenever the left-hand side of a definition (or
of the induction hypothesis) matches a term or a part of a term, then it is
replaced by the corresponding right-hand side. Second, we do the same thing
to the right-hand side:
((x : xs) ++ ys) ++ zs = (x : (xs ++ ys)) ++ zs (++.2)
= x : ((xs ++ ys) ++ zs) (++.2)
Since the two results are the same, we have demonstrated the desired equality.
The proof above was obtained by copying the first group of equations followed
by writing the second group of equations in the reverse order.
Thus, many proofs just require a proper choice of induction variable followed
by an exercise in simplification. This process is so simple that one
might think it could be automated - and indeed it has been. A great deal
of work has been done on automatic or machine-aided generation of proofs,
and most of the systems for this have simplification of the sort described
here at their core. The interested reader should consult Boyer and Moore [4],
Gordon et al. [5], or Paulson [6] for examples. On the other hand, as we shall
see, many proofs require insight and ingenuity for their completion.
In this section we show how some of the list operations introduced in Chapter 3
can be defined formally by using recursion, and how some of the related
laws can be proved using induction. The operators discussed are chosen in order
to illustrate variations on the basic principles introduced in the preceding
sections. Formal definitions of the remaining operators are left as exercises.
5.3.1 Zip
Recall that zip takes a pair of lists and returns a list of pairs. Here is its
recursive definition:
zip ([ ], ys) = [ ]
zip (x : xs, [ ]) = [ ]
zip (x : xs, y : ys) = (x, y) : zip (xs, ys)
This definition obviously covers every possible combination of the two argument
lists. Either:
Case [ ], ys. The first is empty; or
5.3 OPERATIONS ON LISTS 113
This inductive proof is valid because, as was observed before, the three cases
([ ], ys), (x : xs, [ ]), and (x : xs, y : ys) cover every possible combination of
two lists. Formally, one can justify this sort of "double induction" as two
nested inductions. Here the outer induction is on xs, and the inner induction
is on ys.
Rather than the definition above, one might be tempted to give the following
definition of zip instead:
zip ([ ], ys) = [ ]
zip (xs, [ ]) = [ ]
zip (x : xs, y : ys) = (x, y) : zip (xs, ys)
However, this definition is illegal in the notation used in this book, because
it is ambiguous. Given the term zip ([ ], [ ]), either the first equation or the
second equation might apply. In this case both equations happen to yield
the same result, [ ], but we cannot guarantee this for all definitions. So it is
required that in every definition at most one equation must apply for every
possible choice of arguments.
Perhaps the reader thinks the second, illegal, definition is better, because
it is more symmetric. In fact the function zip is not symmetric. On any
sequential computer, evaluating zip (xs, ys) will require examining either xs
first or ys first. A fundamental law of computation is that any process that
examines the value ⊥ must return the value ⊥. It is clear from the legal
definition that evaluation of zip (xs, ys) examines xs first, and if xs is [ ] then
ys is not examined. Thus, the term zip (⊥, [ ]) has the value ⊥, since no
equation in the legal definition applies, while the term zip ([ ], ⊥) has the
value [ ], by equation (zip.1). For the illegal definition to be valid, both
zip (⊥, [ ]) and zip ([ ], ⊥) would have to return the value [ ], but it is not
possible to obtain this behaviour using sequential computation. This is why
it was chosen to make the second definition illegal.
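The asymmetry can be observed directly in Haskell, where undefined plays the role of ⊥. The sketch below transcribes the legal definition; the name zipP is chosen only to avoid the library function zip.

```haskell
-- The legal definition: the first list is always examined first.
zipP :: ([a], [b]) -> [(a, b)]
zipP ([], _) = []
zipP (_ : _, []) = []
zipP (x : xs, y : ys) = (x, y) : zipP (xs, ys)

-- The second list is never examined when the first is empty,
-- so this is True rather than an error.
emptyFirst :: Bool
emptyFirst = null (zipP ([] :: [Int], undefined :: [Char]))
```

By contrast, evaluating zipP (undefined, []) raises an error, because the first equation must examine the first list before anything else can be decided. Haskell differs from the notation of the book in that it accepts overlapping equations and tries them from top to bottom, so the "illegal" definition would be accepted there, but it would still examine the first list first.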
Again, these definitions cover every possible combination of the two arguments.
An important law relating take and drop is that:
take n xs ++ drop n xs = xs
for every natural number n and finite list xs. Using the methods developed
so far, the following proof is straightforward.
Case 0, xs. We have:
take (n + 1) (x : xs) ++ drop (n + 1) (x : xs)
= (x : take n xs) ++ drop n xs (take.3, drop.3)
= x : (take n xs ++ drop n xs) (++.2)
= x : xs (hypothesis)
Observe that take 0 ⊥ = [ ] by (take.1), whereas take ⊥ [ ] = ⊥ since no
equation applies; and similarly for drop. In particular, we have:
which does not satisfy the law proved above. When we use induction to prove
a law "for every natural number n" this does not cover the possibility that
n is ⊥. If we wish to cover this possibility, it must be included in the proof
as an additional case. Similarly, the case that a list variable may be ⊥ must
also be treated separately; and we shall see examples of this in Chapter 7
when we discuss how to prove properties of infinite lists.
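The following Haskell sketch gives one plausible reading of the three-equation definitions referred to as (take.1) to (take.3) and (drop.1) to (drop.3); the equations themselves are not shown above, so their exact form is an assumption. The names takeN and dropN avoid the library functions, and n is assumed to be a natural number.

```haskell
-- Assumed shape of the definitions; n is taken to be a natural number.
takeN :: Int -> [a] -> [a]
takeN 0 _ = []
takeN _ [] = []
takeN n (x : xs) = x : takeN (n - 1) xs

dropN :: Int -> [a] -> [a]
dropN 0 xs = xs
dropN _ [] = []
dropN n (_ : xs) = dropN (n - 1) xs

-- take 0 does not examine its list argument, so this is True.
takeZeroBottom :: Bool
takeZeroBottom = null (takeN 0 (undefined :: [Int]))
```

The law takeN n xs ++ dropN n xs == xs can be spot-checked for small naturals n and finite lists xs, whereas takeN undefined [] raises an error, in line with the remark that the case n = ⊥ is not covered by the induction.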
hd (x : xs) = x
tl (x : xs) = xs
No equation is provided defining hd [ ] or tl [ ], so these both take the value
⊥.
We have already seen the law:
[hd xs] ++ tl xs = xs
for every non-empty finite list xs.
The case for [ ] was not included because it is given that xs is non-empty.
This simple proof requires only a trivial case analysis, but no induction.
init [x] = [ ]
init (x : xs) = x : init xs
for the definition of init. And, again, this is illegal in the notation used in
this book. In particular, init [x] can be reduced in two ways: it reduces to [ ]
by the first equation, and reduces to x : init [ ] by the second equation (since
[x] is (x : [ ])).
The new elements we have here are higher-order functions and conditionals
in a recursive function definition.
For example, we have:
The above proof considered two possibilities for the value of (p (f x)), namely
that it is True or False. This suffices if the predicate p is total. Otherwise,
we must consider one additional case, namely, (p (f x)) = ⊥. This case is left
as an exercise.
5.3.6 Interval
The special form [m .. n] denotes the list of numbers from m to n. We can
give a recursive definition of the function interval, where:
interval m n = [m .. n]
in the following way:
Here, the two cases are distinguished by testing the condition m > n, rather
than by the use of pattern matching. The value (interval m n) is well defined
for all m and n because the quantity n - m decreases at each recursive call.
Reasoning about interval requires the following principle of induction.
We may prove that a proposition P(m, n) holds for every integer m and n
by showing two things:
Case m > n. That P(m, n) holds when m > n; and
Case m ≤ n. That if P(m + 1, n) holds, then P(m, n) holds when m ≤ n.
This principle may be justified by induction (over natural numbers) on the
quantity n - m + 1. Because of this, we will refer to this method of proof as
induction on the difference between n and m.
We can now prove the useful law:
Case m ≤ n. We have:
map (k+) (interval m n)
= map (k+) (m : interval (m + 1) n) (interval.2)
= k + m : map (k+) (interval (m + 1) n) (map.2)
= k + m : interval (k + m + 1) (k + n) (hypothesis)
= interval (k + m) (k + n) (interval.2)
which establishes the case. □
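A Haskell sketch of interval, reconstructed from the description (two cases distinguished by the test m > n) and from the use of (interval.2) in the proof, is given below together with a spot check of the law the proof establishes. The exact form of the original equations is an assumption, and shiftLawHolds is a hypothetical name.

```haskell
-- Assumed form: empty when m > n, otherwise m followed by the rest.
interval :: Int -> Int -> [Int]
interval m n
  | m > n     = []
  | otherwise = m : interval (m + 1) n

-- Spot check of: map (k+) (interval m n) = interval (k + m) (k + n).
shiftLawHolds :: Bool
shiftLawHolds =
  and [ map (k +) (interval m n) == interval (k + m) (k + n)
      | k <- [-2 .. 2], m <- [-3 .. 3], n <- [-3 .. 3] ]
```

Such a check over a finite range is of course no substitute for the inductive proof; it only guards against transcription slips.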
Exercises
5.3.1 Give a recursive definition of the index operation (xs ! i).
5.3.2 Give recursive definitions of takewhile and dropwhile.
5.3.3 Prove the laws
init (xs ++ [x]) = xs
last (xs ++ [x]) = x
xs = init xs ++ [last xs]
takewhile p xs ++ dropwhile p xs = xs
for every finite list xs and ys and every natural number i. (Hint: The second
law will be easier to prove if it is rewritten in terms of j , where j = i - #xs.)
5.3.8 Let interval1 stand for the special form [1 .. n]. Show that the definition:
interval1 0 = [ ]
interval1 (n + 1) = 1 : map (+1) (interval1 n)
follows from the definition of interval and the law given in the text.
for every integer k, m, and n. Is the same law valid if k, m, and n are not
restricted to integers? Explain why. [Hint: An obvious decomposition is to
use the cases k > m and k ≤ m. This turns out not to be helpful, since we
are given that k ≤ m in the statement of the law. Use the cases k + 1 > m
and k + 1 ≤ m instead.]
5.3.10 Let (interval3 a b c) stand for the special form [a, b .. c]. Give a
recursive definition of interval3.
5.4.1 List difference
The value of xs -- ys is the list that results when, for each element y in ys,
the first occurrence (if any) of y is removed from xs. A recursive definition
of list difference is:
xs -- [ ] = xs
xs -- (y : ys) = remove xs y -- ys
remove [ ] y = [ ]
remove (x : xs) y = xs, if x = y
= x : remove xs y, otherwise
(xs ++ ys) -- xs = ys
for every finite list xs and ys . The proof of this law is straightforward using
the techniques described in the previous sections, and is left as an exercise.
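The definition of list difference carries over to Haskell almost unchanged. In this sketch the operator (--) is replaced by a named function diff (a name chosen here), since -- begins a comment in Haskell.

```haskell
-- Remove the first occurrence (if any) of y from the list.
remove :: Eq a => [a] -> a -> [a]
remove [] _ = []
remove (x : xs) y
  | x == y    = xs
  | otherwise = x : remove xs y

-- xs -- ys: remove each element of ys in turn from xs.
diff :: Eq a => [a] -> [a] -> [a]
diff xs [] = xs
diff xs (y : ys) = diff (remove xs y) ys
```

For example, diff [1, 2, 3, 1, 2] [1, 2] is [3, 1, 2], and diff ([1, 2] ++ [3, 4]) [1, 2] is [3, 4], an instance of the law stated above.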
5.4.2 Reverse
One way to define the function that reverses the elements of a list is as follows:
reverse [ ] = []
reverse ( x : xs ) = reverse xs * [x]
Let us now prove that:
In the main proof, the only place that the auxiliary result was used was to
demonstrate that:
and apply this technique. First , replace xs by (x : xs). Next , simplify the
left-hand side by applying (reverse.2) and the right-hand side by applying
the induction hypothesis. This yields:
foldr (⊕) a [ ] = a
foldr (⊕) a (x : xs) = x ⊕ (foldr (⊕) a xs)
foldl (⊕) a [ ] = a
foldl (⊕) a (x : xs) = foldl (⊕) (a ⊕ x) xs
The reader should take a moment to check that the formal definitions do
indeed correspond to the informal definitions.
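One way to make that check concrete is to transcribe the four equations into Haskell and evaluate them with a non-commutative operator. The names foldrP and foldlP are chosen only to avoid the library versions.

```haskell
-- foldr: the operator associates to the right, ending with a.
foldrP :: (a -> b -> b) -> b -> [a] -> b
foldrP _ a [] = a
foldrP f a (x : xs) = x `f` foldrP f a xs

-- foldl: the operator associates to the left, starting with a.
foldlP :: (b -> a -> b) -> b -> [a] -> b
foldlP _ a [] = a
foldlP f a (x : xs) = foldlP f (a `f` x) xs
```

With subtraction the two differ: foldrP (-) 0 [1, 2, 3] computes 1 - (2 - (3 - 0)) = 2, while foldlP (-) 0 [1, 2, 3] computes ((0 - 1) - 2) - 3 = -6.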
5.4 AUXILIARlES AND GENERALISATION 125
The second duality theorem (see Section 3.5.1) is as follows: let (⊕) and
(⊗) be two binary operators and a a value such that:
x ⊕ (y ⊗ z) = (x ⊕ y) ⊗ z (i)
x ⊕ a = a ⊗ x (ii)
for every x and y and every finite list xs. We will first prove the auxiliary
result, and then prove the main result. The auxiliary is proved by induction
on xs.
Case [ ]. We have:
Having proved the auxiliary, we now turn to the proof of the main result.
The proof is by induction on zs.
Case [ ]. We have:
In the main proof, the only place that the auxiliary result was used was to
demonstrate that:
This was generalised by replacing the value a by the variable y. One might
expect that this would make the proof harder, because there is a property
of a that might be used in the proof ( namely, it satisfies law (ii ) above ) but
y has no special properties. But in fact it makes the proof easier because
it makes the induction hypothesis more widely applicable. In the proof of
the auxiliary result given above, the induction hypothesis was invoked with y
replaced by (y ⊗ x'), and this was possible exactly because we had generalised
the auxiliary.
The next example again involves the reverse function, and is motivated by
considerations of efficiency.
We can think of a computation as consisting of a number of reduction
steps, where each reduction step consists of applying one equation in the
definition of a function. For example, computing [3, 2] ++ [1] requires three
reduction steps:
The last line is not a reduction step; it is just a different way of writing
down exactly the same structure. (Inside the computer, both 3 : (2 : [1]) and
[3, 2, 1] are represented as 3 : (2 : (1 : [ ])).) In general, if xs has length n then
computing xs ++ ys will require n reductions by (++.2) followed by a single
reduction of (++.1); so the number of reductions is roughly proportional to n.
twofib 0 = (0, 1)
twofib (n + 1) = (b, a + b)
where (a, b) = twofib n
The key to understanding this definition is the fact:
twofib n = (fib n, fib (n + 1))
which is easy to prove by induction, and from which it follows immediately
that fastfib n = fib n for every natural number n. Here the key to efficiency is
again to solve a harder problem. Rather than just compute fib n, it is more
efficient to compute fib n and fib (n + 1) at the same time.
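The pairing idea can be tried in Haskell as sketched below. The definition of fastfib as the first component of twofib is an assumption (its equation is not shown above), fib is repeated so that the block stands alone, and arguments are assumed to be natural numbers.

```haskell
-- The direct definition, for comparison.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib k = fib (k - 2) + fib (k - 1)

-- twofib n = (fib n, fib (n + 1)), computed with one recursive call per step.
twofib :: Integer -> (Integer, Integer)
twofib 0 = (0, 1)
twofib n = (b, a + b)
  where
    (a, b) = twofib (n - 1)

-- Assumed definition: project out the first component.
fastfib :: Integer -> Integer
fastfib = fst . twofib
```

For instance, twofib 5 is (5, 8), and fastfib agrees with fib on every argument tried, while needing only about n steps rather than about fib n steps.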
5.5 PROGRAM SYNTHESIS 129
Exercises
h = foldr (⊕) a
Prove that:
h (xs ++ ys) = (h xs) ⊕ (h ys)
h (concat xss) = h (map h xss)
for every finite list xs and ys and every finite list of finite lists xss.
used. Here, we show how two of the inductive proofs considered previously
can be recast as syntheses of recursive definitions.
Initial segment . In Section 5.3.4, we saw a recursive definition of init ,
which was followed by a proof of the law that:
init (x : x' : xs)
= take (#(x : x' : xs) - 1) (x : x' : xs) (instantiation)
= take (#xs + 1) (x : x' : xs) (#.2, arithmetic)
= x : take (#xs) (x' : xs) (take.3)
= x : take (#(x' : xs) - 1) (x' : xs) (#.2, arithmetic)
= x : init (x' : xs) (hypothesis)
last line, where (#xs) is changed to (#(x' : xs) - 1), in order to provide
a better "fit" against the given specification. Thus, as with many proofs,
many syntheses consist of a proper choice of case analysis, followed (mainly)
by simplification.
Fast Fibonacci. In Section 5.4.5, we saw a definition that can be used
to compute fib n, the nth Fibonacci number, in about n steps, whereas the
traditional definition requires about fib n steps. The key idea was to define
a function twofib such that:
This allows for improved efficiency because one has available the value of two
successive Fibonacci numbers when calculating the next one.
Previously the definition of twofib was given and it was asserted that it
was easy to prove the above equation from the definition. Clearly, it would
be better to start with the equation, and then synthesise a definition of twofib
that satisfies it. That is what we shall do now.
The synthesis is by instantiation of n.
Case 0. We have:
twofib 0 = (fib 0, fib 1) (instantiation)
= (0, 1) (fib.1, fib.2)
which gives the first equation.
Case (n + 1). We have:
Exercises
5.5.1 Synthesise the recursive definition of concatenate from the specification:
xs ++ ys = foldr (:) ys xs
for every finite list xs and ys.
5.5.2 Synthesise the recursive definition of length from the specification:
#xs = sum (map (const 1) xs)
for every finite list xs. Do this once taking sum = foldr (+) 0, and again taking
sum = foldl (+) 0. (Hint: The second case requires an auxiliary function.)
5.5.3 The association list function assoc is defined as follows:
assoc xys x = hd [y | (x', y) ← xys; x' = x]
Synthesise a recursive definition of assoc. (Hint: First translate the list
comprehension in the definition of assoc to be in terms of map and filter,
using the method in Section 3.4.)
5.5.4 Synthesise the recursive definitions of list subtraction and remove from
the specification:
xs -- ys = foldl remove xs ys
remove xs y = takewhile (≠ y) xs ++ drop 1 (dropwhile (≠ y) xs)
for every y and every finite list xs and ys.
5.5.5 Synthesise the recursive definitions of take and drop from the specification:
take n xs ++ drop n xs = xs
#(take n xs) = n min (#xs)
for every natural number n and every finite list xs.
Many interesting problems are combinatorial in nature, that is, they involve
selecting or permuting elements of a list in some desired manner. This section
describes several combinatorial functions of widespread utility.
Initial segments. The function inits returns the list of all initial segments
of a list, in order of increasing length. For example:
? inits "era"
["", "e", "er", "era"]
? map sum (inits [1 .. 5])
[0, 1, 3, 6, 10, 15]
5.6 COMBINATORIAL FUNCTIONS 133
Notice that :
scan f a xs = map (foldl f a) (inits xs)
provides one way to define the scan operation. If xs has length n, then inits xs
will have length n + 1.
A recursive definition of inits is:
inits [] = [[]]
inits (x : xs) = [[]] ++ map (x :) (inits xs)
The empty list has only one initial segment, namely itself; hence the first
equation. A non-empty list (x : xs) will still have the empty list as an initial
segment, and its other initial segments will begin with an x and be followed
by an initial segment of xs; hence the second equation.
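The two equations for inits, and the remark relating scan to inits, can be tried out directly. The following is a small Haskell sketch; the name scanVia is an assumption introduced here, and defining scan this way is quadratic, so it serves as a specification rather than as an efficient program.

```haskell
-- All initial segments of a list, in order of increasing length.
inits :: [a] -> [[a]]
inits []       = [[]]
inits (x : xs) = [[]] ++ map (x :) (inits xs)

-- scan specified through inits: fold every initial segment from the left.
scanVia :: (b -> a -> b) -> b -> [a] -> [b]
scanVia f a xs = map (foldl f a) (inits xs)
```

For instance, scanVia (+) 0 [1 .. 5] agrees with the running sums map sum (inits [1 .. 5]) shown in the session above.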
Subsequences. The function subs returns a list of all subsequences of a list.
For example:
? subs "era"
["", "a", "r", "ra", "e", "ea", "er", "era"]
The ordering of this list is a consequence of the particular definition of subs
given below. If xs has length n, then subs xs has length 2^n. This can be seen
by noting that each of the n elements in xs might be either present or absent
in a subsequence, so there are 2 × ... × 2 (n times) possibilities.
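As an illustration, here is one recursive definition of subs, written in Haskell, that reproduces the ordering shown in the session above. It is offered as a sketch consistent with that example rather than as a quotation of the definition the text refers to: each subsequence of (x : xs) either omits x or begins with it.

```haskell
-- All subsequences of a list: those that omit the head come first,
-- followed by those that include it.
subs :: [a] -> [[a]]
subs []       = [[]]
subs (x : xs) = subs xs ++ map (x :) (subs xs)
```

Each element doubles the number of results, which is the 2^n count argued above.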
(y : ys) are either to begin the list with x and follow it with (y : ys), or to
begin the list with y and follow it with an interleaving of x into ys .
Alternatively, a non-recursive definition is given by:
can be seen by noting that any of the n elements of xs may appear in the
first position, and then any of the remaining n - 1 elements may appear in
the second position, and so on, until finally there is only one element that
may appear in the last position.
A recursive definition of perms can be made using the function interleave
defined in the previous section:
perms [] = [[]]
perms (x : xs) = concat (map (interleave x) (perms xs))
That is, there is only one permutation of the empty list, namely itself. The
permutations of the non-empty list (x : xs) are all ways of interleaving x into
a permutation of xs.
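A runnable Haskell sketch of this definition follows. The equations for interleave are reconstructed from the description given earlier (either x goes first, or the head y goes first and x is interleaved into the rest), so they should be read as an assumption about its exact form.

```haskell
-- All ways of inserting x into a list, keeping the other elements in order.
interleave :: a -> [a] -> [[a]]
interleave x []       = [[x]]
interleave x (y : ys) = (x : y : ys) : map (y :) (interleave x ys)

-- All permutations: interleave the head into every permutation of the tail.
perms :: [a] -> [[a]]
perms []       = [[]]
perms (x : xs) = concat (map (interleave x) (perms xs))
```

Without concat the result would be a list of lists of permutations, one inner list per permutation of the tail.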
The use of concat in the above definition is essential. This can be seen
by considering types. The types of perms and interleave are:
perms :: [a] -> [[a]]
interleave :: a -> [a] -> [[a]]
Assuming x :: a and xs :: [a], we have:
concat xss = xs
For example, ["era"] and ["e", "ra"] and ["e", "", "r", "", "a"] are all partitions
of "era". In general, a partition of a list can be arbitrarily long, since it
may contain any number of empty lists. We say that xss is a proper partition
of xs if it is a partition and it contains no empty lists.
The function parts returns a list of all proper partitions of a list. For
example:
? parts "era"
[["era"], ["er", "a"], ["e", "ra"], ["e", "r", "a"]]
If xs has length n then parts xs has length 2^(n-1). This can be seen by noting
that there are n - 1 places where a break might occur (between each element
of the list xs) and for each place there are two possibilities (either a break
occurs there or it does not).
A recursive definition of parts is given by:
parts [] = [[]]
parts [x] = [[[x]]]
parts (x : (x' : xs)) = map (glue x) (parts (x' : xs))
                        ++ map ([x] :) (parts (x' : xs))
glue x xss = (x : hd xss) : tl xss
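The definition transcribes almost directly into Haskell. In the sketch below glue uses pattern matching in place of hd and tl, which is an assumption of convenience; it is only ever applied to non-empty partitions.

```haskell
-- All proper partitions of a list.
parts :: [a] -> [[[a]]]
parts []            = [[]]
parts [x]           = [[[x]]]
parts (x : x' : xs) = map (glue x) (parts (x' : xs))
                   ++ map ([x] :) (parts (x' : xs))

-- Attach x to the first segment of a non-empty partition.
glue :: a -> [[a]] -> [[a]]
glue x (ys : yss) = (x : ys) : yss
glue _ []         = error "glue: empty partition"
```

Concatenating any element of parts xs gives back xs, which is the defining property of a partition.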
Exercises
5.6.1 The function segs xs returns a list of all contiguous segments in a list.
For example:
? segs "list"
["", "t", "s", "st", "i", "is", "ist", "l", "li", "lis", "list"]
Give a recursive definition of segs. If xs has length n, what is the length of
segs xs?
5.6.2 The function choose k xs returns a list of all subsequences of xs whose
length is exactly k. For example:
? choose 3 "list"
["ist", "lst", "lit", "lis"]
Give a recursive definition of choose. Show that if xs has length n then
choose k xs has length C(n, k), the binomial coefficient 'n choose k'
(see Exercise 5.1.3).
5.6.3 Prove that:
subs (map f xs) = map (map f) (subs xs)
for every function f and finite list xs. What are the corresponding laws for
inits, perms, and parts?
breaks [] xs = [xs]
breaks (k : ks) xs = take k xs : breaks (map (+(-k)) ks) (drop k xs)
Next, use this definition to prove the following two lemmas:
Then use Exercises 5.3.8 and 5.6.3 to prove the final result.)
Chapter 6
Efficiency
We have already seen several examples of programs that perform the same
task but have different efficiencies. For instance, Sections 5.4.2 and 5.4.4
described two programs for reversing lists. If the list xs has length n, then
reverse xs requires a number of steps proportional to n^2 to compute the
reversed list, while rev xs only requires a number of steps proportional to n.
The awkward phrase 'number of steps proportional to' has appeared in
this book wherever efficiency is discussed. Fortunately, a less cumbersome
mathematical notation exists. First, we adopt the convention that T_f(x)
stands for the number of reduction steps to compute (f x), that is, the number
of steps required to reduce (f x) to canonical form. The 'T' is short for
'time', since the number of reduction steps is generally proportional to the
time required to perform a computation. The number of reduction steps is
measured with respect to the graph reduction model of computation, which
is discussed in more detail in the next section.
Second, we use the O-notation, which allows us to write the two statements
above as follows:
T_reverse(xs) = O(n^2)
T_rev(xs) = O(n)
where n is the length of xs. The 'O' is short for 'order of at most', and a
precise explanation of its meaning follows.
6.1 ASYMPTOTIC BEHAVIOUR 139
g(n) = O(h(n))
Then we have:
g(n) = O(n^m)
A suitable M in this case is given by M = |a_0| + |a_1| + ... + |a_m|. It is left as
an exercise to show for this value of M that |g(n)| ≤ M n^m for every n > 0.
The definition of O-notation is careful to guarantee the inequality only for
positive n, so that the case n = 0 causes no problems.
Note that the O-notation must be used with some care. For example, if
f(n) = O(h(n)) and g(n) = O(h(n)), this does not imply that f(n) = g(n).
By convention, we always write equations containing O-notation so that the
left-hand side of an equation is never less precise than the right-hand side. For
example, it is fine to write n^2 + n = O(n^2), but we never write O(n^2) = n^2 + n.
Otherwise, since 2n^2 = O(n^2), we could come to the absurd conclusion that
2n^2 = n^2 + n for every n.
To return to our example, the statements:
T_reverse(xs) = O(n^2)
T_rev(xs) = O(n)
mean that there exist constants M1 and M2 such that for every list xs of
length n, where n > 0, the number of steps to compute reverse xs is bounded
by M1 n^2, and the number of steps to compute rev xs is bounded by M2 n.
Nothing is said about the relative values of M1 and M2. However, even if
M1 is much less than M2, the second program is still the faster one in almost
all cases. For example, say that M1 is 2 and M2 is 20. Then for short lists
the first program is indeed better: applying reverse to a list of length 5 will
require at most 50 steps (2 × 5^2), while applying rev requires up to 100 steps
(20 × 5). But for longer lists the second program will be better: applying
reverse to a list of length 20 requires up to 800 steps, while applying rev
requires at most 400. As the lists get longer and longer, the advantage of rev
becomes more and more pronounced.
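To make the comparison concrete, the sketch below gives Haskell versions of the two reversing programs together with the two bounds from the worked example. The definitions of reverse' and rev are the usual textbook forms and are assumed here, since Sections 5.4.2 and 5.4.4 are not reproduced in this chapter; boundReverse and boundRev are hypothetical names for the bounds with M1 = 2 and M2 = 20.

```haskell
-- Quadratic reversal: every (++) re-traverses the reversed tail.
reverse' :: [a] -> [a]
reverse' []       = []
reverse' (x : xs) = reverse' xs ++ [x]

-- Linear reversal: an accumulating parameter collects the result.
rev :: [a] -> [a]
rev = shunt []
  where
    shunt ys []       = ys
    shunt ys (x : xs) = shunt (x : ys) xs

-- Step bounds from the example: M1 * n^2 with M1 = 2, and M2 * n with M2 = 20.
boundReverse, boundRev :: Integer -> Integer
boundReverse n = 2 * n ^ 2
boundRev n     = 20 * n
```

With these constants the two bounds coincide at n = 10; beyond that point the linear bound is always the smaller one.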
Estimating the performance of a program using just O-notation is called
asymptotic analysis, and the estimate is called the order of magnitude. For
example, reverse has a quadratic order of magnitude, and rev has a linear
one. In geometry, an asymptote is the limit of a curve as it approaches
infinity. Similarly, asymptotic analysis describes the behaviour of a program
as it 'approaches infinity', that is, as it processes larger and larger data items.
For many purposes an asymptotic analysis is sufficient. Generally, the
time required to compute a value is proportional to the number of reduction
steps required, regardless of the particular implementation used. If one
program has a smaller order of magnitude than another, it will generally run
faster under all implementations for sufficiently large input. Thus,
asymptotic analyses yield information that is implementation independent.
On the other hand, sometimes more detailed information is necessary.
For instance, we may need to compare two programs that have the same
order of magnitude of reduction steps. Also, when the data are not large an
asymptotic analysis may be less relevant and we may wish to know the size of
the constants of proportionality. In such cases, an exact count of the number
of reduction steps may be useful. Another useful method is testing: to run
each program on a range of data values with a stopwatch in one hand - or the
computerised equivalent. Generally, at least some testing will be desirable
to confirm that the model is a reasonable predictor of actual behaviour. As
in any engineering discipline, one must decide on the appropriate mixture
between simple models (such as asymptotic analysis), more detailed models
(such as counting the exact number of reduction steps), and testing (such as
timing the program in a particular implementation).
Exercises
6.1.1 Use the T and O-notations to give computation times for the following
functions: hd, last, (#), fib, and fastfib.
are valid.
6.2 MODELS OF REDUCTION 141
As noted, the phrase 'number of reduction steps' has appeared several times
in this book. The purpose of this section is to give a more precise definition
of its meaning.
Say that we have defined:
pyth x y = sqr x + sqr y
sqr x = x × x
term ('redex' is short for 'reducible expression'). In each step, either the
redex matches the left-hand side of an equation (like sqr 3) and is replaced
by the corresponding right-hand side (3 × 3), or the redex is a primitive
application (like 3 × 3) and is replaced by its value (9).
The number of steps may depend on the order in which redexes are reduced.
Say that:
fst (x, y) = x
as usual, and we wish to evaluate fst (sqr 4, sqr 2). One possible reduction
sequence is:
fst (sqr 4, sqr 2) => fst (4 × 4, sqr 2)    (sqr)
                   => fst (16, sqr 2)       (×)
                   => fst (16, 2 × 2)       (sqr)
                   => fst (16, 4)           (×)
                   => 16                    (fst)
which requires five steps. A second possibility is:
fst (sqr 4, sqr 2) => sqr 4    (fst)
                   => 4 × 4    (sqr)
                   => 16       (×)
which requires only three.
These two reduction sequences illustrate two reduction policies, called
innermost reduction and outermost reduction, respectively. In the first, each
step reduces an innermost redex, that is, one that contains no other redex.
In the second, each step reduces an outermost redex, that is, one that is
contained in no other redex.
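The second reduction sequence can be observed in a lazy language. The Haskell sketch below relies on the fact that lazy evaluation never reduces an argument whose value is not demanded; the second component of the pair is replaced by an error value (an assumption made purely so that any attempt to reduce it would be visible).

```haskell
sqr :: Integer -> Integer
sqr x = x * x

-- fst discards the second component without reducing it, so the error
-- is never raised and the result is sqr 4.
example :: Integer
example = fst (sqr 4, error "second component was reduced")
```

Under an innermost policy the second component would be reduced before fst is applied, and the error would be raised instead.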
6.2.1 Termination
Some reduction orders may fail to terminate. Say we have defined:
answer = fst (42, loop)
loop = tl loop
and fst and sqr are as defined above. Evaluating answer by innermost
reduction we have:
answer => fst (42, loop)             (answer)
       => fst (42, tl loop)          (loop)
       => fst (42, tl (tl loop))     (loop)
       => ...
[Graph: (• × •), with both pointers • leading to the single shared subterm (4 + 2)]
represents the term (4 + 2) × (4 + 2). Each instance of the subterm 4 + 2
is represented by an arrow - called a pointer - to that term. Now, using
(outermost) graph reduction we have:
sqr (4 + 2) => (• × •), both • pointing to (4 + 2)    (sqr)
            => (• × •), both • pointing to 6          (+)
            => 36                                     (×)
which requires only three steps. Thus, the use of graphs avoids the problem
of duplicated subterms. With graph reduction, outermost reduction never
performs more reduction steps than innermost.
sqrt (sqr 5 - 4 × 1 × 3) / (2 × 1)
and this term can then be reduced in the same way as before. In Section 7.6
we shall see how where clauses may also be used to introduce cyclic graphs.
Arrows have no meaning except to indicate sharing. Both:
and sqr (4 + 2)
are equivalent ways of writing the same term.
We will use the total number of arguments as a measure of the size of a
term or graph. For example, the term:
sqr 3 + sqr 4
has size four (each application of sqr has one argument, and + has two
arguments) and the graph:
Sometimes we need to reduce a subterm, but not all the way to normal form.
Consider the outermost reduction:
hd (map sqr [1 .. 7])
=> hd (map sqr (1 : [2 .. 7]))
=> hd (sqr 1 : map sqr [2 .. 7])
=> sqr 1
=> 1 × 1
=> 1
Here we needed to reduce (map sqr [1 .. 7]), but not all the way to the normal
form [1, 4, 9, 16, 25, 36, 49]. This in turn required reducing [1 .. 7], but not all
the way to the normal form [1, 2, 3, 4, 5, 6, 7].
However, any term that is reduced must be reduced to head normal form.
By definition, a term is in head normal form if it is not a redex, and it cannot
become a redex by reducing any of its subterms. Every term in normal form
is in head normal form, but not vice versa. In particular, any term of the
form (e1 : e2) is in head normal form, because regardless of how far e1 and
e2 are reduced there is no reduction rule that applies to this term. However,
(e1 : e2) is in normal form only when e1 and e2 are both in normal form.
Similarly, any term of the form (e1, e2) is in head normal form, but is in
normal form only when e1 and e2 are.
Whether a term needs to be reduced further than head normal form
depends on the context in which it appears. In the example above, the
subterm map sqr [1 .. 7] only needed to be reduced to its head normal form,
sqr 1 : map sqr [2 .. 7]. On the other hand, to print map sqr [1 .. 7], first it
would be reduced to head normal form, then sqr 1 would be reduced to its
normal form, 1, then map sqr [2 .. 7] would be reduced to head normal form,
and so on.
is also outermost, since in each case (tl loop) appears inside no other redex.
When a function application involves pattern matching, we allow a
subterm to be reduced only if it is required by the pattern. Thus, given the
above term:
zip (map sqr [], loop)
sum [1 .. 1000]
=> foldl (+) 0 [1 .. 1000]
=> foldl (+) (0 + 1) [2 .. 1000]
=> foldl (+) ((0 + 1) + 2) [3 .. 1000]
=> ...
=> 500500
Notice that in computing sum [1 .. n] by outermost order reduction, the
reduction terms grow in size proportional to n. On the other hand, if we use a
judicious mixture of outermost and innermost reduction order, then we have
the following reduction sequence:
sum [1 .. 1000]
=> foldl (+) 0 [1 .. 1000]
=> foldl (+) (0 + 1) [2 .. 1000]
=> foldl (+) 1 [2 .. 1000]
=> foldl (+) (1 + 2) [3 .. 1000]
=> foldl (+) 3 [3 .. 1000]
=> ...
=> foldl (+) 500500 []
=> 500500
Now the maximum size of any term in the reduction sequence is bounded
by a constant. In short, reducing sum [1 .. n] to normal form by purely
outermost reduction requires O(n) space, while a combination of innermost
and outermost reduction requires only O(1) space.
This suggests that it would be useful to have a way of controlling reduction
order, and we now introduce a special function, strict, that allows us to do
so.
incr x = x + 1
6.3 REDUCTION ORDER AND SPACE 149
then:
incr (incr (8 × 5))
=> incr (8 × 5) + 1
=> ((8 × 5) + 1) + 1
=> (40 + 1) + 1
=> 41 + 1
=> 42
but:
strict incr (strict incr (8 × 5))
=> strict incr (strict incr 40)
=> strict incr (incr 40)
=> strict incr (40 + 1)
=> strict incr 41
=> incr 41
=> 41 + 1
=> 42
Both cases perform the same reduction steps, but in a different order.
Currying applies to strict as to anything else. From this it follows that if
f is a function of three arguments, writing strict (f e1) e2 e3 causes the second
argument to be reduced early, but not the first or third.
Given this, we can rewrite the definition of foldl as follows:
foldl' (⊕) a [] = a
foldl' (⊕) a (x : xs) = strict (foldl' (⊕)) (a ⊕ x) xs
We now have:
sum [1 .. 1000]
=> foldl' (+) 0 [1 .. 1000]
=> strict (foldl' (+)) (0 + 1) [2 .. 1000]
=> foldl' (+) 1 [2 .. 1000]
=> strict (foldl' (+)) (1 + 2) [3 .. 1000]
=> foldl' (+) 3 [3 .. 1000]
=> ...
=> foldl' (+) 500500 []
=> 500500
which has the desired space behaviour. Here we have assumed it is valid to
replace foldl by foldl' in the definition of sum. We will prove this shortly.
The function strict should be used sparingly. The definition of foldl' is
one of the very few places where we recommend its use.
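In Haskell the role of strict can be played by the primitive seq, which reduces its first argument to weak head normal form before returning its second. The sketch below is an approximation under that assumption; the names foldlS and sumS are introduced here only to avoid clashing with library functions.

```haskell
-- strict f x reduces x (to weak head normal form) before applying f.
strict :: (a -> b) -> a -> b
strict f x = x `seq` f x

-- foldl with an eagerly reduced accumulator, following the text's foldl'.
foldlS :: (b -> a -> b) -> b -> [a] -> b
foldlS _ a []       = a
foldlS f a (x : xs) = strict (foldlS f) (f a x) xs

-- Summation in constant space.
sumS :: [Integer] -> Integer
sumS = foldlS (+) 0
```

The Prelude operator ($!) has the same meaning as this strict, and Data.List exports a foldl' built on the same idea.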
6.3.2 Strictness
strict f x = ⊥,    if x = ⊥
           = f x,  otherwise
The first duality theorem (Section 3.5.1) states that if (⊕) is
associative with identity a (that is, if (⊕) and a form a monoid), then:
If (⊕) does not satisfy the above properties, then choosing a winner may
not be so easy. A good rule of thumb, though, is that if (⊕) is non-strict in
either argument then foldr is usually more efficient than foldl. We give two
examples.
For the first example, consider the (∧) operation. Recall that (∧) is strict
in its first argument, but non-strict in its second. In particular, False ∧ x
returns False without evaluating x. Assume we are given a list:
Exercises
6.3.3 Give the time and space required to compute each of:
6.4.1 Sorting
Let us start by considering the problem of sorting. We have already
encountered one sorting method (in Exercise 3.5.4), namely insertion sort:
isort = foldr insert []
insert x xs = takewhile (≤ x) xs ++ [x] ++ dropwhile (≤ x) xs
Let T_insert(n) denote the time to insert an element into a list of length n
using insert, and let T_isort(n) denote the time to sort a list of length n using
isort. Since the evaluations of takewhile (≤ x) xs and dropwhile (≤ x) xs each
require O(n) steps, where n = #xs, we have:
T_insert(n) = O(n)
It follows that:
6.4 DIVIDE AND CONQUER 153
So, insertion sort never requires more than quadratic time to sort its
argument. Since it does require quadratic time in the worst case (for example,
when the input list is in descending order), the above bound is tight.
Using methods we have described previously, it is a simple exercise to
synthesise the following, equivalent, version of insert:
insert x []       = [x]
insert x (y : ys) = x : y : ys,       if x ≤ y
                  = y : insert x ys,  otherwise
Evaluating (insert x xs) with the new definition requires only about a third
as many reduction steps as before, but the new version of isort still possesses
a quadratic bound: if the length of the input list increases by a factor of 10,
the time to sort it may increase by a factor of 100.
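Both versions of insert can be written in Haskell and compared. In this sketch the names insert1 and insert2 are assumptions used to keep the two versions apart; takeWhile and dropWhile are the Prelude spellings of takewhile and dropwhile.

```haskell
-- First version: split the sorted list around the insertion point.
insert1 :: Ord a => a -> [a] -> [a]
insert1 x xs = takeWhile (<= x) xs ++ [x] ++ dropWhile (<= x) xs

-- Synthesised version: a single traversal up to the insertion point.
insert2 :: Ord a => a -> [a] -> [a]
insert2 x [] = [x]
insert2 x (y : ys)
  | x <= y    = x : y : ys
  | otherwise = y : insert2 x ys

-- Insertion sort built from the second version.
isort :: Ord a => [a] -> [a]
isort = foldr insert2 []
```

The two versions return the same list whenever the argument list is already sorted, which is the only situation in which isort uses them.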
Insertion sort works by dividing the list into two parts - the first element
and the remainder. It sorts the remainder, and then inserts the first element
into the sorted list. A variation on this idea is to divide the list into two parts
of roughly equal size, sort each part , and then merge the resulting lists. This
approach yields the following divide and conquer algorithm, called merge
sort:
msort xs = xs,                           if n ≤ 1
         = merge (msort us) (msort vs),  otherwise
           where n  = #xs
                 us = take (n div 2) xs
                 vs = drop (n div 2) xs
merge [] ys = ys
merge (x : xs) [] = x : xs
merge (x : xs) (y : ys) = x : merge xs (y : ys),  if x ≤ y
                        = y : merge (x : xs) ys,  otherwise
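The algorithm translates into Haskell as follows. This is a sketch using guards in place of the if/otherwise layout; the case merge [] ys = ys is included so that merge is defined when the first list runs out.

```haskell
-- Merge two sorted lists into one sorted list.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- Split in half, sort each half, merge the results.
msort :: Ord a => [a] -> [a]
msort xs
  | n <= 1    = xs
  | otherwise = merge (msort us) (msort vs)
  where
    n  = length xs
    us = take (n `div` 2) xs
    vs = drop (n `div` 2) xs
```

Because n div 2 is strictly smaller than n whenever n > 1, both recursive calls work on shorter lists and the recursion terminates.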
To analyse the performance of merge sort, let T_merge(n) denote the time
to merge two lists of combined length n, and let T_msort(n) denote the time to
sort a list of length n using msort.
It is easy to check that T_merge(n) = O(n)
since each reduction step produces at least one more element of the result.
To calculate T_msort(n) for n > 1, observe that O(n) steps are required to
split the argument list into two sublists, approximately T_msort(n div 2) steps
are required to sort each sublist and, as we have seen, another O(n) steps
are required to merge the results. Hence:
For a thorough discussion of average case analysis, and the proof that the
above claim holds, the reader should consult Knuth [10].
From the preceding discussion we see that it is desirable, when designing
a divide and conquer algorithm, to ensure that the subproblems are roughly
equal in size. Even more important, it is vital to guarantee that the
subproblems are smaller than the original problem. If the subproblem has the
same size as the original problem, then no progress will be made, and the
algorithm will enter an infinite loop. Both merge sort and quicksort ensure
that the subproblems are indeed smaller than the original. In merge sort,
this happens since n div 2 < n whenever n > 1. In quicksort, this happens
since the first element is removed from consideration each time.
6.4.2 Multiplication
x = x1 × 10^(n/2) + x0
y = y1 × 10^(n/2) + y0
This equation shows that the problem of multiplying two n-digit numbers can
be solved in terms of four multiplications of two n/2-digit numbers. Now,
multiplying a number by a power of 10 means shifting its representation by
T(n) = O(1),             if n = 1
     = 4T(n/2) + O(n),   if n > 1
The reader should compare this recurrence relation with the one for merge
sort. Here there are four subproblems of size n/2, while in the sorting problem
there were only two. The recurrence can be solved by use of the following
general result (whose proof is omitted):
Suppose T satisfies:
T(n) = O(1),             if n ≤ 1
     = aT(n/b) + O(n),   if n > 1
With this revision there are now only three multiplications of n/2-digit
numbers. The price paid for removing one multiplication is that there is one
more addition and two new subtractions. But since adding or subtracting
two n-digit numbers requires O(n) steps, we now have:
The new version of the divide and conquer algorithm is therefore
asymptotically superior by a factor of n^0.4.
Although it expresses what is wanted in the clearest possible way, this
definition does not lead to the most efficient algorithm. The reason for this is
that the function min is strict, demanding complete evaluation of its
argument, and so (p x) is evaluated for every x in the range a ≤ x ≤ b. Setting
n = #[a .. b], we therefore have that (find p a b) requires n evaluations of p.
A superior algorithm is obtained just by replacing min with hd in the
above definition. This step is justified by the fact that the first satisfactory
value encountered is the smallest one existing in the given range. The new
version of find requires between 1 and n evaluations of p, depending on the
precise values of a and b.
It is possible to do even better if p is known to be a monotonic predicate.
We say p is monotonic if:
(p x = True ∧ x < y) implies p y = True
for all x and y. In other words, once p becomes true for a value x it remains
true for all values greater than x.
Let us combine the idea of searching with a monotonic predicate with a
divide and conquer approach. Suppose we split the interval [a .. b], where
a < b, into two equal halves [a .. m] and [m + 1 .. b], where m = (a + b) div 2.
The reader should check that if a < b, then each of these intervals has a length
strictly smaller than #[a .. b]. Now, either p m = True or p m = False (since
p is assumed to be total). In the first case, the search for the smallest value
satisfying p can be confined to the interval [a .. m]. Furthermore, if p is
monotonic and p m = False, then there are no values in the range [a .. m]
that satisfy p. This means that, in the second case, the subsequent search
can be confined to the interval [m + 1 .. b]. In either case, we can continue
searching in an interval of roughly half the size of the original.
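The halving argument can be sketched in Haskell as below. The sketch assumes that a ≤ b, that p is monotonic, and that some value in [a .. b] satisfies p; the treatment of a one-element interval (simply returning a) is a choice made here, not something stated in the text.

```haskell
-- Smallest x in [a .. b] satisfying a monotonic predicate p,
-- assuming a <= b and that such an x exists.
find :: (Integer -> Bool) -> Integer -> Integer -> Integer
find p a b
  | a >= b    = a                    -- interval of one element: it must be the answer
  | p m       = find p a m           -- answer lies in [a .. m]
  | otherwise = find p (m + 1) b     -- no answer in [a .. m]
  where
    m = (a + b) `div` 2
```

Each call halves the interval, so the number of evaluations of p grows only logarithmically with #[a .. b]. For example, find (\x -> x * x >= 50) 1 100 searches for the smallest integer whose square is at least 50.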
Exercises
6.4.1 Suppose we define:
minimum = hd . isort
Show that (minimum xs) requires O(n) reduction steps, where n = #xs.
Now consider defining minimum by:
minimum = hd . msort
follows:
insert x = foldr swap [x]
swap x (y : ys) = x : y : ys,  if x ≤ y
                = y : x : ys,  otherwise
Prove by structural induction that:
us = take (n div 2) xs
vs = drop (n div 2) xs
6.4.5 Consider the function rank defined by:
Using the functions words, lines and unparse designed in Section 4.3, and a
suitably generalised sorting function, design a function kwic that solves the
problem. Modify your solution so that kwic takes an extra argument - a
list ws of "inconsequential" words (such as "and", "of", and "the") - and
produces only that sublist of rotated titles whose first word is not in ws.
6.4.7 Using a divide and conquer approach, show that the maximum and
minimum values in a list of n elements can together be determined in no
more than 3n/2 comparison operations.
6.4.8 The product of two complex numbers a + ib and c + id is given by
(ac - bd) + i(ad + bc). This involves four ordinary multiplications. Find
find p a = hd [x | x <- [a ..]; p x]
[Figure 6.1: chessboard diagram of a solution to the Eight Queens problem, with rows and columns numbered 1 to 8]
A moment's thought reveals that each column (and each row) must
contain exactly one queen, so one way to find a solution is as follows. Place a
queen in the first column: any position will do. Then place a queen in the
second column: any position not held in check by the first queen will do.
Then place a queen in the third column: any position not held in check by
the first two queens will do. Continue in this way until all eight queens have
[4, 6, 1, 5, 2, 8, 3, 7]
represents the solution in Figure 6.1. We will also write (i ,j ) to stand for a
queen located in column i and row j . Thus, the list above represents queens
at coordinates (1, 4), (2, 6), (3, 1), and so on.
To extend a placement p by adding a queen in row n we simply write
p ++ [n]; this places the new queen in column (#p + 1). This extension is safe
if no queen in the placement p puts the new queen in check:
queens 0 = [[]]
queens (m + 1) = [p ++ [n] | p <- queens m; n <- [1 .. 8]; safe p n]
In words, there is exactly one placement of no queens, represented by the
empty list. Each placement of (m + 1) queens consists of a placement p of
m queens, plus a new queen placed in some row n from 1 to 8, such that the
new queen is safe from check by any queen in p.
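A complete Haskell sketch of the program is given below. The definitions of safe and check do not survive in the passage above, so the versions here are assumptions: check tests whether two queens share a row or a diagonal, and safe requires that no queen already placed checks the new one. The type synonym Placement is also introduced only for this sketch.

```haskell
-- A placement lists the row of the queen in each column, left to right.
type Placement = [Int]

queens :: Int -> [Placement]
queens 0 = [[]]
queens m = [p ++ [n] | p <- queens (m - 1), n <- [1 .. 8], safe p n]

-- The new queen goes in column (length p + 1), row n.
safe :: Placement -> Int -> Bool
safe p n = and [not (check (i, j) (m, n)) | (i, j) <- zip [1 ..] p]
  where m = length p + 1

-- Two queens in different columns check each other if they share
-- a row or either kind of diagonal.
check :: (Int, Int) -> (Int, Int) -> Bool
check (i, j) (m, n) = j == n || i + j == m + n || i - j == m - n
```

Because the list is produced lazily, head (queens 8) only explores as much of the search space as is needed to reach the first complete placement.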
If we want to find just one solution to the Eight Queens problem, then
the first will do as well as any other:
? hd (queens 8)
[1, 5, 8, 6, 3, 7, 2, 4]
If we prefer to find all solutions, then just typing queens 8 does the trick.
As it happens, there is a total of 92 solutions. Because of the way queens is
6.5 SEARCH AND ENUMERATION 163
written, these solutions are produced in lexical order: all solutions beginning
with 1 are printed before all solutions beginning with 2, and so on.
Note that the concept of 'backtracking' has disappeared from the final
program. Instead of thinking in a dynamic way of generating one solution
and then another, we think in a static way of generating a list of all possible
solutions. The name 'list of successes' has been coined to describe this
technique. Because outermost reduction is used, only those parts of the lists that
are actually needed will be computed. Thus, printing just the first solution
requires less work than printing all the solutions. (For a particular run on
an actual system, computing the first solution hd (queens 8) required only 6
per cent of the time to compute all solutions queens 8.)
6.5.2 Search order
We now consider two ways of changing the order in which solutions are
searched: switching the order of generators in the list comprehension, and
permuting the order in which solutions are enumerated. The first
dramatically increases the time required to find the first solution, while the second
may dramatically reduce it. In both cases, the time to find all solutions
remains virtually unchanged.
Generator order. An obvious variation of the previous program is:
sneeuq 0 = [[]]
sneeuq (m + 1) = [p ++ [n] | n <- [1 .. 8]; p <- ps; safe p n]
                 where ps = sneeuq m
This reverses the order of the two generators in the list comprehension. The
call of sneeuq m has been brought out into a where clause, since otherwise it
would be recomputed once for each value of n.
Clearly, sneeuq finds the same solutions as queens, but it finds them in a
different order. The first solution it finds is:
? hd (sneeuq 8)
[4, 2, 7, 3, 6, 8, 5, 1]
This is exactly the reverse of the first solution found by queens. It is not
hard to see why: whereas queens first considers all solutions beginning with
1, then beginning with 2, and so on, sneeuq first considers all solutions ending
with 1, then ending with 2, and so on. Symmetry guarantees that the reverse
of every solution is itself a valid solution. In general, we have that:
and hence require essentially the same amount of time. (This is confirmed
by a run on an actual system, which shows that the two times are within 3
per cent of each other.)
On the other hand, if we want to find only the first solution, then it is not
immediately clear what their relative speeds will be. Since cleverness fails
us, let us try a run on an actual system - lo and behold, it turns out that
where ns = [4 .. 8] ++ [1 .. 3]
A run confirms that this finds the first solution in only 14% of the time
required by queens.
It was possible to find the first solution faster simply because the part of
the solution space enumerated first (that is, solutions with the first queen in
column 4) contains more solutions. Moral: if one is searching for a needle in
a haystack, look in the part of the haystack that contains more needles.
Of course, this does no good if one wants to find all the needles in the
haystack, and indeed, queens4 is no faster than queens if one wants to find
all solutions to the Eight Queens problem.
Once a cube is placed in a pile, only its sides (front , right, back, left) will
be visible:
visible [u, f, r, b, l, d] = [f, r, b, l]
Two cubes are compatible if they have different colours on every visible side:
compatible c c' = and [x ≠ x' | (x, x') <- zip (visible c, visible c')]
Exercises
6.5.1 Because no two queens may appear in the same row, safe p n returns
false if n appears in the list p. Therefore, it is safe to draw n from the
list [1 .. 8] with elements appearing in p removed. Modify the definitions of
queens, safe, and check to take advantage of this observation.
6.5.2 Could the modification suggested by the previous exercise also be
applied to sneeuq?
6.5.3 Every solution to the Eight Queens problem remains a solution after
reflection through a vertical, horizontal, or diagonal line through the middle
of the board; or after the board is rotated about its middle by 90 degrees.
Write functions to convert a placement p into the placements corresponding
to each of these transformations.
6.5.4 Use reflection through a horizontal line to write a program that finds
all solutions to the Eight Queens problem twice as quickly as queens. Would
it be as easy to use reflection through a vertical line in this way?
6.5.5 Use backtracking to write a program that finds all solutions to the
Eight Queens problem that are symmetric around a diagonal.
6.5.6 The following program computes the length of (queens i) for each i
from 0 to 8:
[#(queens i) | i <- [0 .. 8]]
This program is wasteful, because the computation of, say, (queens 8) will
recompute (queens 7), which has already been computed. Write a more efficient
program to compute the same result.
6.5.7 Why might one suspect that there would be more solutions to the
Eight Queens problem that have the first queen in row 4 than have the first
queen in row 1?
6.5.8 We can further explore the advantages of different search orders, using
a function rowqueens nss that takes a list of lists specifying for each column
the order in which its rows should be tried. For example, the search order
used by queens4 is duplicated by:
F O UR
+ F I VE
N I NE
The problem is solved by giving a mapping of letters to digits such that the
result is a valid statement of arithmetic. For example, the above problem is
solved by the mapping:
E → 3; F → 2; I → 4; N → 5; O → 9; R → 0; U → 7; V → 8
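The claim that this mapping solves the puzzle can be checked mechanically. The following Haskell sketch does so; the names mapping, digit, value and solves are illustrative and do not come from the text.

```haskell
import Data.Maybe (fromMaybe)

-- The letter-to-digit mapping quoted above.
mapping :: [(Char, Int)]
mapping = [('E',3),('F',2),('I',4),('N',5),('O',9),('R',0),('U',7),('V',8)]

-- Look up the digit for a letter; an unmapped letter is an error.
digit :: Char -> Int
digit c = fromMaybe (error ("unmapped letter " ++ [c])) (lookup c mapping)

-- Read a word as a decimal number under the mapping.
value :: String -> Int
value = foldl (\acc c -> 10 * acc + digit c) 0

-- True exactly when FOUR + FIVE = NINE holds under the mapping.
solves :: Bool
solves = value "FOUR" + value "FIVE" == value "NINE"
```

Here value "FOUR" is 2970 and value "FIVE" is 2483, and their sum 5453 is value "NINE".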
Chapter 7
Infinite Lists
Lists can be infinite as well as finite. In this chapter we shall describe some
basic operations on infinite lists, show how they can be characterised as
the limit of a sequence of approximations, and develop proof methods for
reasoning about their properties, including a way to extend the principle of
structural induction. We shall also give some illustrative applications. In
particular, infinite lists can be used to model sequences of events ordered
in time, so they provide a suitable framework in which to study interactive
processes.
7.1 Infinite lists
take n [1 ..] = [1 .. n]
[m ..] ! n = m + n
map factorial [0 ..] = scan (×) 1 [1 ..]
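These identities can be tried out on finite prefixes. The sketch below uses Haskell rather than the notation of the text: scanl stands in for scan and (!!) for (!), and the names ending in Law are illustrative. Since scanl includes its starting value, the factorial table is taken from 0.

```haskell
factorial :: Integer -> Integer
factorial n = product [1 .. n]

-- take n [1 ..] = [1 .. n]
takeLaw :: Int -> Bool
takeLaw n = take n [1 ..] == [1 .. n]

-- [m ..] ! n = m + n
indexLaw :: Int -> Int -> Bool
indexLaw m n = [m ..] !! n == m + n

-- The factorials as a running product, compared on the first k elements.
factorialLaw :: Int -> Bool
factorialLaw k = take k (map factorial [0 ..]) == take k (scanl (*) 1 [1 ..])
```

Each test inspects only a finite part of an infinite list, which is why it terminates.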
powertable = [[m ^ n | m ← [1 ..]] | n ← [2 ..]]
{x² | x ∈ {1, 2, 3, 4, ...}; x² < 10}
to stand for the set of all squares that are less than 10, and this expression
denotes the finite set {1, 4, 9}. However, if we type the corresponding list
comprehension in a session then we get:
? [x ^ 2 | x ← [1 ..]; x ^ 2 < 10]
[1, 4, 9
What has happened here is that the computer finds the first three elements
and then goes into an infinite loop searching for some element in the infinite
list [4, 5, 6, ...] whose square is less than 10. Although it is reasonable to
.
capable of conducting proofs, however trivial they might be. This does not
mean that the behaviour of the computer is 'unmathematical', only that set
theory is not the right theory for describing computations. A suitable theory
will be presented later, and we shall see that we can give a precise value
to the above expression without lapsing into informal explanations like 'and
then it goes into an infinite loop'. In particular, the above expression has the
precise value 1 : 4 : 9 : ⊥.
Incidentally, it is not difficult to modify the list comprehension so that it
does return a proper list of all squares less than 10. Using the rules described
in Chapter 3, we can convert the comprehension into the equivalent form:
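As a rough Haskell illustration of the difference (not the book's own notation; the names smallSquares and partialSquares are illustrative), a version that stops at the first failing square can be written with takeWhile, while the filtering version is safe only if no more than its first three elements are demanded.

```haskell
-- takeWhile stops at the first square that fails the test,
-- so the result is a proper (finite) list.
smallSquares :: [Integer]
smallSquares = takeWhile (< 10) [x ^ 2 | x <- [1 ..]]

-- The filtering version yields 1, 4, 9 and then searches forever;
-- only a prefix of it can safely be demanded.
partialSquares :: [Integer]
partialSquares = [x ^ 2 | x <- [1 ..], x ^ 2 < 10]
```

Evaluating take 3 partialSquares terminates, but asking for the whole list or its length does not.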
Exercises
7.1.1 Write a program that prints the infinite text:
7.2 Iterate
power f 0 = id
power f (n + 1) = f · power f n
Observe that f^n is similar in form to x^n, which denotes a number multiplied
by itself n times, but one should be careful not to confuse the two.
The function iterate is defined informally as follows:
iterate f x = [x, f x, f (f x), ...]
Thus iterate takes a function and a starting value and returns an infinite list.
For example:
iterate (+1) 1 = [1, 2, 3, 4, 5, ...]
iterate (×2) 1 = [1, 2, 4, 8, 16, 32, ...]
iterate (div 10) 2718 = [2718, 271, 27, 2, 0, 0, ...]
We also have:
[m ..] = iterate (+1) m
[m .. n] = takewhile (≤ n) (iterate (+1) m)
For example:
digits 2718
= (reverse · map (mod 10) · takewhile (≠ 0)) [2718, 271, 27, 2, 0, 0, ...]
= (reverse · map (mod 10)) [2718, 271, 27, 2]
= reverse [8, 1, 7, 2]
= [2, 7, 1, 8]
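The calculation above can be reproduced in Haskell. The definition below is read off from the first line of the calculation, with Haskell operator sections in place of (mod 10) and (div 10); it is a sketch, not a quotation of the book's script.

```haskell
-- digits n lists the decimal digits of a positive integer n,
-- most significant first.
digits :: Int -> [Int]
digits = reverse . map (`mod` 10) . takeWhile (/= 0) . iterate (`div` 10)
```

For example, digits 2718 gives [2, 7, 1, 8]. With this definition digits 0 is the empty list, because takeWhile stops immediately.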
Next, consider the function (group n) which breaks a list into segments
of length n. We have:
If the original list does not have a length which is evenly divisible by n, then
the last segment will have length strictly less than n.
As the last two examples suggest, one often finds map , takewhile, and
iterate composed together in sequence. Suppose we capture this pattern of
computation as a generic function, unfold say, defined as follows:
unfold h p t = map h · takewhile p · iterate t
The key feature about unfold is that it is a general function for producing
lists. Moreover, the functions h, t, and p correspond to simple operations on
lists. We have:
hd (unfold h p t x) = h x
tl (unfold h p t x) = unfold h p t (t x)
nonnull (unfold h p t x) = p x
Thus, the first argument h of unfold is a function which corresponds to hd,
the third function, t, corresponds to tl, and the second function, p, to a
predicate which tests whether the list is empty or not.
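A Haskell rendering of unfold follows directly from its definition. The two instances digitsU and groupU are assumptions made for illustration: they recast the digits and group examples in this pattern, and the form chosen for group is a guess consistent with its description rather than the book's own definition.

```haskell
-- unfold h p t x: iterate t from x, keep the values while p holds,
-- and apply h to each one.
unfold :: (a -> b) -> (a -> Bool) -> (a -> a) -> a -> [b]
unfold h p t = map h . takeWhile p . iterate t

-- digits as an instance of unfold.
digitsU :: Int -> [Int]
digitsU = reverse . unfold (`mod` 10) (/= 0) (`div` 10)

-- group n as an instance of unfold; n must be positive, since with
-- n = 0 the list never shrinks and the result is infinite.
groupU :: Int -> [a] -> [[a]]
groupU n = unfold (take n) (not . null) (drop n)
```

Whenever p x holds, the head of unfold h p t x is h x, as the first law above states.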
A formal, recursive definition of iterate can be given in two ways. The first
is a straightforward translation of the informal definition given above:
iterate f x = [power f i x | i ← [0 ..]]
and so on. Here, each element of the result list is computed by applying f
once to the previous element, and so the first n elements of (iterate f x) can
be computed in O(n) steps. The function iterate is useful largely because it
can be computed in this efficient way. Incidentally, notice that iterate and
scan are similar, in that both compute each element of the output list in
terms of the preceding element.
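The contrast between the two styles of definition can be sketched in Haskell. The names iterateSlow and iterateFast are illustrative; the recursive equation used for iterateFast is the one cited as (iterate.1) in the proofs later in this chapter.

```haskell
-- power f n is the n-fold composition of f (for n >= 0).
power :: (a -> a) -> Int -> a -> a
power _ 0 = id
power f n = f . power f (n - 1)

-- Direct translation: element i is computed from scratch, costing i steps.
iterateSlow :: (a -> a) -> a -> [a]
iterateSlow f x = [power f i x | i <- [0 ..]]

-- Recursive form: each element is one application of f to its predecessor.
iterateFast :: (a -> a) -> a -> [a]
iterateFast f x = x : iterateFast f (f x)
```

Both produce the same list, but the first n elements cost on the order of n² applications of f in the first version and n in the second.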
Exercises
7.2.1 What is the value of:
map (3×) [0 ..] = iterate (+3) 0
when '= ' means denotational equality? What is its value when '= ' means
computable equality?
7.2.2 Use iterate to write an expression equivalent to [a, b . . c] .
7.2.3 Define a function showint :: num → [char] that, given an integer,
returns the string that denotes its value. For example, showint 42 = "42".
(Don't use show, that would be cheating.)
7.2.4 Define the function getint :: [char] → num that is the inverse of
showint. For example, getint "42" = 42. (This doesn't use infinite lists.)
7.3 Example: generating primes
In each line a bar has been drawn under every pth position, where p is the
prime number that begins the line. One can think of these bars as forming
an infinite 'sieve', which sifts the list of all numbers (poured into the top)
until only primes are left. For this reason the method is usually called 'The
Sieve of Eratosthenes'.
Similarly, we can obtain all the primes less than 100 by evaluating:
This is certainly no more efficient than the original version, but it serves as
the starting point for a little massage. First, we have:
Now comes the final, but crucial step. The second term in this expression is
just another instance of rsieve, so the expression as a whole is equal to:
What is more, we can use this equation as a new definition of rsieve: both
definitions produce a well-defined list, and the argument above shows that
they must produce the same list. Starting with one definition of rsieve we
have therefore synthesised another. Observe also that the synthesis proceeded
by a chain of equational reasoning using only the definitions of the functions
appearing in the definition. The second definition uses recursion explicitly,
while the former does not. On the other hand, the new definition does not
involve the creation of an infinite number of infinite lists, and is therefore
slightly more efficient.
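The two styles of sieve can be compared in a Haskell sketch. The shapes below are assumptions: sieve follows the form given in Exercise 7.3.2, primes takes the head of each successively sieved list, and rsieve is the explicitly recursive form that the synthesis arrives at. The empty-list clauses are additions to keep the functions total.

```haskell
-- One sieving step: drop the head p and every multiple of p.
sieve :: [Integer] -> [Integer]
sieve (p : xs) = [x | x <- xs, x `mod` p /= 0]
sieve []       = []

-- Version built from iterate: the head of each successive sieved list.
primes :: [Integer]
primes = map head (iterate sieve [2 ..])

-- Explicitly recursive version, which avoids building a list of lists.
rsieve :: [Integer] -> [Integer]
rsieve (p : xs) = p : rsieve [x | x <- xs, x `mod` p /= 0]
rsieve []       = []

primes' :: [Integer]
primes' = rsieve [2 ..]
```

Both lists begin 2, 3, 5, 7, 11, and any finite prefix of either can be demanded safely.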
Exercises
7.3.1 Write a program to find the first prime number greater than 1000.
7.3.2 Consider the two functions:
sieve (p : xs) = [x | x ← xs; x mod p ≠ 0]
sieve' (p : xs) = xs -- iterate (+p) 0
Do they compute the same result? Compare the efficiency of the program for
generating infinite lists of primes using the two definitions of sieve. (Hint: If
xs is a list that contains no multiples of p, how many steps does it take to
compute the first n elements of sieve (p : xs) and sieve' (p : xs)?)
7.4 Infinite lists as limits
It has already been mentioned that some care is needed when dealing with
infinite lists. In particular, we noted that the expression:
[x ^ 2 | x ← [1 ..]; x ^ 2 < 10]
does not have the value [1, 4, 9], as one might expect, but rather the value
1 : 4 : 9 : ⊥. In order to understand more clearly why this is so, a theory of
infinite lists is required, and that is what this section will provide, at least
informally.
In mathematics, one way of dealing with infinite objects is by limits.
An infinite object may be defined to be the limit of an infinite sequence of
approximations. For example, the transcendental number π:
π = 3.14159265358979323846...,
is an infinite object in this sense. It can be thought of as the limit of the
infinite sequence of approximations:
3
3.1
3.14
3.141
3.1415
Again, the sequence consists of better and better approximations to the
intended limit. The first term, ⊥, is the undefined element, and thus a very
crude approximation: it tells us nothing about the intended limit. The next
term, 1 : ⊥, is a slightly better approximation: it tells us that the intended
limit is a list whose first element is 1, but says nothing about the rest of the
list. The following term, 1 : 2 : ⊥, is a little better still, and so on. Each
successively better approximation is derived by replacing ⊥ with a more defined
value, and thus gives more information about the intended limit.
Any list ending in bottom, i.e. any list of the form x1 : x2 : ... : xn : ⊥, will
be called a partial list. Every infinite list is the limit of an infinite sequence
of partial lists. Thus, we have three kinds of lists: finite lists, which end in
[ ] (such as 1 : 2 : 3 : [ ], also written [1, 2, 3]); partial lists, which end in ⊥
(such as 1 : 2 : 3 : ⊥); and infinite lists, which do not end at all (such as
[1, 2, 3, ...]).
Now it turns out that if xs1, xs2, xs3, ... is an infinite sequence whose
limit is xs, and f is a computable function, then f xs1, f xs2, f xs3, ... is
The limit of this sequence is the infinite list [2, 4, 6, 8, ...] of positive even
integers, just as we would expect.
A short word about applying functions to partial lists is in order here.
No equation defining map matches map (×2) ⊥ because ⊥ does not match
either [ ] or (x : xs). Thus map (×2) ⊥ = ⊥ and we say this follows from case
exhaustion on map. Thus we have:
map (×2) (1 : ⊥) = (1 × 2) : (map (×2) ⊥)   (map.2)
= 2 : ⊥   (map.0)
where we write (map.0) to indicate case exhaustion on map. The rest of the
sequence above is evaluated similarly.
As a second example, filter even [1 ..] can be computed as follows:
filter even ⊥ = ⊥
filter even (1 : ⊥) = ⊥
filter even (1 : 2 : ⊥) = 2 : ⊥
filter even (1 : 2 : 3 : ⊥) = 2 : ⊥
filter even (1 : 2 : 3 : 4 : ⊥) = 2 : 4 : ⊥
Again, the limit is the infinite list [2, 4, 6, ...], as expected. This gives a
(simple) example of how an infinite list may be the limit of two different
sequences.
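Partial lists can be experimented with in a lazy language by letting an undefined value play the part of ⊥. The Haskell sketch below assumes that reading; the names doubled, evens and definedPrefixes are illustrative. Only the defined prefix of each result may be demanded.

```haskell
-- `undefined` stands in for the bottom element, so each argument below
-- is a partial list. Demanding more than the defined prefix diverges.
doubled :: [Int]
doubled = map (* 2) (1 : 2 : undefined)          -- behaves like 2 : 4 : bottom

evens :: [Int]
evens = filter even (1 : 2 : 3 : 4 : undefined)  -- behaves like 2 : 4 : bottom

-- The defined prefixes of both results.
definedPrefixes :: ([Int], [Int])
definedPrefixes = (take 2 doubled, take 2 evens)
```

Asking for a third element of either list raises an error, which is how this particular stand-in for ⊥ shows itself.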
Now consider the expression:
7.5 Reasoning about infinite lists
Recall that the principle of induction over (finite) lists is as follows. To prove
by induction that P(xs) holds for every finite list xs one must show two
things:
This is valid because every finite list has the form x1 : x2 : ... : xn : [ ], and
xn : ⊥, and so is built from (:) and ⊥. Thus, the same induction principle
works if we ignore the case for [ ] above, and show instead:
In fact, one can 'mix and match' cases as convenient: one can prove P(xs)
for any finite list by showing case [ ] and case (x : xs), for any partial list by
showing case ⊥ and case (x : xs), and for any finite or partial list by showing
case ⊥, case [ ], and case (x : xs).
For example, here is a property that is true of partial lists but not of
finite lists. Observe that :
(1 : 2 : ⊥) ++ [3, 4] = 1 : ((2 : ⊥) ++ [3, 4])   (++.2)
= 1 : 2 : (⊥ ++ [3, 4])   (++.2)
= 1 : 2 : ⊥   (++.0)
Proof. The proof is by induction on xs. Since xs is a partial list, the cases
are ⊥ and (x : xs).
Case ⊥. We have:
for every list xs, ys, and zs. The proof was by induction on xs, using the two
cases [ ] and (x : xs), and so was valid only for finite lists xs. However, we also
have:
which establishes the case for ⊥, and so the law holds for infinite lists xs as
well. The original proof used no properties of ys or zs, and so is valid for
infinite lists ys and zs also.
Of course, not all laws extend to infinite lists. For example, we previously
proved that:
reverse (reverse xs) = xs
for all finite lists xs. Indeed, we even have that:
reverse (reverse ⊥) = ⊥
which seems to extend the law to infinite lists xs. However, a quick glance
at the proof shows that it uses the auxiliary result:
reverse (ys ++ [x]) = x : reverse ys
for every x and finite list ys. We have:
reverse (⊥ ++ [x]) = reverse ⊥ = ⊥ ≠ x : ⊥
and so the auxiliary result does not hold for infinite lists ys, and hence the
main result does not hold for infinite lists xs. It is left as an exercise for
the reader to show that reverse xs = ⊥ for any infinite list xs, and hence
reverse (reverse xs) = ⊥ for any infinite list xs.
As a final word of caution, observe that there do exist predicates that are not
chain complete. For example, if P(xs) is the predicate 'xs is a partial list',
and ys0, ys1, ys2, ... is a sequence of partial lists whose limit is the infinite
list ys (for example, the sequence ⊥, 1 : ⊥, 1 : 2 : ⊥, ... whose limit is
[1, 2, 3, ...]), then P(ys0), P(ys1), P(ys2), ... will all be true but P(ys) will
be false. For such predicates, induction can still be used to prove that the
predicate holds for partial lists, but the proof will not necessarily extend to
infinite lists.
take n xs = take n ys
for all natural numbers n. We shall refer to this fact as the take-lemma.
In one direction the proof of the take-lemma is obvious: if xs = ys, then
certainly take n xs = take n ys for any number n. To justify the result in the
reverse direction, recall the definition of take:
take 0 xs = []
take ( n + 1) [ ] = []
take ( n + 1) ( x : xs) = x : take n xs
for all positive n.
take n ys = take n xs = ⊥
The proof of the take-lemma may seem rather elaborate, for the result
certainly appears intuitively obvious. However, intuition is not always a reliable
guide in reasoning about non-finite lists. For example, the proposition that
two lists xs and ys are equal just in the case that:
xs ! n = ys ! n
Case n + 1.
take (n + 1) (iterate f (f x))
= take (n + 1) (f x : iterate f (f (f x)))   (iterate.1)
= f x : take n (iterate f (f (f x)))   (take.3)
= f x : take n (map f (iterate f (f x)))   (hypothesis)
= take (n + 1) (f x : map f (iterate f (f x)))   (take.3)
= take (n + 1) (map f (x : iterate f (f x)))   (map.2)
= take (n + 1) (map f (iterate f x))   (iterate.1)
as required. □
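The law established by this induction, iterate f (f x) = map f (iterate f x), can also be spot-checked on prefixes, in the spirit of the take-lemma. The Haskell function agreesUpTo below is an illustrative name; a finite check of this kind is evidence for the law, not a substitute for the proof.

```haskell
-- Compare the two sides of the law on their first n elements.
agreesUpTo :: Eq a => Int -> (a -> a) -> a -> Bool
agreesUpTo n f x =
  take n (iterate f (f x)) == take n (map f (iterate f x))
```

For instance, agreesUpTo n (* 3) 2 holds for every n that one cares to try.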
We shall give one further example of the take-lemma. Consider the infinite
list nats defined by:
take n nats = [0 .. n - 1]
for all natural numbers n. To do this, we need the useful subsidiary result
that:
take n · map f = map f · take n
for all f and n. In other words, take n commutes with map f. The proof is
left as an exercise. The induction step for our assertion is:
take (n + 1) nats
= take (n + 1) (0 : map (+1) nats)   (nats.1)
= 0 : take n (map (+1) nats)   (take.3)
= 0 : map (+1) (take n nats)   (map, take comm.)
= 0 : map (+1) [0 .. n - 1]   (hypothesis)
= 0 : [1 .. n]
= [0 .. n]
as required.
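The assertion and the subsidiary result can both be tried out in Haskell. The definition of nats below is read off from step (nats.1) of the proof; the names natsPrefixOK and takeMapCommute are illustrative.

```haskell
-- Definition read off from step (nats.1) of the proof.
nats :: [Integer]
nats = 0 : map (+ 1) nats

-- The assertion take n nats = [0 .. n - 1], for a given n.
natsPrefixOK :: Int -> Bool
natsPrefixOK n = take n nats == [0 .. fromIntegral n - 1]

-- The subsidiary law: take n commutes with map f (here f doubles).
takeMapCommute :: Int -> Bool
takeMapCommute n = take n (map (* 2) nats) == map (* 2) (take n nats)
```

Both checks hold for n = 0 as well, where each side is the empty list.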
Exercises
7.5.1 Prove that:
iterate (+a) b = [(i × a) + b | i ← [0 ..]]
7.5.2 Define:
loop1 = loop1
loop2 = loop1 : loop2
loop3 = tl (1 : loop3)
What are the values of loop1, loop2, and loop3? Give the infinite sequences
that have these values as their limit.
7.5.3 Prove that #xs = ⊥ if xs is partial or infinite.
7.5.4 In Chapter 5 a proof was given that:
take n xs ++ drop n xs = xs
for all finite lists xs. Extend the proof to cover infinite lists xs.
7.5.5 Why does the proof of the second duality theorem in Chapter 5 not
extend to infinite lists?
7.5.6 Prove that drop n xs is a partial list whenever xs is a partial list. From
this result, can we conclude that drop n xs is infinite whenever xs is infinite?
7.5.7 Suppose we define:
fibs = 0 : 1 : [x + y | (x, y) ← zip (fibs, tl fibs)]
Prove that fibs = map fib [0 ..], where fib is the Fibonacci function. State
carefully any subsidiary results you use about the relationship of take with
zip.
7.6 Cyclic structures
ones = 1 : ones
We have:
ones = 1 : ones
= 1 : 1 : ones
= 1 : 1 : 1 : ones
So in this case the entire infinite list may be represented within a fixed amount
of space!
As a second example, consider the definition:
[Figure: cyclic graph of the characters 'M' : 'o' : 'r' : 'e' : ' ' : 'a' : 'n' : 'd' : ' ' : 'm' : 'o' : 'r' : 'e' : ' ' : ..., whose tail points back into the list]
which again involves a cycle.
We now consider three further examples of the use of cyclic structures.
7.6.1 Forever
Let forever be a function such that forever x is the infinite list [x, x, x, ...],
so the definition of ones above is equivalent to:
ones = forever 1
One way to define forever is:
forever x = x : forever x
This definition is correct, but does not create a cyclic structure. If ones and
forever are defined as above, then after printing the first five elements, ones
will be represented by the graph:
1 : 1 : 1 : 1 : 1 : forever 1
which is not cyclic. If the next element of the list is printed, the subterm
forever 1 will be replaced by 1 : forever 1, and so the list of ones may grow
longer without end. On the other hand, if the definition of forever is changed
to:
forever x = zs
where zs = x : zs
then the definition of ones in terms of forever will produce the same cyclic
structure as before.
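Both definitions carry over to Haskell almost unchanged; the names foreverGrow and foreverCyclic are illustrative. Whether the second really shares a single cons cell depends on the implementation, although a graph-reducing implementation such as GHC would be expected to build the cycle.

```haskell
-- Non-cyclic: every element demanded allocates a fresh cons cell.
foreverGrow :: a -> [a]
foreverGrow x = x : foreverGrow x

-- Cyclic: zs refers to itself, so a single cons cell can be shared.
foreverCyclic :: a -> [a]
foreverCyclic x = zs
  where zs = x : zs

ones :: [Int]
ones = foreverCyclic 1
```

The two functions are indistinguishable by their values; the difference lies only in the space used as the list is consumed.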
7.6.2 Iterate
Here is a new definition of the function iterate, this time using a cyclic
structure:
iterate f x = zs
where zs = x : map f zs
Consider the term iterate (2×) 1. The first few steps of evaluating this term
are as follows:
iterate (2×) 1
We showed in the previous section that iterate satisfies this equation. The
new definition does not use cyclic lists, and turns out to be much less efficient
than the previous definition. Considering again the term iterate (2×) 1, the
first few steps of evaluating this term are:
iterate (2×) 1
⇒ 1 : map (2×) (iterate (2×) 1)
⇒ 1 : 2 : map (2×) (map (2×) (iterate (2×) 1))
⇒ 1 : 2 : 4 : map (2×) (map (2×) (map (2×) (iterate (2×) 1)))
It can be seen that it now requires O(n²) steps to compute the first n elements
of iterate f x. So in this example the use of cyclic structures is essential in
achieving efficiency.
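The two definitions being compared can be placed side by side in Haskell. The names iterateCyclic and iterateNested are illustrative; the second follows the equation whose reduction is traced above.

```haskell
-- Cyclic definition: zs is shared, so each new element costs one call of f.
iterateCyclic :: (a -> a) -> a -> [a]
iterateCyclic f x = zs
  where zs = x : map f zs

-- Non-cyclic definition: the recursive call rebuilds the list, so element n
-- is reached through n nested maps.
iterateNested :: (a -> a) -> a -> [a]
iterateNested f x = x : map f (iterateNested f x)
```

The results agree element for element; only the number of reduction steps differs.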
This solution produces the correct answer, but does not form a cyclic
structure, and so requires O(n²) steps to compute the first n elements of the
result list. It is left as an exercise to modify hamming' so that it does form a
cyclic structure.
Exercises
7.6.1 Draw the (degenerate) cyclic graphs that represent loop1, loop2, and
loop3 in Exercise 7.5.2.
7.7 Example: the paper-rock-scissors game
and define a round of the game to be a pair of moves, one for each player. It
is convenient to introduce the synonyms:
move == [char]
round == (move , move)
(In the next chapter we shall see how moves can be denoted directly as values
of a new type, rather than represented indirectly as lists of characters.)
In order to score a round, we need first to define the function beats which
expresses the relative powers of the three objects. The definition is:
beats "Paper" = "Scissors"
beats "Rock" = "Paper"
beats "Scissors" = "Rock"
total scores = (sum (map fst scores), sum (map snd scores))
In order to complete the model of the game, we must decide on how
strategies are to be represented, and so supply the necessary definition of rounds.
There are at least two possible methods for representing strategies and it is
instructive to compare them in some detail. In the first, we take:
count x xs = #[y | y ← xs; x = y]
The function update appends a new pair of moves to the list of existing
rounds, and rounds generates the infinite list of rounds by repeatedly
applying update to the initially empty list. The definition is somewhat clumsy.
More importantly, it is not very efficient. Suppose a strategy takes time
proportional to the length of the input to compute its result. It follows
that update takes O(n) steps to update a game of n rounds by a new one.
Therefore, to compute a game of N rounds requires O(N²) steps.
An alternative representation. For comparison, let us now consider
another way we might reasonably represent strategies. This time we take:
recip ms = "Paper" : ms
This strategy returns "Paper" the first time, and thereafter returns just the
move made by the opponent in the previous round. Observe that this version
of recip produces each successive output with constant delay.
The strategy smart can be reprogrammed as follows:
smart xs = "Rock" : map choose (counts xs)
counts = tl · scan (⊕) (0, 0, 0)
where (p, r, s) ⊕ "Paper" = (p + 1, r, s)
(p, r, s) ⊕ "Rock" = (p, r + 1, s)
(p, r, s) ⊕ "Scissors" = (p, r, s + 1)
The value (counts xs) is the list of triples representing the running counts of
the three move values. The smart strategy is also efficient in that it produces
each successive output with constant delay.
With our new model of strategies we can redefine the function rounds in
the following way:
rounds (f, g) = zip (xs, ys)
where xs = f ys
ys = g xs
xs = f ys
ys = g xs
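The stream model of strategies translates readily into Haskell. In the sketch below recip' is the recip strategy (renamed because Haskell's Prelude already has a recip), bump plays the part of ⊕, and the final clause of bump is an addition not found in the text.

```haskell
type Move = String

-- recip opens with "Paper" and then echoes the opponent's previous move.
recip' :: [Move] -> [Move]
recip' ms = "Paper" : ms

-- Running counts of (paper, rock, scissors) after each move seen.
counts :: [Move] -> [(Int, Int, Int)]
counts = tail . scanl bump (0, 0, 0)
  where
    bump (p, r, s) "Paper"    = (p + 1, r, s)
    bump (p, r, s) "Rock"     = (p, r + 1, s)
    bump (p, r, s) "Scissors" = (p, r, s + 1)
    bump t         _          = t   -- unknown moves ignored (an assumption)

-- The two move streams are defined in terms of each other; laziness
-- lets each strategy consume the other's output as it is produced.
rounds :: ([Move] -> [Move], [Move] -> [Move]) -> [(Move, Move)]
rounds (f, g) = zip xs ys
  where
    xs = f ys
    ys = g xs
```

When recip' plays itself, every round is a pair of "Paper" moves, since each player's first move is fixed and each later move copies the other's earlier one.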
The first reply of cheat is the move guaranteed to beat the opponent's first
move; similarly for subsequent moves. To see that cheat cannot be prevented
from subverting the game, consider a match in which it is played against
recip. Suppose:
xs = cheat ys
ys = recip xs
These equations have solutions xs and ys which are the limits of the sequences
of approximations xs_0, xs_1, ... and ys_0, ys_1, ..., respectively, where xs_0 =
ys_0 = ⊥ and:
xs_{n+1} = cheat ys_n
ys_{n+1} = recip xs_n
Now, we have:
xs_1 = cheat ⊥ = ⊥
ys_1 = recip ⊥ = "Paper" : ⊥
and so:
xs_2 = cheat ("Paper" : ⊥) = "Scissors" : ⊥
ys_2 = recip ⊥ = "Paper" : ⊥
and so:
xs_3 = cheat ("Paper" : ⊥) = "Scissors" : ⊥
ys_3 = recip ("Scissors" : ⊥) = "Paper" : "Scissors" : ⊥
Continuing in this way, we see that the limits of these sequences are indeed
well-defined and, moreover, cheat always triumphs.
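The match between cheat and recip can be run directly in Haskell. The definition cheat = map beats is an assumption, chosen to agree with the approximations computed above; recip' is recip renamed to avoid the Prelude function, and the last clause of beats is an addition.

```haskell
type Move = String

beats :: Move -> Move
beats "Paper"    = "Scissors"
beats "Rock"     = "Paper"
beats "Scissors" = "Rock"
beats m          = m            -- fallback for unknown moves (an assumption)

recip' :: [Move] -> [Move]
recip' ms = "Paper" : ms

-- Assumed definition: answer every move with the move that beats it.
cheat :: [Move] -> [Move]
cheat = map beats

-- The mutually recursive equations xs = cheat ys, ys = recip xs.
cheatMoves, recipMoves :: [Move]
cheatMoves = cheat recipMoves
recipMoves = recip' cheatMoves
```

The first three moves of cheat come out as "Scissors", "Rock", "Paper" against "Paper", "Scissors", "Rock", so cheat wins every round.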
Can we find a way to protect against such a strategy? To answer this
question, we need to take a closer look at what constitutes a fair strategy.
Informally speaking, f is fair if it can determine its first move in the absence
of any information about its opponent's first move, its second move only on
the basis of the opponent's first move (at most), and so on. (Of course, it
is not required that f take any account of the opponent's previous moves in
order to compute a reply.)
take 1 (f ⊥) = [y1]
take 2 (f (x1 : ⊥)) = [y1, y2]
take 3 (f (x1 : x2 : ⊥)) = [y1, y2, y3]
and so on. In other words, we must have, for all infinite sequences of
well-defined moves xs and for all n ≥ 0, that:
prune 0 xs = ⊥
prune (n + 1) (x : xs) = x : prune n xs
The recip strategy is fair in this sense because, using the fact that:
fair f xs = ys
where ys = f (synch ys xs)
synch (y : ys) (x : xs) = x : synch ys xs, if defined y
Here, defined returns True if its argument is not ⊥, and ⊥ otherwise. In
particular, we have
synch (⊥ : ys) (x : xs) = ⊥
We leave as an instructive exercise for the reader the proof that (fair f xs)
returns an infinite list of well-defined moves if and only if f is a fair strategy.
It follows from the above analysis that to prevent cheating we must rewrite
the definition of rounds as follows:
rounds (f, g) = zip (xs, ys)
where xs = fair f ys
ys = fair g xs
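A Haskell sketch of this arrangement is given below. The definition of defined is one possible choice (comparing a string with itself forces all of its characters); the second clause of synch and the name recip' are likewise additions, not taken from the text.

```haskell
type Move = String

-- defined forces every character of a move; it is True or it diverges.
-- (One possible definition, assumed here.)
defined :: Move -> Bool
defined m = m == m

-- synch releases the opponent's next move only after our own
-- corresponding move has been fully computed.
synch :: [Move] -> [Move] -> [Move]
synch (y : ys) (x : xs)
  | defined y = x : synch ys xs
synch _ _ = []   -- finite inputs simply stop (an addition for totality)

fair :: ([Move] -> [Move]) -> [Move] -> [Move]
fair f xs = ys
  where ys = f (synch ys xs)

recip' :: [Move] -> [Move]
recip' ms = "Paper" : ms

rounds :: ([Move] -> [Move], [Move] -> [Move]) -> [(Move, Move)]
rounds (f, g) = zip xs ys
  where
    xs = fair f ys
    ys = fair g xs
```

A fair strategy such as recip' is unaffected by the wrapper. A strategy that inspects the opponent's current move before replying would find that move withheld until its own reply exists, and so would never produce one.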
Exercises
7.7.1 Prove that:
prune n xs = take n xs ++ ⊥
7.7.2 Give fair strategies in the paper-rock-scissors game that (i) never look
at the opponent's moves in calculating a reply; and (ii) look only at every
third move. Define a strategy which is fair for ten moves, but then cheats.
7.7.3 Prove that if f is a fair strategy, then fair f xs = f xs. Show that if f
is not fair, then fair f xs is a partial list. What happens to the game in such
a case?
7.7.4 Bearing in mind that defined is applied to strings, give a definition of
defined.
7.8 Interactive programs
So far, all of our sessions with the computer have involved a uniform and
simple pattern of interaction: the user types an expression to be evaluated at
the keyboard, and then the value of this expression is printed on the screen.
This style of interaction is suited for a wide range of purposes, but sometimes
other forms of interaction are required. As a trivial example, we might wish
that everything typed on the keyboard is 'echoed' on the screen, but with
lower-case letters converted to upper-case. A more entertaining example
would be a program to play the game of hangman. This section shows how
interactive programs can be written in our functional programming notation.
An interactive program will be represented by a function f of type:
f :: [char] → [char]
When f is run interactively, the input to f will be the sequence of characters
typed at the keyboard, and the output of f will be the sequence of characters
typed on the computer screen.
For example, the program that capitalises its input may be written as
follows:
Hello, world!
HELLO, WORLD!
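In Haskell the same view of interaction, as a function from the characters typed to the characters displayed, might be sketched as follows. The type synonym Interactive and the definition of capitalises are assumptions made for illustration.

```haskell
import Data.Char (toUpper)

-- An interactive program is a function from the input character stream
-- to the output character stream.
type Interactive = String -> String

-- Convert every character typed to upper case.
capitalises :: Interactive
capitalises = map toUpper
```

Such a function can be connected to the keyboard and screen with the standard function interact, for instance by defining main = interact capitalises. How promptly the output appears then depends on the buffering of the terminal.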
behaves in the same way as capitalises, but terminates execution when the
' I ' character is typed. Notice that the 'I' character will not be echoed on
the screen.
7.8.2 Hangman
A simplified version of the game of hangman may be played as follows. First,
one player enters a word. Then, the other player repeatedly guesses a letter.
For each guess, the computer (which acts as mediator between the players)
prints the word, with all of those letters not guessed so far replaced by a
'-'. When all the letters in the word have been guessed, the process repeats.
Thus, a typical session of hangman might begin like this:
Enter a word: - - - - - -
a -a- - -a-
n -an- -an
t -an- -an
m -an-man
h han-man
g hangman
Enter a word:
and now it is the other player's turn to enter a word. Notice that when a
word to be guessed is entered, it is not printed but a '-' appears on the screen
as each letter is typed on the keyboard.
The following program will play hangman, when the function hangman
is run interactively:
hangman input
= "Enter a word: " ++ echo ++ "\n" ++ game word [ ] input'
where
echo = ['-' | w ← word]
word = before '\n' input
input' = after '\n' input
before x = takewhile (≠ x)
after x = tl · dropwhile (≠ x)
In the above script, word is the word to be guessed, guess is the list of guesses
so far, and input is the list of characters typed at the keyboard. Note that:
and so before and after may be used to extract the word to be guessed and
the remaining input .
Observe also that:
and so the prompt "Enter a word: " will be printed on the screen before any
input is typed, and each character of the word will be echoed on the screen
with a '-' as it is typed.
Another useful utility function is read2, which performs two read
operations in sequence. It is defined as follows:
read2 (msg1, msg2) g = read msg1 g1
where
g1 line1 = read msg2 g2
where
g2 line2 = g (line1, line2)
The term read2 (msg1, msg2) g denotes an interactive program that prints
the string msg1, reads and echoes a line line1 from the keyboard, prints
the string msg2, reads and echoes a line line2 from the keyboard, and then
behaves like the interactive program g (line1, line2).
The read function supports a particular style of interaction: input occurs
one line at a time and is always echoed. Utility functions to support other
styles of input can also be written, and some examples are given in the
exercises.
It is also convenient to introduce utility routines for output and for the
termination of an interactive session. These may be defined as follows:
forces the user to repeatedly guess a secret word (in this case, "sheep" ) and
terminates when the word has been correctly guessed.
Utility functions are important because they allow interactive programs
to be written at a higher level of abstraction. Notice that the guess program
does not refer directly to the list of characters from the keyboard or the
list of characters to the screen. All of the details of representing interactive
programs as functions of type [char] → [char] have been hidden inside the
implementation of the utility functions, and the users of such functions need
not be concerned with these details.
The following interactive program manipulates tables using only these three
operations, so any other implementation of tables may easily be substituted
for the above.
A simple interactive program for maintaining a table might behave as
follows. First, the program requests a command, which must be either
"enter" or "lookup" or "end". If the command is "enter", then a new key and
value are requested and entered into the table. If the command is "lookup",
then a key is requested and its associated value in the table is printed. If the
command is "end", then a suitable message is printed and the interactive
session terminates.
The following program has the behaviour described above:
Exercises
7.8.1 Modify hangman to display a running total of the number of right and
wrong guesses in a game.
7.8.2 What are the types of the utility functions read, read2, write, and
end? (Hint: The types will be more meaningful if one writes interactive for
the type [char] → [char].)
7.8.3 Write the utility function read3, which is like read2 except it prompts
for and reads three values.
7.8.4 Using showint and getint from Exercises 7.2.3 and 7.2.4, write utility
functions readint and writeint to use in interactive programs that deal with
integers.
7.8.5 When running the interactive program table, what happens if the
command is not "enter", "lookup", or "end"? Is this behaviour acceptable?
Modify table to be more robust.
7.8.6 The backspace character '+ ' when typed at the keyboard indicates
that the previously typed character should be deleted, and when printed on
the screen causes the cursor to be moved back one position. Write a utility
function readedit that is similar to read except that '+ ' can be used to edit
the input. For example, typing "goop+d" followed by a newline should cause
the string "good" to be entered. (For echoing, note that a character already
printed on the screen can be removed by printing "+u+"; this backs over the
character, writes a space in its place, and then moves the cursor back over
the space. )
Chapter 8
New Types
So far we have seen three basic types, num, bool and char, and three ways
of combining them to form new types. To recap, we can make use of (i) the
function space operator → to form the type of functions from one given type
to another; (ii) the tupling operator (α, ..., β) to form tuples of types; and
(iii) the list construction operator [α] to form lists of values of a given type.
As well as using these three operators, we can also construct new types
directly. How this is done is the subject of the present chapter. We shall
describe a simple mechanism for introducing new types of values, illustrate
how the mechanism is used in practice, and describe a little of its theoretical
foundations.
8.1 Enumerated types
One simple way to define a new type is by explicit enumeration of its values.
For example, suppose we are interested in a problem which deals with the
days of the week. We can introduce a new type day by writing:
day ::= Sun | Mon | Tue | Wed | Thu | Fri | Sat
This is an example of a type definition. Notice the new sign ::= which serves to
distinguish a type definition from a synonym declaration or a value definition.
Notice also the vertical bars which separate what are called the alternatives
of the type. The effect of the definition is to bind the name day to a new type
which consists of eight distinct values, seven of which are represented by the
new constants Sun, Mon, ..., Sat, and the eighth by the ubiquitous ⊥ which
is assumed to be a value of every type. The seven new constants are called the
constructors of the type day. By convention, we shall distinguish constructor
names from other kinds by beginning them with a capital letter. Thus, if
an identifier begins with a capital letter, then it denotes a constructor; if it
begins with a small letter, then it is a variable, defined value, or type name.
Here are some simple examples of the use of the type day. Firstly, values
of type day can be printed and compared:
204
8.1 ENUMERATED TYPES 205
? Mon
Mon
? Mon = Mon
True
? Mon = Fri
False
However, a request to evaluate an expression such as Mon = "Mon" would
cause the computer to signal a type error: the left-hand value has type day,
while the right-hand value has type [char].
The values of an enumerated type are ordered by the position at which
they appear in the enumeration. Hence we have:
? Mon < Fri
True
? Sat ≤ Sun
False
Like any other kind of value, the constructors of type day can appear in
lists, tuples and function definitions. For example, we can define:
workday :: day → bool
workday d = (Mon ≤ d) ∧ (d ≤ Fri)
weekend :: day → bool
weekend d = (d = Sat) ∨ (d = Sun)
It is also possible to use the new constructors as patterns in definitions.
For example, the function dayval which converts days to numbers can be
defined by the seven equations:
dayval Sun = 0
dayval Mon = 1
dayval Tue = 2
dayval Wed = 3
dayval Thu = 4
dayval Fri = 5
dayval Sat = 6
An alternative definition of dayval is:
dayval d = hd [k | (k, z) ← zip ([0..], days); z = d]
           where days = [Sun, Mon, Tue, Wed, Thu, Fri, Sat]
The inverse function valday can be defined in a similar fashion. We can use
dayval, together with valday, to define a function dayafter which returns the
day after a given day:
dayafter d = valday ((dayval d + 1) mod 7)
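For readers who wish to experiment, the definitions above can be approximated in Haskell, a close relative of the notation used in this book. The following is only a sketch: the deriving clause is a Haskell device that supplies equality, ordering and printing, and valday is one possible definition (the text leaves it to the reader).

```haskell
-- A Haskell approximation of the enumerated type day.
-- "deriving" is Haskell-specific: it supplies =, < and printing.
data Day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
  deriving (Eq, Ord, Show)

days :: [Day]
days = [Sun, Mon, Tue, Wed, Thu, Fri, Sat]

-- dayval by searching the enumeration, as in the alternative definition.
dayval :: Day -> Int
dayval d = head [k | (k, z) <- zip [0 ..] days, z == d]

-- One possible valday (an assumption): index into the enumeration.
-- It is defined only for arguments in the range 0..6.
valday :: Int -> Day
valday k = days !! k

dayafter :: Day -> Day
dayafter d = valday ((dayval d + 1) `mod` 7)

workday :: Day -> Bool
workday d = Mon <= d && d <= Fri
```

Under these definitions dayafter wraps round from Sat to Sun, because the successor is computed modulo 7.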
206 NEW TYPES
As well as defining a type by listing the names of its values, we can also define
types whose values depend on those of other types. For example:
tag ::= Tagn num | Tagb bool
8.2 COMPOSITE TYPES 207
This defines a type tag whose values are denoted by expressions of the form
(Tagn n), where n is an arbitrary number, and (Tagb b), where b is an
arbitrary boolean value. There are also three additional values of tag: the
value ⊥, which is an element of every type, and the values (Tagn ⊥) and
(Tagb ⊥). We shall consider these further below.
The names Tagn and Tagb introduce two constructors for building values
of type tag. Each constructor denotes a function. The types of these functions
are:
Tagn :: num → tag
Tagb :: bool → tag
There are two key properties that distinguish constructors such as Tagn
and Tagb from other functions. First of all, an expression such as ( Tagn 3)
cannot be further simplified and is therefore a canonical expression. In other
words, there are no definitions associated with constructor functions. Instead,
constructors are regarded as primitive conversion functions whose role is just
to change values of one type into another. In the present example, Tagn and
Tagb take numbers and booleans and turn them into values of type tag.
The second property which makes Tagn and Tagb different from ordinary
functions is that expressions of the form ( Tagn n) and ( Tagb b) can appear as
patterns on the left-hand side of definitions. For example, we can recover the
elements of the component types by defining the following 'selector' functions:
numval :: tag → num
numval (Tagn n) = n
boolval :: tag → bool
boolval (Tagb b) = b
Using pattern matching we can also define functions for discriminating between
the alternatives of the type. For example, consider the predicate
isNum, defined by:
isNum (Tagn n) = True
isNum (Tagb b) = False
This function has the property that:
isNum ⊥ = ⊥
isNum (Tagn ⊥) = True
isNum (Tagb ⊥) = False
It follows that the three values ⊥, Tagn ⊥, and Tagb ⊥ are all distinct, as we
claimed earlier. The predicate isBool can be defined in a similar manner.
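The behaviour of isNum on partial values can be observed directly in a lazy language. The sketch below uses Haskell, with Int standing in for num and undefined standing in for ⊥; both substitutions are assumptions made for the sake of a runnable illustration.

```haskell
-- A Haskell approximation of the type tag.
data Tag = Tagn Int | Tagb Bool
  deriving (Eq, Ord, Show)

-- Selector functions; like the originals they are partial.
numval :: Tag -> Int
numval (Tagn n) = n

boolval :: Tag -> Bool
boolval (Tagb b) = b

-- isNum inspects only the constructor, never the component,
-- so it returns a result even when the component is undefined.
isNum :: Tag -> Bool
isNum (Tagn _) = True
isNum (Tagb _) = False
```

Evaluating isNum (Tagn undefined) yields True, whereas isNum undefined fails, which is how the distinction between ⊥ and (Tagn ⊥) shows up in practice.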
The values of tag are also ordered under the generic relation <. We have:
Tagn n < Tagn m = n < m
Tagn n < Tagb b = True
Tagb b < Tagn n = False
Tagb b < Tagb c = b < c
The second and third equations follow from the relative position of the
alternatives in the definition of tag.
Further examples. The right-hand side of the definition of tag contains
two alternatives and involves two distinct component types, num and bool.
In general, a type definition can contain one or more alternatives and zero
or more component types. For example, the type:
temp ::= Celsius num | Fahrenheit num | Kelvin num
contains three alternatives yet only one component type, namely num. The
type temp might be used to provide for multiple representations of
temperatures. However, it is a fundamental property of all data types that distinct
canonical expressions represent distinct values of the type, so we have:
? Celsius 0 = Fahrenheit 32
False
In order to compare temperatures in the expected way, we have to define an
explicit equality test:
eqtemp :: temp → temp → bool
and use eqtemp instead of the built-in test (=). The appropriate definition
of eqtemp is left as an exercise.
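One possible approach, sketched here in Haskell, is to convert both temperatures to a common scale before comparing them. The helper toKelvin and the use of exact rational arithmetic (to avoid rounding problems) are assumptions of this sketch, not part of the text.

```haskell
-- A Haskell approximation of the type temp, using exact rationals.
data Temp = Celsius Rational | Fahrenheit Rational | Kelvin Rational
  deriving (Eq, Show)

-- Hypothetical helper: convert any temperature to the Kelvin scale.
toKelvin :: Temp -> Rational
toKelvin (Celsius c)    = c + 273.15
toKelvin (Fahrenheit f) = (f - 32) * 5 / 9 + 273.15
toKelvin (Kelvin k)     = k

-- Two temperatures are equal when they agree on a common scale.
eqtemp :: Temp -> Temp -> Bool
eqtemp s t = toKelvin s == toKelvin t
```

With these definitions the built-in test still distinguishes Celsius 0 from Fahrenheit 32, while eqtemp identifies them.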
Another method for describing essentially the same information about
temperatures is to define:
temp == (tag, num)
tag ::= Celsius | Fahrenheit | Kelvin
Here, temp is not a new type but merely a synonym for a pair of values. Yet
a third way is to write
temp ::= Temp tag num
tag ::= Celsius | Fahrenheit | Kelvin
Here, temp is a new type. Note that there is only one alternative in the
definition of temp . Such types are called 'non-union' types in contrast to
'union' types which, by definition, contain at least two alternatives.
Here is another example of a type containing only one alternative:
file ::= File [record]
Values of type file take the form (File rs), where rs is a list of records. Since
a file is not a list, we have to provide explicit functions for processing files.
For example, to delete a record from a file, we can define a function:
delete :: record → file → file
delete r (File rs) = File (rs -- [r])
(i) x = y = ⊥; or
Exercises
8.2.2 Suggest a way of representing the type (α, β) by lists of length two.
Generalise for arbitrary tuple-types.
8.2.3 Suppose a company keeps records of its personnel. Each record
contains information about the person's name, sex, date of birth and date of
appointment. Define a function that takes the current date and the file of
personnel records and returns a list of names of those people who have served
the company for twenty years or more. Supposing that employees hired
before 1980 retire in the year in which they reach sixty-five, while employees
hired after 1980 retire at sixty, define a function that calculates for a named
employee the number of years before they retire. How would you organise
the definitions so that they do not depend on the details of how personnel
records are represented?
8.2.4 Suppose records have numeric keys. Define a function for merging two
files whose records are in increasing order of key value.
8.2.5 Define functions for (i) testing the equality of two temperatures; and
(ii) testing whether one temperature is colder than another.
Type definitions can also be recursive; in fact, much of the power of the type
mechanism comes from the ability to define recursive types. To see what is
involved, we consider three examples: natural numbers, lists and arithmetic
expressions.
8.3 RECURSIVE TYPES 211
nat ::= Zero | Succ nat
This definition says that Zero is a value of nat, and that (Succ n) is a value
of nat whenever n is. For example, each of:
Zero
Succ Zero
Succ (Succ Zero)
are values of nat. In nat, the number 7 would be represented by the value:
Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))
forms of pattern matching permitted with num. The pattern ( n + 1), of type
num, corresponds to the pattern (Succ n) of type nat and so n itself must
match a natural number.
As with all type definitions, the ordering < on nat is determined by the
relative positions of the constructors in the declaration. Since Zero precedes
Succ we have:
Zero < Zero = False
Zero < Succ n = True
Succ m < Zero = False
Succ m < Succ n = m < n
This ordering corresponds to the normal interpretation of < on natural numbers.
Now let us return to the point about there being 'extra' values in nat.
The values:
⊥, Succ ⊥, Succ (Succ ⊥), ...
are all different and each is an element of nat. To see that they are different,
consider the ordering <. Since:
Zero < ⊥ = ⊥
Zero < (Succ ⊥) = True
we have Succ ⊥ ≠ ⊥. In other words, Succ is not a strict function. More
generally, we have for m < n that:
m < (Succ^m ⊥) = ⊥
m < (Succ^n ⊥) = True
where f^n means f iterated n times, and so Succ^m ⊥ ≠ Succ^n ⊥.
There is also one further value of nat, namely the 'infinite' number:
Succ (Succ (Succ ...))
If we define:
inf = Succ inf
then inf denotes this value of nat. It is left as an exercise to show that inf
is distinct from any other given element of nat.
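The partial and infinite elements of nat can be explored in a lazy language. The following Haskell sketch writes the ordering out as the four equations given earlier; the names less, fromInt and partial are hypothetical helpers, and undefined plays the role of ⊥.

```haskell
-- A Haskell approximation of the type nat.
data Nat = Zero | Succ Nat

-- The ordering < on nat, written out as four equations.
less :: Nat -> Nat -> Bool
less Zero     Zero     = False
less Zero     (Succ _) = True
less (Succ _) Zero     = False
less (Succ m) (Succ n) = less m n

-- The 'infinite' number.
inf :: Nat
inf = Succ inf

-- fromInt k builds the finite number Succ^k Zero (k must be non-negative).
fromInt :: Int -> Nat
fromInt k = iterate Succ Zero !! k

-- partial k builds the partial number Succ^k ⊥, with undefined as ⊥.
partial :: Int -> Nat
partial k = iterate Succ undefined !! k
```

For instance, less (fromInt 1) (partial 2) evaluates to True without ever touching the undefined component, while less (fromInt 2) (partial 2) fails because the comparison eventually reaches ⊥.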
To summarise what we have learnt about nat, we can divide the values of
nat into three classes: ( i ) the finite numbers which correspond to well-defined
natural numbers; (ii) the partial numbers, ⊥, (Succ ⊥), and so on; and (iii) a
single infinite number. This last value represents, in a sense that can be made
quite precise, the unique 'limit' of the partial numbers. We have encountered
a similar classification for the case of lists in the previous chapter and we
shall see that it holds true of all recursive types. There will be the finite
elements of the type, the partial elements, and the infinite elements. As in
the case of lists, there will in general be more than one infinite element, but
each such value will be the limit of the sequence of partial elements which
approximate it.
list α ::= Nil | Cons α (list α)
This definition says that Nil is a value of (list α), and (Cons x xs) is a value of
(list α) whenever x is a value of α and xs is a value of (list α). For example:
⊥
Nil
Cons ⊥ Nil
Cons 1 ⊥
Cons 1 (Cons 2 Nil)
are all values of (list num). Now consider the sequence:
⊥
Cons 1 ⊥
Cons 1 (Cons 2 ⊥)
Cons 1 (Cons 2 (Cons 3 ⊥))
Each term in the above sequence denotes a different value of (list num).
Moreover, each term is obtained from its predecessor by replacing ⊥ by an
expression of the form (Cons n ⊥). Each term is therefore 'more defined'
than its predecessor, and is an approximation to the infinite list:
Cons 1 (Cons 2 (Cons 3 ...))
The type list num contains an infinite number of infinite lists; nevertheless,
each is the limit of the sequence of its partial approximations.
Structural induction. The principle of structural induction for type
(list α) is as follows. In order to show that a property P(xs) holds for all
finite elements xs of type (list α) it is necessary to show that:
Case Nil. That P(Nil) holds;
Case (Cons x xs). That if P(xs) holds, then P(Cons x xs) also holds for every x.
We have already seen many examples of the corresponding principle for ordinary
lists, so we shall give no new ones here. In order to show that P holds
for partial lists as well, we have to add an extra case and show that
P(⊥) holds.
An arithmetic expression e satisfies one of two conditions:
(i) e is a number; or
(ii) e is an expression of the form e1 ⊕ e2, where e1 and e2 are arithmetic
expressions, and ⊕ is one of +, −, × or /.
The type aexp has two constructors, Num and Exp, corresponding to the
two clauses of the above description. The type aexp is non-linear because
aexp appears in two places in the second alternative of the definition. The
enumerated type aop consists of four constructors, one for each arithmetic
operation.
Here are some simple examples of elements of aexp. The expression (3+4)
is represented in aexp by the value:
apply Add = (+)
apply Sub = (−)
apply Mul = (×)
apply Div = (/)
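To see how apply might be used, here is a small Haskell sketch of an evaluator for arithmetic expressions. The order of the arguments of Exp (left operand, operator, right operand), the use of Double for num, and the name eval are all assumptions of this sketch.

```haskell
-- A Haskell approximation of the types aop and aexp.
data Aop  = Add | Sub | Mul | Div
data Aexp = Num Double
          | Exp Aexp Aop Aexp   -- argument order is an assumption

-- apply maps each operator constructor to the function it denotes.
apply :: Aop -> Double -> Double -> Double
apply Add = (+)
apply Sub = (-)
apply Mul = (*)
apply Div = (/)

-- Hypothetical evaluator: evaluate both operands, then apply the operator.
eval :: Aexp -> Double
eval (Num n)        = n
eval (Exp e1 op e2) = apply op (eval e1) (eval e2)
```

The recursion in eval follows the two alternatives of the type exactly, which is the usual shape of a function over a recursive type.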
Exercises
8.3.1 Suppose inf = Succ inf. Show that inf is distinct from any other
element of nat.
8.3.2 Define multiplication and exponentiation as operations on nat .
8.3.3 Devise a representation of integers by a type int in which each integer
is represented uniquely.
8.3.4 Devise a function unparse :: aexp → [char] that prints an arithmetic
expression; for example:
8.3.6 The function unparse produces the conventional form of writing arithmetic
expressions in which binary operators appear between their arguments.
We can also write expressions with operators coming before or after their
arguments. For example, the expression 3 + 4 × 5 can be written as:
+ 3 × 4 5 (preorder listing)
3 4 5 × + (postorder listing)
Define functions preorder and postorder, each having the type aexp → [val],
where:
val ::= Nval num | Oval aop
8.4 ABSTRACT TYPES 221
8.3.7 Define the function run using foldl rather than an explicit recursion.
Using the law
deduce that
abstr :: B → A
[Figure: diagram relating the operation addset x on sets to the operation insert x on lists.]
8.4.4 Queues
We shall now illustrate the ideas introduced above by considering some
examples of abstract types, starting with queues. Informally, a queue is a list of
values which are processed in a special way. Suppose (queue α) denotes some
collection of values of type α on which the following operations are defined:
start :: queue α
join :: queue α → α → queue α
front :: queue α → α
reduce :: queue α → queue α
The informal interpretation of these operations is as follows: start generates,
or 'starts', an empty queue; (join q x) returns a queue which is identical to
q except that x has 'joined' as a new last member; (front q) is the value at
the 'front' of the (non-empty) queue q; and (reduce q) is q 'reduced' by the
removal of the front element.
We can express the intended relationships between the various operations
by the following equations:
front (join start x) = x
front (join (join q x) y) = front (join q x)
reduce (join start x) = start
reduce (join (join q x) y) = join (reduce (join q x)) y
These equations constitute what is called an algebraic specification of the
abstract type (queue α). Although they look like formal definitions of front
and reduce in terms of a data type based on the constructors start and join,
they should be understood only as expressing certain algebraic laws that the
four operations must satisfy. On the other hand, it is certainly possible to
base the representation of queues on the type:
repqueue α ::= Start | Join (repqueue α) α
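With this representation the algebraic laws can be read almost directly as definitions. The Haskell sketch below does so; the error clauses for the empty queue and the helper toList are additions made for illustration and are not part of the specification.

```haskell
-- A Haskell approximation of the representation type repqueue.
data RepQueue a = Start | Join (RepQueue a) a

-- front follows the first two laws of the specification.
front :: RepQueue a -> a
front (Join Start x)        = x
front (Join q@(Join _ _) _) = front q
front Start                 = error "front: empty queue"

-- reduce follows the last two laws of the specification.
reduce :: RepQueue a -> RepQueue a
reduce (Join Start _)        = Start
reduce (Join q@(Join _ _) y) = Join (reduce q) y
reduce Start                 = error "reduce: empty queue"

-- Hypothetical helper: list the members, front element first.
toList :: RepQueue a -> [a]
toList Start      = []
toList (Join q x) = toList q ++ [x]
```

Note that both front and reduce must travel the whole length of the queue to reach its front, so each takes time proportional to the number of members under this representation.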
8.4.5 Arrays
A second example of an abstract type is provided by arrays. Basically, an
array is a finite list whose elements may be modified and inspected, but the
number of elements in the list does not change. Suppose (array α) denotes
the abstract type defined by the following four operations:
mkarray :: [α] → array α
length :: array α → num
lookup :: array α → num → α
update :: array α → num → α → array α
Informally, (mkarray xs) creates an array from a given list xs; the value
(length A) is the number of elements in the array; (lookup A k) returns the
kth element of A, counting from 0; and (update A k x) returns an array that
is identical to A except that its kth element is now x.
We can express the relationships between the operations by the following
equations:
length (mkarray xs) = #xs
length (update A k x) = length A
lookup (mkarray xs) k = xs ! k
lookup (update A j x) k = x, if j = k
                        = lookup A k, otherwise
One way to represent arrays is by a pair of values, a list and a number
giving the length of the list. Suppose we set:
reparray α == ([α], num)
and define:
valid :: reparray α → bool
valid (xs, n) = (#xs = n)
abstr :: reparray α → array α
abstr (xs, n) = mkarray xs
The array operations can be implemented by the definitions:
mkarray' xs = (xs, #xs)
length' (xs, n) = n
lookup' (xs, n) k = xs ! k
update' (xs, n) k x = (ys, n), if k < n
                      where ys = take k xs ++ [x] ++ drop (k + 1) xs
Under this representation, the cost of either accessing or modifying an array
element is O(n) steps, where n is the length of the array. In particular,
evaluating:
lookup' (update' (xs, n) k x) k
requires O(n) steps.
A second representation of arrays is obtained by replacing the list in the
first representation by a function defined on an initial segment of the natural
numbers. Define:
reparray α == (num → α, num)
and the abstraction function:
abstr :: reparray α → array α
abstr (f, n) = mkarray (map f [0 .. n − 1])
The array operations are then implemented by the functions:
mkarray' xs = (f, #xs)
              where f k = xs ! k
length' (f, n) = n
lookup' (f, n) k = f k
update' (f, n) k x = (g, n), if 0 ≤ k ∧ k < n
                     where g j = x, if j = k
                               = f j, otherwise
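The functional representation carries over to Haskell with little change. In the sketch below, Int stands in for num, the error clause for an out-of-range update and the helper toList (which plays the part of the abstraction function, producing a list rather than an abstract array) are assumptions of the illustration.

```haskell
-- Arrays represented by a function on an initial segment of the naturals.
type RepArray a = (Int -> a, Int)

mkarray' :: [a] -> RepArray a
mkarray' xs = (f, length xs)
  where f k = xs !! k

length' :: RepArray a -> Int
length' (_, n) = n

lookup' :: RepArray a -> Int -> a
lookup' (f, _) k = f k

-- update' wraps the old function in a new one that intercepts index k.
update' :: RepArray a -> Int -> a -> RepArray a
update' (f, n) k x
  | 0 <= k && k < n = (g, n)
  | otherwise       = error "update': index out of range"
  where g j = if j == k then x else f j

-- Hypothetical helper: recover the list of elements.
toList :: RepArray a -> [a]
toList (f, n) = map f [0 .. n - 1]
```

Looking up an index that has just been updated now takes a single test, although each further update adds one more layer of function to pass through when other indices are inspected.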
8.4.6 Sets
The primary example of an abstract type is the notion of a set. In fact, all
of mathematics can be explained in terms of the theory of sets. Sets can
be represented by lists, lists with no duplicates, ordered lists, trees, boolean
functions, and so on, but no representation is more fundamental than any
other. The programmer is therefore free to choose between them on the
grounds of what set operations are required and how efficiently they can be
implemented.
Suppose we are interested in the abstract type set α which consists of the
following six operations:
empty :: set α
unit :: α → set α
union :: set α → set α → set α
inter :: set α → set α → set α
differ :: set α → set α → set α
member :: α → set α → bool
1. The constant empty denotes the empty set; this set is also denoted by
the special symbol {}.
2. The function unit takes an element x of type α and returns a singleton
set containing x; in mathematical notation we write unit x = {x}.
3. The function union takes two sets S and T and returns the set which
contains all the elements of S and T; in mathematical notation we
write union S T = S ∪ T.
4. The function inter computes the intersection of two sets. The result of
(inter S T) is the set consisting of the common members of S and T.
The mathematical notation for this operation is ∩; thus inter S T =
S ∩ T.
5. The function differ computes the difference of two sets. The result of
(differ S T) is the set which contains all the elements of S that are not
in T. In mathematical notation we write differ S T = S \ T.
6. The predicate member takes a value x and a set S and returns True
or False depending on whether x is a member of S. In mathematical
notation, (member x S) is written as x ∈ S.
insert x S = S ∪ {x}
delete x S = S \ {x}
together with empty and member, are the only ones required. A type based
on these four operations is often called a 'dictionary'.
The crucial operation on sets is the test for membership. It satisfies the
following laws:
x ∈ {} = False
x ∈ {y} = (x = y)
x ∈ (A ∪ B) = (x ∈ A) ∨ (x ∈ B)
x ∈ (A ∩ B) = (x ∈ A) ∧ (x ∈ B)
x ∈ (A \ B) = (x ∈ A) ∧ ¬(x ∈ B)
for all values of x. In addition, two sets are defined to be equal if and only
if they have the same members. From these facts we can derive a number of
laws about ∪, ∩ and \. In particular,
A ∪ (B ∪ C) = (A ∪ B) ∪ C (associativity)
A ∪ B = B ∪ A (commutativity)
A ∪ A = A (idempotence)
A ∪ {} = A (identity)
We have already seen how to represent finite sets by arbitrary lists, lists with
no duplicates, and ordered lists. It is interesting to compare the efficiency
of the various set operations under these three representations. For brevity,
we consider only the implementations of ∪ and \. Using unrestricted lists we
can define:
union xs ys = xs ++ ys
differ xs ys = [x | x ← xs; ¬ member x ys]
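In Haskell these two definitions might be written as follows. The definition of member by a linear search is an assumption of the sketch, chosen to match the representation of sets by unrestricted lists.

```haskell
-- Sets represented by unrestricted lists (duplicates allowed).

-- Membership by linear search; an assumed definition for this sketch.
member :: Eq a => a -> [a] -> Bool
member x xs = or [x == y | y <- xs]

-- Union is simply concatenation: duplicates do no harm.
union :: [a] -> [a] -> [a]
union xs ys = xs ++ ys

-- Difference keeps the elements of xs that do not occur in ys.
differ :: Eq a => [a] -> [a] -> [a]
differ xs ys = [x | x <- xs, not (member x ys)]
```

Here differ performs one membership test for each element of xs, so its cost is proportional to the product of the lengths of the two lists.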
enumerate p = [x | x ← [1..]; p x]
will return a list of the elements of the set represented by the predicate p,
but this list will always be infinite or partial.
Exercises
8.4.1 Prove that each of insert1, insert2 and insert3 satisfies the
specification for inserting a value into a set.
8.4.2 Consider the following representation of queues:
and hence derive yet another definition of list reversal which works in linear
time.
8.4.4 Consider the following type for representing lists:
Give the definition of a suitable abstraction function abstr :: seq α → [α].
Write down the specification of the function tailseq which corresponds to tl
on lists. Give an implementation of tailseq and show that it meets the
specification. In general, what are the advantages and disadvantages of representing
lists in this way?
8.4.5 Consider the problem of describing a simple line editor. Suppose a
line is a sequence of characters c1 c2 ... cn together with a position p, where
newline :: line
movestart :: line → line
moveend :: line → line
moveleft :: line → line
moveright :: line → line
insert :: char → line → line
delete :: line → line
Trees
As its name implies, a binary tree is a tree with a simple two-way branching
structure. This structure is captured by the following type declaration:
btree α ::= Tip α | Bin (btree α) (btree α)
For example, the tree:
Bin (Tip 1)
    (Bin (Tip 2) (Tip 3))
consists of a binary node with a left subtree (Tip 1), and a right subtree
(Bin (Tip 2) (Tip 3)) which, in turn, has a left subtree (Tip 2) and a right
subtree (Tip 3). Compare this element of (btree num) with the tree:
Bin (Bin (Tip 1) (Tip 2))
    (Tip 3)
Although the second tree contains the same sequence of numbers in its tips
as the first, the way the information is organised is quite different and the
two expressions denote distinct values.
A tree can be pictured in one of two basic ways, growing upwards or
growing downwards. Both orientations are illustrated in Figure 9.1. The
downward pointing orientation is the one normally preferred in computing,
and this is reflected in some of the basic terminology of trees. For instance,
we talk about the 'depth' of a tree rather than its 'height'. The depth of a
tree is defined below.
for all binary trees t. The result can be proved by structural induction and
is left as an exercise. It also holds, by default, in the case of infinite trees,
since both sides of the above equation reduce to ⊥.
The second useful measure on trees is the notion of depth. The depth of
a finite tree is defined as follows:
depth (Tip x) = 0
depth (Bin t1 t2) = 1 + (depth t1) max (depth t2)
In words, the depth of a tree consisting of a single tip is 0; otherwise it is
one more than the greater of the depths of its two subtrees. For example,
the tree on the left in Figure 9.2 has depth 3, while the one on the right
has depth 4. Notice these trees have the same size and, indeed, exactly the
same sequence of tip values. The notion of depth is important because it is
a measure of the time required to retrieve a tip value from a tree.
Although two trees of the same size need not have the same depth, the
two measures are not independent. The following result is one of the most
important facts about binary trees. It says that:
size t ≤ 2 ^ depth t
for all (finite) trees t. We shall prove this inequality by structural induction
on t.
Case (Tip x). We have:
size (Tip x) = 1 (size.1)
             = 2 ^ 0 (^.1)
             = 2 ^ depth (Tip x) (depth.1)
as required.
Case (Bin t1 t2). Assume, as induction hypothesis, that:
size t1 ≤ 2 ^ depth t1
size t2 ≤ 2 ^ depth t2
and let:
d = (depth t1) max (depth t2)
We now have:
size (Bin t1 t2) = size t1 + size t2 (size.2)
                 ≤ 2 ^ (depth t1) + 2 ^ (depth t2) (hypothesis)
                 ≤ 2 ^ d + 2 ^ d (monotonicity of ^)
                 = 2 ^ (1 + d) (arithmetic)
                 = 2 ^ (depth (Bin t1 t2)) (depth.2)
as required. □
By taking logarithms (to base 2), we can restate the result in the following
equivalent form:
depth t ≥ log2 (size t)
for all finite trees t.
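The two measures, and the inequality relating them, are easy to check on examples. The following Haskell sketch transcribes size and depth; the predicate bound is a hypothetical helper that tests the inequality for a single tree and is, of course, no substitute for the proof.

```haskell
-- A Haskell approximation of the type btree.
data Btree a = Tip a | Bin (Btree a) (Btree a)

-- The size of a tree is the number of its tips.
size :: Btree a -> Int
size (Tip _)     = 1
size (Bin t1 t2) = size t1 + size t2

-- The depth of a tree is the length of its longest path.
depth :: Btree a -> Int
depth (Tip _)     = 0
depth (Bin t1 t2) = 1 + max (depth t1) (depth t2)

-- Hypothetical helper: does size t <= 2 ^ depth t hold for this tree?
bound :: Btree a -> Bool
bound t = size t <= 2 ^ depth t
```

For the tree Bin (Tip 1) (Bin (Tip 2) (Tip 3)) we obtain size 3 and depth 2, and indeed 3 ≤ 2 ^ 2.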
Given any positive integer n, it is always possible to construct a tree of
size n with depth d satisfying:
Here, binary nodes are 'labelled' with values of a second type β. Apart from
these additional values, a labelled binary tree has exactly the same structure
as the earlier kind. Because extra information is available, many operations
on binary trees can be implemented efficiently in terms of labelled binary
trees. We shall see examples of this idea in subsequent sections.
Exercises
9.1.1 Prove that the number of tips in a binary tree is always one more than
the number of internal nodes.
9.1.3 Show that:
depth t ≤ size t − 1
for all finite binary trees.
As a first example of the use of binary trees, we shall consider the problem
of coding data efficiently. As many computer users know only too well, it
is often necessary to store files of information as compactly as possible in
order to free precious space for other, more urgent, purposes. Suppose the
information to be stored is a text consisting of a sequence of characters. The
ASCII standard code uses seven bits to represent each of 2^7 = 128 possible
different characters, so a text of n characters contains 7n bits of information.
For example, the letters 't', 'e', and 'x' are represented in ASCII by the codes:
t → 1110100
e → 1100101
x → 1111000
Under this scheme, "text" would be coded as the sequence 01010 of length
5. However, the string "tee" would be coded by exactly the same sequence,
and this is obviously not what is wanted. The simplest way to prevent this
happening is to choose codes so that no code is a proper initial segment (or
prefix) of any other.
As well as requiring unique decipherability, we also want the coding
scheme to be optimal. An optimal coding scheme is one which minimises
the expected length of the coded text. More precisely, if characters cj, for
1 ≤ j ≤ n, have probabilities of occurrence pj, then we want to choose codes
with lengths lj such that
p1 l1 + p2 l2 + ... + pn ln
is as small as possible.
One method for constructing an optimal code satisfying the prefix property
is called Huffman coding (after its inventor, David Huffman). Each
character is stored as a tip of a binary tree, the structure of which is
determined by the computed frequencies. The code for a character c is a sequence
of binary values describing the path in the tree to the tip containing c. Such
a scheme guarantees that no code is a prefix of any other. We can define a
path formally by:
step ::= Left | Right
path == [step]
A path is therefore a sequence of steps, each of which is one of the two values
Left or Right. A path can be traced by the function trace, defined by:
trace (Tip x) [ ] = x
trace (Bin t1 t2) (Left : ps) = trace t1 ps
trace (Bin t1 t2) (Right : ps) = trace t2 ps
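The function trace can be tried out with the following Haskell sketch. Because Haskell's standard prelude already uses the names Left and Right, they are hidden here so that the constructors of step can keep their names; the final error clause and the sample tree example (in which 'x' and 'e' lie to the left and 't' to the right) are assumptions of the illustration.

```haskell
-- Hide the standard Either type so that Left and Right are free for reuse.
import Prelude hiding (Either (..))

data Btree a = Tip a | Bin (Btree a) (Btree a)

data Step = Left | Right
  deriving (Eq, Show)

type Path = [Step]

-- Follow a path from the root; succeed only if it ends exactly at a tip.
trace :: Btree a -> Path -> a
trace (Tip x)    []           = x
trace (Bin t1 _) (Left : ps)  = trace t1 ps
trace (Bin _ t2) (Right : ps) = trace t2 ps
trace _          _            = error "trace: path does not end at a tip"

-- An assumed sample coding tree for the characters 'x', 'e' and 't'.
example :: Btree Char
example = Bin (Bin (Tip 'x') (Tip 'e')) (Tip 't')
```

Since every character sits at a tip, no path to one character can be an initial segment of a path to another, which is the prefix property in tree form.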
In this tree the character 'x' is coded by the path [Left, Left], character 'e'
by [Left, Right], and character 't' by [Right]. So 't' is coded by one bit of
information, while the others require two bits. For example, the string "text"
is encoded by the sequence:
arises when every right subtree has size 1, and x appears as the leftmost tip
value.
One way to improve this unacceptable quadratic behaviour is to define:
codex t x = hd (codesx t x)
codesx (Tip y) x = [[ ]], if x = y
                 = [ ], otherwise
codesx (Bin t1 t2) x = map (Left :) (codesx t1 x) ++
                       map (Right :) (codesx t2 x)
This is an example of the list of successes technique described in Chapter 6.
The value (codesx t x) is a list of all paths in t which lead to a tip containing
x. If t is a tip, then the list is either empty (if the characters do not match),
or is a singleton list containing the empty path. If t is not a tip, then the final
list is the concatenation of the list of paths through the left subtree with the
list of paths through the right subtree. If t contains exactly one occurrence
of x, then (codesx t x) will be a singleton list containing the desired path. We
shall leave as an exercise the proof that this version of codex requires only
O(n) steps.
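A Haskell version of these definitions is sketched below. As before, the prelude's own Left and Right are hidden so that the constructors of step can be reused; the Eq constraint is how Haskell expresses the requirement that tip values can be compared.

```haskell
-- Hide the standard Either type so that Left and Right are free for reuse.
import Prelude hiding (Either (..))

data Btree a = Tip a | Bin (Btree a) (Btree a)

data Step = Left | Right
  deriving (Eq, Show)

-- All paths in the tree that lead to a tip containing x.
codesx :: Eq a => Btree a -> a -> [[Step]]
codesx (Tip y) x
  | x == y    = [[]]
  | otherwise = []
codesx (Bin t1 t2) x =
  map (Left :) (codesx t1 x) ++ map (Right :) (codesx t2 x)

-- The code of x is the first (normally the only) successful path.
codex :: Eq a => Btree a -> a -> [Step]
codex t x = head (codesx t x)
```

Lazy evaluation matters here: head demands only the first success, so no work is done on paths beyond the one that is returned.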
Constructing a Huffman tree. Now we must deal with the most interesting
part, building a coding tree. Let us suppose that the relative frequencies
of the characters have been computed from the sample, so we are given a list
of pairs:
[(c1, w1), (c2, w2), ..., (cn, wn)]
where c1, c2, ..., cn are the characters and w1, w2, ..., wn are numbers, called
weights, indicating the frequencies. The probability of character cj occurring
is therefore wj / W, where W = Σ wj. Without loss of generality, we shall
also suppose that the weights satisfy w1 ≤ w2 ≤ ... ≤ wn, so the characters
Thus, the weight of a tree is the sum of the weights in its tip nodes.
In order to determine at each stage which are the lightest trees, we simply
keep the trees in increasing order of weight . Thus:
where:
insert u [ ] = [u]
insert u (t : ts) = u : t : ts, if weight u ≤ weight t
                  = t : insert u ts, otherwise
Although this definition of combine is adequate, it leads to an inefficient
algorithm as tree weights are constantly recomputed. A better solution is to
store the weights in the tree as labels, and this is where the idea of using a
labelled binary tree comes in. Consider the type:
Here, a tip node is indicated by a new constructor Leaf that takes two
arguments: a number and a character. A binary node is indicated by the new
constructor Node. The numeric label is a value w satisfying:
w = weight (Node w t1 t2)
where:
weight (Leaf w x) = w
weight (Node w t1 t2) = weight t1 + weight t2
By maintaining this relationship we can avoid recomputing weights.
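To make the idea concrete, here is a Haskell sketch of tree building with weight labels. It is an illustration under several assumptions rather than a transcription: the type name Huff, the functions combine and build, and the version of weight that simply reads the stored label are all supplied for the example, with Int standing in for num.

```haskell
-- Labelled trees: every node carries the weight of the tree below it.
data Huff = Leaf Int Char | Node Int Huff Huff
  deriving (Eq, Show)

-- With labels maintained, the weight can be read off in one step.
weight :: Huff -> Int
weight (Leaf w _)   = w
weight (Node w _ _) = w

-- Insert a tree into a list kept in increasing order of weight.
insert :: Huff -> [Huff] -> [Huff]
insert u [] = [u]
insert u (t : ts)
  | weight u <= weight t = u : t : ts
  | otherwise            = t : insert u ts

-- Assumed combining step: merge the two lightest trees and reinsert.
combine :: [Huff] -> [Huff]
combine (t1 : t2 : ts) = insert (Node (weight t1 + weight t2) t1 t2) ts
combine ts             = ts

-- Assumed driver: combine repeatedly until a single tree remains.
build :: [(Char, Int)] -> Huff
build []  = error "build: no characters"
build cws = go (foldr insert [] [Leaf w c | (c, w) <- cws])
  where
    go [t] = t
    go ts  = go (combine ts)
```

Each new Node is labelled with the sum of two labels that are already known, so no weight is ever recomputed from the tips.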
To implement the change, we redefine:
minimises the sum of the 'weighted' path lengths wj lj for 1 ≤ j ≤ n. For a
proof of this fact, the reader should consult Knuth [3] or Standish [11].
9.2 HUFFMAN CODING TREES 245
Exercises
9.2.1 Explain why a Huffman code has the prefix property.
9.2.2 Construct a code which does not satisfy the prefix property, but which
nevertheless is such that every text can be uniquely decoded.
9.2.3 Consider the functions (codexs t) and (decodexs t) for coding and
decoding sequences of characters with Huffman's method. Show that:
9.2.6 Modify the definition of insert in the tree combining part of Huffman's
algorithm to use the weight-labelled representation of Huffman trees.
9.2.7 In the coding phase of Huffman's algorithm, the function codex is
applied to every occurrence of every character in the text . But all we really
need do is to compute ( codex t x ) just once for each character x , store this
information in a table, and use the table for subsequent retrieval. This table
can itself be implemented as a binary tree. Write functions for implementing
this idea, and hence show that the cost of coding characters can be brought
down to O (log n), where n is the number of characters.
In this section we shall study a variety of tree, called a binary search tree,
that is useful for representing values of the abstract type (set α) in which the
only set operations allowed are:
empty :: set α
insert :: α → set α → set α
delete :: α → set α → set α
member :: α → set α → bool
These operations are defined as follows:
empty = {}
insert x s = s ∪ {x}
delete x s = s \ {x}
member x s = x ∈ s
The implementation of empty as the empty tree Nil is immediate, and the
definition of member requires only a minor change to the one given above.
This leaves us with insert and delete. The implementation of insert can be
synthesised from its specification and the result is the following definition:
insert x Nil = Bin x Nil Nil
insert x (Bin y t1 t2) = Bin y (insert x t1) t2, if x < y
                       = Bin y t1 t2, if x = y
                       = Bin y t1 (insert x t2), if x > y
We shall synthesise delete, leaving insert as an exercise.
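The definition of insert can be exercised with the Haskell sketch below. The type name Stree, the definition of member, and the in-order definition of labels are assumptions of this illustration, chosen to agree with the way those functions are used in the text.

```haskell
-- Binary search trees: labels at the nodes, Nil for the empty tree.
data Stree a = Nil | Bin a (Stree a) (Stree a)

-- Insert a value, leaving the tree unchanged if it is already present.
insert :: Ord a => a -> Stree a -> Stree a
insert x Nil = Bin x Nil Nil
insert x (Bin y t1 t2)
  | x < y     = Bin y (insert x t1) t2
  | x == y    = Bin y t1 t2
  | otherwise = Bin y t1 (insert x t2)

-- Assumed membership test: search the left or right subtree by comparison.
member :: Ord a => a -> Stree a -> Bool
member _ Nil = False
member x (Bin y t1 t2)
  | x < y     = member x t1
  | x == y    = True
  | otherwise = member x t2

-- Assumed definition: the labels of a tree listed in left-to-right order.
labels :: Stree a -> [a]
labels Nil           = []
labels (Bin x t1 t2) = labels t1 ++ [x] ++ labels t2
```

If insert is applied only to binary search trees, the list produced by labels is always in strictly increasing order.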
join Nil t2 = t2
join (Bin x u1 u2) t2 = Bin x u1 (join u2 t2)
While this definition of join is perfectly correct, it is unsatisfactory because
the depth of the resulting tree can be quite large. In effect, the second tree
t2 is appended as a new subtree at the bottom right of t1. The result is a
tree whose depth is the sum of the depths of t1 and t2. As we have seen, the
efficiency of tree insertion and membership depends critically on the depth
of the tree and we would like to ensure that , when joining trees, we keep the
9.3 BINARY SEARCH TREES 251
This step is valid provided labels t1 ≠ [ ]. Furthermore, let us introduce the
functions initree and lastlab satisfying the equations:
labels ( initree t ) = init (labels t)
lastlab t = last (labels t)
The case t1 = Nil for the new synthesis of join is the same as before.
The remaining case is as follows.
In fact, there is no simple method for joining two arbitrary trees; the best
approach is to build the new tree from scratch. It follows that binary search
trees are not particularly efficient for the implementation of other set
operations, such as set union or set difference. If these are the crucial operations
in a given application, then a representation of sets by ordered lists is the
most appropriate.
Exercises
9.3.2 The function initree, used in the synthesis of binary search tree
deletion, is specified by the equation:
9.3.5 Synthesise the definition of insert for inserting a new value into a
binary search tree. Also, synthesise a version of delete which uses the rule
The tree insertion and deletion operations studied in the last section suffer
from the disadvantage that they are efficient only on average. Although a
sequence of insertions and deletions of random elements can be expected to
produce a tree whose depth is reasonably small in relation to its size, there is
always the possibility that a very unbalanced tree will emerge. In this section
we will outline a technique which guarantees that both insertion and deletion
can be performed in logarithmic time.
The idea is to impose an extra condition on binary search trees, namely
that they should be balanced. A binary tree t is said to be (depth-) balanced
if (depthbal t) holds, where:
depthbal Nil = True
depthbal (Bin x t1 t2) = abs (depth t1 - depth t2) ≤ 1 ∧
                         depthbal t1 ∧ depthbal t2
In words, a balanced tree is a tree with the property that the depth of the
left and right subtrees of each node differ by at most one. A balanced tree
may not be minimal, but its depth is always reasonably small in comparison
to its size. The precise relationship between depth and size in a balanced
tree is analysed below.
Let us consider how to rebalance a tree after an insertion or deletion.
Define the slope of a tree by:
slope Nil = 0
slope (Bin x t1 t2 ) = depth t1 - depth t2
A tree t is therefore balanced if abs (slope s) ≤ 1 for every subtree s of t.
Since a single insertion or deletion can alter the depth of any subtree by at
most one, rebalancing is needed when slope t = 2 or slope t = -2. The two
situations are symmetrical, so we shall study only the case slope t = 2.
Suppose then that t = Bin x t1 t2, where t1 and t2 are balanced trees,
but slope t = 2. There are two cases to consider:
(i) If the left subtree t1 of t has slope 1 or 0, then t can be rebalanced by a
simple 'rotation' to the right. This operation is pictured in Figure 9.4,
where the depths of the various subtrees appear as labels. It can be
seen from the picture that the rotation operation restores the balance
of t, without destroying the binary search tree condition.
(ii) On the other hand, if slope t1 = -1, then we first have to rotate t1
to the left before rotating t to the right. This initial rotation left is
pictured in Figure 9.5.
[Figure 9.4: rotation to the right; subtrees are labelled with their depths]
[Figure 9.5: rotation to the left; subtrees are labelled with their depths]
Thus rebal is applied to every tree along the path to the newly inserted node.
In fact, it can be shown that at most one tree actually requires rebalancing as
a result of an insert operation, but this can occur anywhere along the path.
Since rebal requires constant time, the cost of computing insert is O (log n )
steps, where n is the size of the tree.
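The definitions of rebal and the rotation functions are not reproduced in this passage, so the following Haskell sketch is only one plausible reading of the two cases above. The names shiftr and shiftl, the fall-through clauses for rotr and rotl, and the helper labels are assumptions made for illustration. Note also that the sketch recomputes depth at every step, so it does not achieve the O(log n) bound; that requires storing slope or depth values in the tree, as Exercise 9.4.2 suggests.

```haskell
-- A sketch of balanced insertion, assuming the slope convention of the text
-- (depth of left subtree minus depth of right subtree).
data BTree a = Nil | Bin a (BTree a) (BTree a)

depth :: BTree a -> Int
depth Nil = 0
depth (Bin _ t1 t2) = 1 + max (depth t1) (depth t2)

slope :: BTree a -> Int
slope Nil = 0
slope (Bin _ t1 t2) = depth t1 - depth t2

-- Single rotations; trees of any other shape are returned unchanged.
rotr, rotl :: BTree a -> BTree a
rotr (Bin x (Bin y t1 t2) t3) = Bin y t1 (Bin x t2 t3)
rotr t = t
rotl (Bin x t1 (Bin y t2 t3)) = Bin y (Bin x t1 t2) t3
rotl t = t

-- Case (ii): rotate the heavy subtree the other way first, then rotate t.
shiftr, shiftl :: BTree a -> BTree a
shiftr (Bin x t1 t2)
  | slope t1 == -1 = rotr (Bin x (rotl t1) t2)
shiftr t = rotr t
shiftl (Bin x t1 t2)
  | slope t2 == 1 = rotl (Bin x t1 (rotr t2))
shiftl t = rotl t

rebal :: BTree a -> BTree a
rebal t
  | slope t == 2  = shiftr t
  | slope t == -2 = shiftl t
  | otherwise     = t

-- rebal is applied to every tree on the path to the new node.
insert :: Ord a => a -> BTree a -> BTree a
insert x Nil = Bin x Nil Nil
insert x t@(Bin y t1 t2)
  | x < y     = rebal (Bin y (insert x t1) t2)
  | x > y     = rebal (Bin y t1 (insert x t2))
  | otherwise = t

depthbal :: BTree a -> Bool
depthbal Nil = True
depthbal (Bin _ t1 t2) =
  abs (depth t1 - depth t2) <= 1 && depthbal t1 && depthbal t2

-- In-order listing of the labels (hypothetical helper for checking).
labels :: BTree a -> [a]
labels Nil = []
labels (Bin x t1 t2) = labels t1 ++ [x] ++ labels t2
```

Inserting an already sorted sequence, which would produce a degenerate tree with the unbalanced insert, should here give a tree for which depthbal holds and whose labels are still in order.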
256 TREES
S(0) = 0
S(1) = 1
S(d + 2) = 1 + S(d) + S(d + 1)
S(d) = fib (d + 2) - 1
where:
$\phi = \frac{1 + \sqrt{5}}{2}$ and $\hat{\phi} = \frac{1 - \sqrt{5}}{2}$
Since $|\hat{\phi}| < 1$, it follows that:
Hence, if $S(d) \le n$, we have:
Finally, since $\log_{\phi} 2 = 1.4404\ldots$, it follows that the depth of a balanced tree
is never more than about 45 per cent worse than the theoretical minimum.
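The relationship S(d) = fib (d + 2) - 1 can be checked numerically for small depths. The following Haskell sketch does so; the names minSize, fib and depthRatio are chosen here for illustration, and fib is assumed to be indexed so that fib 0 = 0 and fib 1 = 1.

```haskell
-- S(d): the recurrence from the text, transcribed directly.
minSize :: Int -> Integer
minSize 0 = 0
minSize 1 = 1
minSize d = 1 + minSize (d - 2) + minSize (d - 1)

-- Fibonacci numbers with fib 0 = 0, fib 1 = 1 (an assumed indexing).
fib :: Int -> Integer
fib n = fst (iterate (\(a, b) -> (b, a + b)) (0, 1) !! n)

-- Depth d divided by log2 (S(d) + 1), the depth of a minimal tree of
-- the same size; the text bounds this ratio by about 1.44.
depthRatio :: Int -> Double
depthRatio d = fromIntegral d / logBase 2 (fromIntegral (minSize d + 1))
```

Evaluating all [minSize d == fib (d + 2) - 1 | d <- [0 .. 20]] should give True, and depthRatio stays below 1.4404 for the depths tried.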
9.5 ARRAYS 257
Exercises
9.4.1 Consider the functions rotr and rotl for carrying out rotations on
balanced trees. Show that:
labels (rotr t) = labels t
labels (rotl t) = labels t
and so:
labels (rebal t) = labels t
9.4.2 Slope calculations in balancing binary search trees can be avoided by
storing slope values as extra labels in the tree. Give the appropriate type
definitions and show how to modify the rotation functions rotl and rotr to
maintain this information.
9.4.3 Why does at most one tree actually require rebalancing after an insert
operation on balanced trees?
9.4.4 Give the definition of delete for deleting a value from a balanced tree.
9.4.5 Implement sets using ordered lists, binary search trees, and balanced
binary search trees. Asymptotic analysis tells us that for sufficiently large
sets, balanced trees will be the best implementation. Experiment to determine
for what size of set each representation is most appropriate.
9.5 Arrays
where tips is the function for listing the tip values in left to right order. Given
that we want mkarray to return a size-balanced tree, the following definition
should be straightforward:
mkarray xs = Tip (hd xs), if n = 1
           = Bin (mkarray ys) (mkarray zs), if n > 1
             where n = #xs
                   ys = take (n div 2) xs
                   zs = drop (n div 2) xs
Since (n div 2) and (n - n div 2) differ by at most 1, it should be clear that
mkarray returns a size-balanced tree. The time T(n) required by mkarray to
build a tree of size n is the sum of the time needed to split the list, which is
O(n) steps, and the time to construct two subtrees, which is approximately
2T(n/2) steps. Hence T satisfies the recurrence equation:
on the other hand, k ≥ m, then the kth tip will be the tip at position (k - m)
in the right subtree. Ignoring the cost of computing size for the moment, it
follows that retrieving the kth tip will require O(log n) steps whenever t is a
lookup (Tip x) 0 = x
lookup (Bin m t1 t2) k = lookup t1 k, if k < m
                       = lookup t2 (k - m), otherwise
With the new definition, the time required to compute lookup is O(log n)
steps. Notice that the price paid for this improvement is the extra space
needed to store information about tree sizes.
The definition of update is equally simple:
Observe that if k does not lie in the range 0 ≤ k < size t, then update t k x = ⊥.
Finally, to complete the repertoire of array operations, the function length
for determining the number of elements in the array is defined by:
length (Tip x) = 1
length (Bin m t1 t2) = m + length t2
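The array operations can be collected into one Haskell sketch. It assumes, as the definitions of lookup and length suggest, that each Bin node is labelled with the size m of its left subtree. The names lookupA, updateA and lengthA are chosen only to avoid clashing with the Haskell Prelude. The definition of update does not appear in this passage, so the version of updateA given here is a guess modelled on lookup.

```haskell
-- A tree whose Bin nodes store the size of the left subtree (assumed).
data ATree a = Tip a | Bin Int (ATree a) (ATree a)

-- Build a size-balanced tree, labelling each node as it is built.
mkarray :: [a] -> ATree a
mkarray []  = error "mkarray: empty list"
mkarray [x] = Tip x
mkarray xs  = Bin m (mkarray ys) (mkarray zs)
  where
    m        = length xs `div` 2
    (ys, zs) = splitAt m xs

-- Follow the stored sizes to the k-th tip, counting from 0.
lookupA :: ATree a -> Int -> a
lookupA (Tip x) 0 = x
lookupA (Tip _) _ = error "lookupA: index out of range"
lookupA (Bin m t1 t2) k
  | k < m     = lookupA t1 k
  | otherwise = lookupA t2 (k - m)

-- Replace the k-th tip, rebuilding only the path that leads to it.
updateA :: ATree a -> Int -> a -> ATree a
updateA (Tip _) 0 x = Tip x
updateA (Tip _) _ _ = error "updateA: index out of range"
updateA (Bin m t1 t2) k x
  | k < m     = Bin m (updateA t1 k x) t2
  | otherwise = Bin m t1 (updateA t2 (k - m) x)

-- The stored sizes mean only the right spine has to be traversed.
lengthA :: ATree a -> Int
lengthA (Tip _)      = 1
lengthA (Bin m _ t2) = m + lengthA t2
```

An out-of-range index raises an error in this sketch, which plays the role of the undefined value ⊥ in the text.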
Exercises
9.5.1 Show that a size-balanced tree is minimal.
lookup t k = ( tips t) ! k
Write down the recurrence relation for T( n), the time required to evaluate
(mkarray2 n zs), and hence show that the new definition leads to a more
efficient computation.
9.5.6 Write functions similar to mkarray for building a labelled binary tree
in which (i ) the label of a tree is the maximum tip value; (ii ) the label of a
tree is its depth.
9.5.7 Construct suitable definitions of mktree and revtree so that :
9.5.9 What other common list-processing functions are made more efficient
by representing lists as binary trees?
Now let us turn to trees with a multiway branching structure. Consider the
type declaration:
gtree a ::= Node a [gtree a]
An element of (gtree a) thus consists of a labelled node together with a list
of subtrees. For example, (Node 0 [ ]) is an element of (gtree num), and so is:
A tree is finite if it has a well-defined size. Note that a tree can be finitary
without being finite. Unlike the case of binary trees, the size of a general
tree can be arbitrarily larger than its depth.
General trees have numerous applications and arise naturally whenever a
hierarchical classification is discussed. For example, this book is organised as
a tree structure as the method for numbering chapters and sections shows.
The departmental structure of many large organisations is also usually
expressed as a tree. We shall consider one application of general trees below,
and another in the next section.
is a tree:
Node f [t1, ..., tn]
and:
curry (Node f [t1 , t2])
= foldl Bin ( Tip f) (map curry [t1 , t2])
= foldl Bin ( Tip f) [curry t1 , curry t2]
= foldl Bin (Bin ( Tip f) (curry t1 )) [curry t2]
= Bin (Bin ( Tip f) ( curry t1 )) ( curry t2)
It is left as an exercise for the reader to show that (curry t) returns a
well-defined binary tree if t is a finitary tree; however, the correspondence
breaks down if t is non-finitary.
Let us now construct the inverse correspondence:
Case ( Tip x). To synthesize a value for uncurry ( Tip x), we reason that :
Tip x = foldl Bin ( Tip x ) [ ] (foldl.1)
= foldl Bin ( Tip x ) (map curry [ ]) (map.1)
= curry (Node x [ ] ) (curry.1)
Hence:
uncurry (Tip x) = uncurry (curry (Node x [ ]))
                = Node x [ ] (spec.)
Case (Bin b1 b2). To construct a value for uncurry (Bin b1 b2), we suppose
that:
b1 = curry (Node x ts) (b1.1)
b2 = curry t (b2.1)
It follows from the first equation and the definition of curry that:
b1 = foldl Bin (Tip x) (map curry ts) (b1.2)
We also need the law:
foldl (⊕) (foldl (⊕) a xs) ys = foldl (⊕) a (xs ++ ys) (law)
Now we reason:
Bin b1 b2
= foldl Bin b1 [b2] (foldl.1, foldl.2)
= foldl Bin (foldl Bin (Tip x) (map curry ts)) [b2] (b1.2)
= foldl Bin (Tip x) (map curry ts ++ [b2]) (law)
= foldl Bin (Tip x) (map curry ts ++ [curry t]) (b2.1)
= foldl Bin (Tip x) (map curry (ts ++ [t])) (map, ++)
= curry (Node x (ts ++ [t])) (curry.1)
Hence we have:
uncurry (Bin b1 b2) = uncurry (curry (Node x (ts ++ [t])))
                    = Node x (ts ++ [t]) (spec.)
We have therefore shown that:
uncurry (Tip x) = Node x [ ]
uncurry (Bin b1 b2) = Node x (ts ++ [t])
                      where Node x ts = uncurry b1
                            t = uncurry b2
It is possible to improve the efficiency of uncurry; details are left as an
exercise.
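The correspondence between general trees and binary trees can be transcribed into Haskell as follows. This is a sketch: the type names GTree and BTree and the function names curryT and uncurryT are chosen here to avoid clashing with the Prelude functions curry and uncurry, and the binary tree type (labelled tips, unlabelled nodes) is inferred from the derivation above.

```haskell
-- General trees and the binary trees they correspond to.
data GTree a = Node a [GTree a] deriving (Eq, Show)
data BTree a = Tip a | Bin (BTree a) (BTree a) deriving (Eq, Show)

-- curry: fold the curried subtrees onto the label from the left.
curryT :: GTree a -> BTree a
curryT (Node x ts) = foldl Bin (Tip x) (map curryT ts)

-- uncurry: the definition synthesised in the text.
uncurryT :: BTree a -> GTree a
uncurryT (Tip x) = Node x []
uncurryT (Bin b1 b2) = Node x (ts ++ [t])
  where
    Node x ts = uncurryT b1
    t         = uncurryT b2
```

For example, curryT (Node 'f' [Node 'a' [], Node 'b' []]) gives Bin (Bin (Tip 'f') (Tip 'a')) (Tip 'b'), matching the calculation for curry (Node f [t1, t2]). The repeated use of ++ in uncurryT is the inefficiency that the text leaves as an exercise.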
Con "Pair" [Con "0" [ ], Con "Cons" [Var "y", Con "Nil" [ ]]]
Here, (Pair x y) corresponds to (x, y), and (Cons x xs) to x : xs. Note that
the constants 0 and Nil are represented as constructors of no arguments. We
assume that each constructor c is associated with a fixed number, arity c, of
arguments, so that (Con c es) is only well-formed if #es = arity c.
By definition, a pattern is an arbitrary element of exp, while a constant
expression is an element of exp that does not contain variables. For
convenience, we introduce the type synonyms:
pattern == exp
conexp == exp
iden * x = Var x
for all x in var. In particular, we have:
apply iden e = e
for all expressions e.
The substitution ( unit x e) has the property that:
(unit x e) * y = e, if x = y
               = Var y, otherwise
Thus, the function apply (unit x e) takes an expression and replaces every
occurrence of (Var x) by e, but leaves other variables unchanged.
The function dom returns the 'essential' variables of a substitution. It is
defined abstractly by the equation:
dom s = { x | s * x ≠ Var x }
9.6 GENERAL TREES 267
In particular:
dom iden = {}
dom (unit x e) = {x}, (if e ≠ Var x)
The function dom is needed below.
Finally, the operation (⊕) combines two substitutions. This is a partial
operation which is defined only if the substitutions are compatible. Two
substitutions are compatible if they return the same results on all essential
variables. We define:
(s1 ⊕ s2) * x = s1 * x, if x ∈ dom s1
              = s2 * x, if x ∈ dom s2
              = Var x, otherwise
It follows that:
s1 ⊕ s2 = s2 ⊕ s1
for all compatible s1 and s2.
The algebra of substitutions under ⊕ is similar to the algebra of sets
under ∪. In particular:
s1 ⊕ iden = s1 (identity)
s1 ⊕ s2 = s2 ⊕ s1 (commutativity)
s1 ⊕ s1 = s1 (idempotency)
s1 ⊕ (s2 ⊕ s3) = (s1 ⊕ s2) ⊕ s3 (associativity)
However, unlike ∪, the operation ⊕ is partial; for example, the associative
law only holds if s1, s2 and s3 are pairwise compatible because only in this
case is either side defined.
Since it is inconvenient to work with partial algebras, let us turn ⊕ into a
total operation. The way to do this is to 'lift' the class of substitutions by
including an extra element, called Fail, to denote the inconsistent substitution.
In general, we can define the type:
lift a ::= Fail | Ok a
for lifting a type a by including one extra element. We then have:
1. as = Ok s and apply s p = e; or
2. as = Fail and apply s p ≠ e for all s.
The most straightforward implementation of match is as follows:
match :: pattern → conexp → lsubst
match (Var x) e = Ok (unit x e)
match (Con c ps) (Con d es)
    = combine (zipwith match ps es), if c = d
    = Fail, otherwise
combine = foldr (⊗) (Ok iden)
The function zipwith is defined by:
superior algorithm can be based on the idea of testing s1 and s2 for
compatibility only when one of them is a unit substitution. Consider the function:
match1 :: lsubst → (pattern, exp) → lsubst
match1 as (p, e) = as ⊗ match p e
Since (Ok iden) is the identity element of (⊗), we have:
match p e = match1 (Ok iden) (p, e)
and so match can be defined in terms of match1. Starting with the given
definition of match1, it is possible to derive the following alternative
definition:
match1 Fail (p, e ) = Fail
match1 ( Ok s) ( Var x , e)
= Ok ( extend s x e ) , if s * x = Var x
= Ok s , if s * x = e
= Fail, otherwise
match1 ( Ok s) ( Con c ps, Con d es)
= foldl match1 ( Ok s) (zip ps es), if c = d
= Fail, otherwise
The subsidiary function extend satisfies:
( extend s x e) * y = e, if x = y
= s * y, otherwise
We shall leave the pleasure of synthesising the above definition to the reader.
The revised definition of match, in terms of match1, has one very useful
feature: apart from iden, the only abstract operation on substitutions appearing
in the definition is *! Since we no longer have to compute dom values, we
can implement substitutions directly as functions. Specifically, we take:
subst == var → exp
replace * by functional application, and define:
iden x = Var x
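With substitutions represented as functions, the revised matcher can be sketched in Haskell as follows. The type names Exp, VarName, Subst and Lift are renderings of exp, var, subst and lift chosen for this sketch, and apply is included only so that a result can be checked against the specification. The final clause of match1, which rejects a constructor pattern matched against a variable, is an addition made here for totality; it cannot arise when the second argument is a constant expression.

```haskell
type VarName = String

-- Expressions: variables, or constructors applied to arguments.
data Exp = Var VarName | Con String [Exp] deriving (Eq, Show)

-- Substitutions as functions, and the lifted type with a failure element.
type Subst = VarName -> Exp
data Lift a = Fail | Ok a

iden :: Subst
iden = Var

-- extend s x e behaves like s except that x is bound to e.
extend :: Subst -> VarName -> Exp -> Subst
extend s x e y = if x == y then e else s y

-- match1 threads the substitution built so far through the pattern.
match1 :: Lift Subst -> (Exp, Exp) -> Lift Subst
match1 Fail _ = Fail
match1 (Ok s) (Var x, e)
  | s x == Var x = Ok (extend s x e)   -- x not yet bound
  | s x == e     = Ok s                -- x already bound to e
  | otherwise    = Fail                -- x bound to something else
match1 (Ok s) (Con c ps, Con d es)
  | c == d    = foldl match1 (Ok s) (zip ps es)
  | otherwise = Fail
match1 (Ok _) (Con _ _, Var _) = Fail  -- added for totality (assumption)

match :: Exp -> Exp -> Lift Subst
match p e = match1 (Ok iden) (p, e)

-- apply s e replaces each variable in e by its image under s.
apply :: Subst -> Exp -> Exp
apply s (Var x)    = s x
apply s (Con c es) = Con c (map (apply s) es)
```

The sketch relies on the well-formedness assumption that #es = arity c, since zip silently truncates argument lists of unequal length.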
Exercises
9.6.1 Give a definition of mapgtree that is the analogue of map for general
trees.
9.6.2 Show that:
uncurry (curry t) = t
for all finitary trees t. Give an example of a non-finitary tree for which the
above equation fails to hold.
9.6.4 Consider the synthesis of uncurry from curry . Suppose we look for a
9.6.7 Show that substitutions s and (unit x e) (where e ≠ Var x) are
compatible just in the case that:
(s ⊕ unit x e) * y = e, if x = y
                   = Var y, otherwise
9.6.9 Prove that the pattern matching function match meets its
specification.
where a and b are numbers (the 'alpha' and 'beta' values) such that a ≤ b.
Since:
(a max x) min b = a max (x min b)
The function bmx can be used to compute minimax provided some constant
m > 0 is given such that:
-m ≤ static p ≤ m
for all positions p. We then have:
minimax t = bmx (-m) m t
Although the specification of bmx refers to the value (minimax t), this value
is not required to determine bmx when a = b. We have:
bmx a a t = a
and so evaluation of (minimax t) is not required. This is the basis of
alpha-beta pruning. Put simply, the bounded evaluation of a tree is carried out
sequentially in some order, and the estimates a and b are improved as
evaluation unfolds. If and when the bounds coincide, the minimax value can be
returned without exploring any more of the tree.
As a final exercise in program synthesis, let us now derive an alternative
definition of bmx that implements the alpha-beta algorithm. We start with
the specification:
bmx a b (Node x ts) = a max (minimax (Node x ts)) min b
where a ≤ b and consider two cases. (For brevity, we write mmx for
minimax.)
Case ts = [ ]. Here we have:
bmx a b (Node x [ ])
= a max (mmx (Node x [ ])) min b (spec.)
= a max x min b (minimax.1)
9.7 GAME TREES 275
= a
Case (t : ts ) . Here we have:
cmx a b (t : ts)
= a max (-min (map mmx (t : ts))) min b (spec: cmx)
= a max (-((mmx t) min (min (map mmx ts)))) min b
using laws of map and min. To improve readability, let us introduce the
abbreviations:
x = mmx t
y = min (map mmx ts)
so the last expression is:
a max (-(x min y)) min b
= a max ((-x) max (-y)) min b (negation)
= (a max (-x) max (-y)) min b (assoc: max)
= (a max (-x) min b) max (-y) min b
using the distributive laws of max and min. Expanding out the abbreviations
x and y, the last expression is just (cmx a' b ts), where:
a' = a max (-mmx t) min b
cmx a a ts = a
and so, when the bounds a and b are equal, further exploration of ts is
unnecessary. The right-hand side of the definition of cmx is changed to read:
cmx a b (t : ts) = a', if a' = b
                 = cmx a' b ts, otherwise
                   where a' = - bmx (-b) (-a) t
cmx a b [ ] = a
cmx a b (t : ts) = a', if a' = b
                 = cmx a' b ts, otherwise
                   where a' = - bmx (-b) (-a) t
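The derived definitions can be assembled into a Haskell sketch and compared with a direct minimax evaluation. Some of what follows is assumed rather than taken from the passage: the definition of minimax is inferred from the step labelled (minimax.1) and from the specification of cmx, the clause structure of bmx is inferred from the two cases of the derivation, and the labels are taken to be static values of type Int.

```haskell
-- Game trees as general trees labelled with static values.
data GTree a = Node a [GTree a]

-- Unbounded evaluation (assumed form): a leaf has its static value;
-- otherwise negate the least value among the subtrees.
minimax :: GTree Int -> Int
minimax (Node x []) = x
minimax (Node _ ts) = negate (minimum (map minimax ts))

-- Bounded evaluation: bmx a b t = a max (minimax t) min b, for a <= b.
bmx :: Int -> Int -> GTree Int -> Int
bmx a b (Node x []) = (a `max` x) `min` b
bmx a b (Node _ ts) = cmx a b ts

-- cmx scans the subtrees left to right, raising the lower bound a and
-- stopping as soon as it meets the upper bound b.
cmx :: Int -> Int -> [GTree Int] -> Int
cmx a _ [] = a
cmx a b (t : ts)
  | a' == b   = a'
  | otherwise = cmx a' b ts
  where
    a' = negate (bmx (negate b) (negate a) t)
```

If every static value lies between -m and m, then bmx (-m) m t should agree with minimax t. Because Haskell is lazy, subtrees that are pruned are never evaluated at all, so bmx can return a result even for a tree in which a pruned subtree is undefined.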
Exercises
9.7.2 Design a function that takes a positive integer n and a game tree gt,
and returns a game tree ngt such that (i) ngt is a subtree of gt; (ii) ngt
contains the position estimated by (dynamic n) to be the best end position
achievable.
9.7.3 Design a program to play Noughts and Crosses (also called
Tic-Tac-Toe). Make the program interactive.
Appendix A
The table on the following page gives the numerical codes of the characters
in the ASCII character set. The primitive function:
converts a character to its ASCII code number (in the range 0 to 127), and:
does the reverse. In particular, the newline character '↵' in strings has code
10, and the space character '␣' has code 32.
278 THE ASCII CHARACTER SET
 0 NUL    32 SP    64 @     96 `
 1 SOH    33 !     65 A     97 a
 2 STX    34 "     66 B     98 b
 3 ETX    35 #     67 C     99 c
 4 EOT    36 $     68 D    100 d
 5 ENQ    37 %     69 E    101 e
14 SO     46 .     78 N    110 n
15 SI     47 /     79 O    111 o
16 DLE    48 0     80 P    112 p
17 DC1    49 1     81 Q    113 q
18 DC2    50 2     82 R    114 r
19 DC3    51 3     83 S    115 s
20 DC4    52 4     84 T    116 t
21 NAK    53 5     85 U    117 u
22 SYN    54 6     86 V    118 v
23 ETB    55 7     87 W    119 w
24 CAN    56 8     88 X    120 x
25 EM     57 9     89 Y    121 y
26 SUB    58 :     90 Z    122 z
27 ESC    59 ;     91 [    123 {
28 FS     60 <     92 \    124 |
29 GS     61 =     93 ]    125 }
30 RS     62 >     94 ^    126 ~
31 US     63 ?     95 _    127 DEL
Appendix B
280 SOME STANDARD FUNCTIONS
6. dropwhile. Removes the longest initial segment of a list all of whose
elements satisfy a given predicate:
dropwhile :: (α → bool) → [α] → [α]
dropwhile p [ ] = [ ]
dropwhile p (x : xs) = dropwhile p xs, if p x
                     = x : xs, otherwise
8. foldl. Fold-left:
foldl :: (α → β → α) → α → [β] → α
foldl f a [ ] = a
foldl f a (x : xs) = strict (foldl f) (f a x) xs
foldr1 f [x] = x
foldr1 f (x : y : xs) = f x (foldr1 f (y : xs))
fst (x, y) = x
hd (x : xs) = x
id :: α → α
id x = x
last :: [α] → α
last (x : xs) = x, if xs = [ ]
              = last xs, otherwise
19. (--). List-difference: (xs -- ys) is the list that results when, for each
element y in ys, the first occurrence (if any) of y is removed from xs:
max :: [α] → α
max = foldl1 (max)
min :: [α] → α
min = foldl1 (min)
or :: [bool] → bool
or = foldr (∨) False
36. takewhile. Selects the longest initial segment of a list all of whose
elements satisfy a given predicate:
takewhile :: (α → bool) → [α] → [α]
takewhile p [ ] = [ ]
takewhile p (x : xs) = x : takewhile p xs, if p x
                     = [ ], otherwise
37. until. Applied to a predicate, a function and a value, returns the result
of applying the function to the value the smallest number of times in order
to satisfy the predicate:
until :: (α → bool) → (α → α) → α → α
until p f x = x, if p x
            = until p f (f x), otherwise
38. zip. Takes a pair of lists into a list of pairs of corresponding elements:
zip :: ([α], [β]) → [(α, β)]
zip ([ ], ys) = [ ]
zip ((x : xs), [ ]) = [ ]
zip ((x : xs), (y : ys)) = (x, y) : zip (xs, ys)
Appendix C
Programming in Miranda
strep xs = [0], ys = []
         = ys, otherwise
           where ys = dropwhile (=0) xs
286 PROGRAMMING IN MIRANDA
Basic operators.
These are translated according to the table:
Ascii characters.
In common with many languages, Miranda uses an "escape" convention for
denoting certain characters inside character and string constants: this
involves prefixing another character by a backslash \. The most common
instance is the newline character '↵', which is translated as '\n'. Also, the
backslash, the single-quote, and the double-quote character are translated as
'\\', '\'', and '\"', respectively.
Type variables.
Generic type variables for which in this book we have used the Greek letters
α, β, γ, and so on, are written in Miranda *, **, ***, and so on. For example,
foldl :: (* -> ** -> **) -> ** -> [*] -> **
foldl f a [] = a
foldl f a (x:xs) = foldl f (f x a) xs
In particular, foldl has the same type as foldr. Moreover, foldl is not
made strict in its second argument.
For example, in Miranda we can write:
reverse = foldl (:) []
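The effect of the swapped argument order can be seen by transcribing the Miranda definition into Haskell. The names mirandaFoldl and reverseM are hypothetical, chosen here so as not to clash with the Prelude, whose own foldl takes its arguments in the order used in the body of this book.

```haskell
-- Miranda-style foldl: the operator receives the list element first and
-- the accumulator second, so its type coincides with that of foldr.
mirandaFoldl :: (a -> b -> b) -> b -> [a] -> b
mirandaFoldl _ a []       = a
mirandaFoldl f a (x : xs) = mirandaFoldl f (f x a) xs

-- With this argument order, cons can be used directly as the operator.
reverseM :: [a] -> [a]
reverseM = mirandaFoldl (:) []
```

With the Haskell Prelude's foldl the same function would instead be written foldl (flip (:)) [].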
Miranda also defines zip3, zip4, and so on (up to 6) for zipping triples and
other tuples.
(iii) The following functions are not (currently) provided in the Miranda
standard environment:
(iv) The infix operators max and min are not provided in Miranda.
Instead one uses the listwise functions max and min. So, where in this book
we write (a max b), the Miranda programmer would write max [a, b].
Bibliography
Index
290 INDEX