Functional Programming Using F#
MICHAEL R. HANSEN
Technical University of Denmark, Lyngby
HANS RISCHEL
Technical University of Denmark, Lyngby
cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9781107684065
© Michael R. Hansen and Hans Rischel 2013
A catalog record for this publication is available from the British Library.
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet websites referred to in this publication and does not guarantee
that any content on such websites is, or will remain, accurate or appropriate.
Contents
Preface
1 Getting started
1.1 Values, types, identifiers and declarations
1.2 Simple function declarations
1.3 Anonymous functions. Function expressions
1.4 Recursion
1.5 Pairs
1.6 Types and type checking
1.7 Bindings and environments
1.8 Euclid's algorithm
1.9 Evaluations with environments
1.10 Free-standing programs
Summary
Exercises
4 Lists
4.1 The concept of a list
4.2 Construction and decomposition of lists
4.3 Typical recursions over lists
4.4 Polymorphism
4.5 The value restrictions on polymorphic expressions
4.6 Examples. A model-based approach
Summary
Exercises
7 Modules
7.1 Abstractions
7.2 Signature and implementation
7.3 Type augmentation. Operators in modules
7.4 Type extension
7.5 Classes and objects
7.6 Parameterized modules. Type variables in signatures
7.7 Customizing equality, hashing and the string function
7.8 Customizing ordering and indexing
7.9 Example: Piecewise linear plane curves
Summary
Exercises
9 Efficiency
9.1 Resource measures
9.2 Memory management
9.3 Two problems
9.4 Solutions using accumulating parameters
9.5 Iterative function declarations
9.6 Tail recursion obtained using continuations
Summary
Exercises
11 Sequences
11.1 The sequence concept in F#
11.2 Some operations on sequences
References
Index
Preface
The purpose of this book is to introduce a wide range of readers from the professional
programmer to the computer science student to the rich world of functional programming
using the F# programming language. The book is intended as the textbook in a course on
functional programming and aims at showing the role of functional programming in a wide
spectrum of applications ranging from computer science examples over database examples
to systems that engage in a dialogue with a user.
The background
The material in this book has been developed in connection with courses taught at the
Technical University of Denmark, originating from the textbook Introduction to Programming
Using SML by Hansen and Rischel (Addison-Wesley, 1999).
It has been an exciting experience for us to learn the many elegant and useful features of
the F# language, and this excitement is hopefully transferred to the reader of this book.
The chapters
Chapter 1: The basic concepts of F#, including values, types and recursive functions, are
introduced in a manner that allows readers to solve interesting problems from the start.
Chapter 2: A thorough introduction to the basic types in F# is given, together with a gentle
introduction to the notion of higher-order functions.
Chapter 3: The simplest composite types of F#, tuples and records, are introduced. They
allow several values to be grouped together into one component. Furthermore, tagged
values are introduced.
Chapter 4: A list is a finite sequence of values with the same type. Standard recursions on
lists are studied and examples illustrating a model-based approach to functional programming
are given.
Chapter 5: The concepts of sets and maps are introduced and the powerful F# collection
libraries for lists, sets and maps are studied and applied in connection with a model-based
approach.
Chapter 6: The concept of finite tree is introduced and illustrated through a broad selection
of examples.
Chapter 7: It is shown how users can make their own libraries by means of modules
consisting of signature and implementation files. Furthermore, object-oriented features of
F# are mentioned.
Chapter 8: Imperative features of F# are introduced, including the array part of the collection
library and the imperative sets and maps from the .NET framework.
Chapter 9: The memory management concepts, stack, heap and garbage collection, are
described. Tail-recursive functions are introduced and two techniques for deriving such
functions are presented: one using accumulating parameters, the other continuations.
Their efficiency advantages are illustrated.
Chapter 10: A variety of facilities for processing text are introduced, including regular
expressions, file operations, web-based operations and culture-dependent string ordering.
The facilities are illustrated using a real-world example.
Chapter 11: A sequence is a, possibly infinite, collection of elements that are computed
on-demand only. Sequence functions are expressed using library functions or sequence
expressions that provide a step-by-step method for generating elements. Database tables
are viewed as sequences (using a type provider) and operations on databases are expressed
using query expressions.
Chapter 12: The notion of computation expression, which is based on the theory of
monads, is studied and used to hide low-level details of a computation from its definition.
Monadic parsing is used as a major example to illustrate the techniques.
Chapter 13: This last chapter describes how to construct asynchronous reactive programs,
spending most of their time awaiting a request or a response from an external agent, and
parallel programs, exploiting the multi-core processor of the computer.
The first six chapters cover a standard curriculum in functional programming, while the
other chapters cover more advanced topics.
Further material
The book contains a large number of exercises, and further material is available at the book's
homepage. A link to this homepage is found at:
http://www.cambridge.org/9781107019027
This material includes a complete set of slides for a course in functional programming plus
a collection of problems and descriptions of topics to be used in student projects.
Acknowledgments
Special thanks go to Peter Sestoft, Don Syme and Anh-Dung Phan. The idea to make a
textbook on functional programming on the basis of F# originates from Peter, who patiently
commented on the manuscript during its production and helped with advice and suggestions.
From the very start of this project we had the support of Don. This is strongly appreciated,
and so are the help, clarifications and constructive comments that we received throughout this
project. Phan helped with many comments, suggestions and insights about the platform. We
are grateful for this help, for many discussions and for careful comments on all the chapters.
Furthermore, we are grateful to Nils Andersen, Mary E. Boker, Diego Colombo and Niels
Hallenberg for reading and commenting on the complete manuscript.
Earlier versions of this manuscript have been used in connection with courses at the
Technical University of Denmark and the IT-University of Copenhagen. The comments we
received from the students in these courses are greatly appreciated.
Getting started
In this chapter we will introduce some of the main concepts of functional programming
languages. In particular we will introduce the concepts of value, expression, declaration,
recursive function and type. Furthermore, to explain the meaning of programs we will
introduce the notions: binding, environment and evaluation of expressions.
The purpose of the chapter is to acquaint the reader with these concepts, in order to
address interesting problems from the very beginning. The reader will obtain a thorough
knowledge of these concepts and skills in applying them as we elaborate on them throughout
this book.
There is support of both compilation of F# programs to executable code and the execution
of programs in an interactive mode. The programs in this book are usually illustrated by the
use of the interactive mode.
The interface of the interactive F# compiler is very advanced as, for example, structured
values like tuples, lists, trees and functions can be communicated directly between the user
and the system without any conversions. Thus, it is very easy to experiment with programs
and program designs and this allows us to focus on the main structures of programs and
program designs, that is, the core of programming, as input and output of structured values
can be handled by the F# system.
The answer from the system contains the value and the type of the expression:
val it : int = 10
The system will add some leading characters in the input line to make a distinction between
input from the user and output from the system. The dialogue may look as follows:
> 2*3 + 4;;
val it : int = 10
>
The leading string > is output whenever this particular system is awaiting input from
the user. It is called the prompt, as it prompts for input from the user. The input from the
user is ended by a double semicolon ;; while the next line contains the answer from the
system.
In the following we will distinguish between user input and answer from the system by
the use of different type fonts:
2*3 + 4;;
val it : int = 10
The input from the user is written in typewriter font while the answer from the system
is written in italic typewriter font.
The above answer starts with the reserved word val, which indicates that a value has
been computed, while the special identifier it is a name for the computed value, that is, 10.
The type of the result is int, denoting the subset of the integers $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$
that can be represented using the system.
The user can give a name to a value by entering a declaration, for instance:
let price = 125;;
where the reserved word let starts the declaration. In this case the system answers:
val price : int = 125
The identifier price is now a name for the integer value 125. We also say that the identifier
price is bound to 125.
Identifiers which are bound to values can be used in expressions:
price * 20;;
val it : int = 2500
The identifier it is now bound to the integer value 2500, and this identifier can also be
used in expressions:
it / price = 20;;
val it : bool = true
circleArea (2.0);;
val it : float = 12.56637061
Brackets around the argument 1.0 or (2.0) are optional, as indicated here.
The identifier System.Math.PI is a composite identifier. The identifier System
denotes a namespace where the identifier Math is defined, and System.Math denotes a
namespace where the identifier PI is defined. Furthermore, System and System.Math
denote parts of the .NET Library. We encourage the reader to use program libraries whenever
appropriate. In Chapter 7 we describe how to make your own program libraries.
Comments
A string enclosed within a matching pair (* and *) is a comment which is ignored by the
F# system. Comments can be used to make programs more readable for a human reader by
explaining the intention of the program, for example:
(* Area of circle with radius r *)
let circleArea r = System.Math.PI * r * r;;
val circleArea : float -> float
A comment line can also begin with three slash characters ///. The tool XMLDocs can
produce program documentation from such comments, but we will not pursue this any further
in this book.
Comments can be very useful, especially in large programs, but long comments should
be avoided as they tend to make it more difficult for the reader to get an overview of the
program.
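The comment forms mentioned in this section can be sketched side by side. This is a minimal illustration rather than code from the text; the function circleCircumference is a hypothetical name introduced only for this example.

```fsharp
(* Block comment: area of a circle with radius r *)
let circleArea r = System.Math.PI * r * r

/// Doc comment (three slashes): circumference of a circle with radius r.
/// The name circleCircumference is hypothetical, used only for illustration.
let circleCircumference r = 2.0 * System.Math.PI * r

// A line comment starts with two slashes and runs to the end of the line.
printfn "%f %f" (circleArea 2.0) (circleCircumference 1.0)
```

All three forms are ignored when the program is evaluated; only the declarations and the final printfn have an effect.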
but it is more natural in this case to use a let-declaration let circleArea r = . . . with
an argument pattern. We shall later see many uses of anonymous functions.
function
| 1 -> 31 // January
| 2 -> 28 // February
| 3 -> 31 // March
| 4 -> 30 // April
| 5 -> 31 // May
| 6 -> 30 // June
| 7 -> 31 // July
| 8 -> 31 // August
| 9 -> 30 // September
| 10 -> 31 // October
| 11 -> 30 // November
| 12 -> 31;;// December
function
stdin(17,1): warning FS0025: Incomplete pattern matches on
this expression. For example, the value 0 may indicate a
case not covered by the pattern(s).
val it : int -> int = <fun:clo@17-2>
The last part of the answer shows that the computed value, named by it, is a function with
the type int -> int, that is, a function from integers to integers. The answer also shows
the internal name for that function. The first part of the answer is a warning that the set
of patterns used in the function-expression is incomplete. The expression enumerates a
value for every legal number for a month (1, 2, . . . , 12). At this moment we do not care
about other numbers.
The function can be applied to 2 to find the number of days in February:
it 2;;
val it : int = 28
This function can be expressed more compactly using a wildcard pattern:
function
| 2 -> 28 // February
| 4 -> 30 // April
| 6 -> 30 // June
| 9 -> 30 // September
| 11 -> 30 // November
| _ -> 31;;// All other months
In this case, the function is defined using six clauses. The first clause 2 -> 28 consists
of a pattern 2 and a corresponding expression 28. The next four clauses have a similar
explanation, and the last clause contains a wildcard pattern. Applying the function to a value
v, the system finds the clause containing the first pattern that matches v, and returns the
value of the corresponding expression. In this example there are just two kinds of matches
we should know:
• A constant, like 2, matches itself only, and
• the wildcard pattern _ matches any value.
For example, applying the function to 4 gives 30, and applying it to 7 gives 31.
An even more succinct definition can be given using an or-pattern:
function
| 2 -> 28 // February
| 4|6|9|11 -> 30 // April, June, September, November
| _ -> 31 // All other months
;;
The or-pattern 4|6|9|11 matches any of the values 4, 6, 9, 11, and no other values.
We shall make extensive use of such a case splitting in the definition of functions, also
when declaring named functions:
daysOfMonth 3;;
val it : int = 31
daysOfMonth 9;;
val it : int = 30
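The declaration of daysOfMonth itself is not reproduced above. A plausible sketch, assuming it simply gives a name to the function expression with the or-pattern, could look as follows:

```fsharp
// Assumed declaration: names the function expression shown earlier.
let daysOfMonth = function
    | 2 -> 28                 // February
    | 4 | 6 | 9 | 11 -> 30    // April, June, September, November
    | _ -> 31                 // All other months

printfn "%d %d" (daysOfMonth 3) (daysOfMonth 9)   // prints 31 30
```

With this declaration the two applications above give 31 and 30, in agreement with the answers shown.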
1.4 Recursion
This section introduces the concept of recursion formula and recursive declaration of
functions by an example: the factorial function n!. It is defined by:
$0! = 1$
$n! = 1 \cdot 2 \cdot \ldots \cdot n$ for $n > 0$
where n is a non-negative integer. The dots indicate that all integers from 1 to n should
be multiplied. For example:
$4! = 1 \cdot 2 \cdot 3 \cdot 4 = 24$
Recursion formula
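The underbraced part of the expression below for $n!$ is the expression for $(n-1)!$:
$n! = \underbrace{1 \cdot 2 \cdot \ldots \cdot (n-1)}_{(n-1)!} \cdot n$ for $n > 1$
so the factorial function can be characterized by the recursion formula:
$0! = 1$ (Clause 1)
$n! = n \cdot (n-1)!$ for $n > 0$ (Clause 2)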
Computations
This definition has a form that can be used in the computation of values of the function. For
example:
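\begin{align*}
4! &= 4 \cdot (4-1)! \\
&= 4 \cdot 3! \\
&= 4 \cdot (3 \cdot (3-1)!) \\
&= 4 \cdot (3 \cdot 2!) \\
&= 4 \cdot (3 \cdot (2 \cdot (2-1)!)) \\
&= 4 \cdot (3 \cdot (2 \cdot 1!)) \\
&= 4 \cdot (3 \cdot (2 \cdot (1 \cdot (1-1)!))) \\
&= 4 \cdot (3 \cdot (2 \cdot (1 \cdot 0!))) \\
&= 4 \cdot (3 \cdot (2 \cdot (1 \cdot 1))) \\
&= 24
\end{align*}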
The clauses of the definition of the factorial function are applied in a purely mechanical
way in the above computation of 4!. We will now take a closer look at this mechanical
process as the system will compute function values in a similar manner:
Substitution in clauses
The first step is obtained from Clause 2, by substituting 4 for n. The condition for using the
second clause is satisfied as 4 > 0. This step can be written in more detail as:
$4! = 4 \cdot (4-1)!$ (Clause 2, $n = 4$)
Computation of arguments
The new argument $(4-1)$ of the factorial function in the expression $(4-1)!$ is computed
in the next step:
$4 \cdot (4-1)! = 4 \cdot 3!$ (Compute argument of !)
Thus, the principles used in the first two steps of the computation of 4! are:
• Substitute a value for n in Clause 2.
• Compute argument.
These are the only principles used in the above computation until we arrive at the expression:
$4 \cdot (3 \cdot (2 \cdot (1 \cdot 0!)))$
The next computation step is obtained by using Clause 1 to obtain a value of 0!:
$4 \cdot (3 \cdot (2 \cdot (1 \cdot 0!))) = 4 \cdot (3 \cdot (2 \cdot (1 \cdot 1)))$ (Clause 1)
and the multiplications are then performed in the last step:
$4 \cdot (3 \cdot (2 \cdot (1 \cdot 1))) = 24$
This recursion formula for the factorial function is an example of a general pattern that
will appear over and over again throughout the book. It contains a clause for a base case
$0!$, and it contains a clause where a more general case $n!$ is reduced to an expression
$n \cdot (n-1)!$ involving a smaller instance $(n-1)!$ of the function being characterized.
For such recursion formulas, the computation process will terminate, that is, the computation
of $n!$ will terminate for all $n \geq 0$.
Recursive declaration
We name the factorial function fact, and this function is then declared as follows:
let rec fact = function
| 0 -> 1
| n -> n * fact(n-1);;
val fact : int -> int
This declaration corresponds to the recursion formula for n!. The reserved word rec occurring
in the let-declaration allows the identifier being declared (fact in this case) to occur
in the defining expression.
This declaration consists of two clauses
0 -> 1 and n -> n * fact(n-1)
each initiated by a vertical bar. The pattern of the first clause is the constant 0, while the
pattern of the second clause is the identifier n.
The patterns are matched with integer arguments during the evaluation of function values
as we shall see below. The only value matching the pattern 0 is 0. On the other hand, every
value matches the pattern n, as an identifier can name any value.
Evaluation
The system uses the declaration of fact to evaluate function values in a way that resembles
the above computation of 4!.
Substitution in clauses
To evaluate fact 4, the system searches for a clause in the declaration of fact, where 4
matches the pattern of the clause.
The system starts with the first clause of the declaration: 0 -> 1. This clause is skipped
as the value 4 does not match the pattern 0 of this clause.
Then, the second clause: n -> n * fact(n-1) is investigated. The value 4 matches the
pattern of this clause, that is, the identifier n. The value 4 is bound to n and then substituted
for n in the right-hand side of this clause thereby obtaining the expression: 4 * fact(4-1).
We say that the expression fact 4 evaluates to 4 * fact(4-1) and this evaluation is
written as:
fact 4
⇝ 4 * fact(4-1)
where we use the symbol ⇝ for a step in the evaluation of an expression. Note that the
symbol ⇝ is not part of any program, but a symbol used in explaining the evaluation of
expressions.
Evaluation of arguments
The next step in the evaluation is to evaluate the argument 4-1 of fact:
4 * fact(4-1)
⇝ 4 * fact 3
Unsuccessful evaluations
The evaluation of fact n may not evaluate to a value, because
• the system will run out of memory due to long expressions,
• the evaluation may involve bigger integers than the system can handle, or
• the evaluation of an expression may not terminate.
For example, applying fact to a negative integer leads to an infinite evaluation:
fact -1
⇝ -1 * fact(-1 - 1)
⇝ -1 * fact -2
⇝ -1 * (-2 * fact(-2 - 1))
⇝ -1 * (-2 * fact -3)
⇝ ...
Thus, in finding a declaration of a function, one has to look for a suitable recursion formula
expressing the computation of function values. This declaration of f contains a base case
f 0. However, the second clause does not reduce the general case f(n) to an instance
which is closer to the base case, and the evaluation of f(n) will not terminate when n > 0.
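Two of the failure modes listed in this section, overflow and non-termination on negative arguments, can be avoided with a more defensive declaration. The following is only a sketch and not the declaration used in the text: the name factChecked is hypothetical, and the use of bigint and invalidArg is an assumption made for illustration.

```fsharp
// Hypothetical variant of fact: fails fast on negative input and
// computes with arbitrary-precision integers (bigint) to avoid overflow.
let rec factChecked n =
    if n < 0 then invalidArg "n" "factorial is only defined for n >= 0"
    elif n = 0 then 1I
    else bigint n * factChecked (n - 1)

printfn "%A" (factChecked 20)
```

The recursion is still not tail-recursive, so a very large argument may exhaust the stack; only the other two problems are addressed here.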
1.5 Pairs
Consider the function:
$x^n = x \cdot x \cdot \ldots \cdot x$ ($n$ occurrences of $x$, where $n \geq 0$)
where x is a real number and n is a natural number.
The under-braced part of the expression below for $x^n$ is the expression for $x^{n-1}$:
$x^n = x \cdot \underbrace{x \cdot \ldots \cdot x}_{x^{n-1}}$ ($n$ occurrences of $x$, where $n > 0$)
Using the convention $x^0 = 1$, the function can be characterized by the recursion formula:
$x^0 = 1$
$x^n = x \cdot x^{n-1}$ for $n > 0$
In mathematics $x^n$ is a function of two variables x and n, but it is treated differently in
F# using the concept of a pair:
If $a_1$ and $a_2$ are values of types $\tau_1$ and $\tau_2$, then $(a_1, a_2)$ is a value of type $\tau_1 * \tau_2$.
For example:
let a = (2.0,3);;
val a : float * int = (2.0, 3)
Furthermore, given patterns $pat_1$ and $pat_2$ there is a composite pattern ($pat_1$,$pat_2$). It
matches a pair $(a_1, a_2)$ exactly when $pat_1$ matches $a_1$ and $pat_2$ matches $a_2$, for example:
let (x,y) = a;;
val y : int = 3
val x : float = 2.0
The concept of a pair is a special case of tuples that are treated in Section 3.1.
Using these concepts we represent $x^n$ as a function power with a pair (x,n) as the
argument. The following declaration is based on the above recursion formula, using composite
patterns (x,0) and (x,n):
let rec power = function
| (x,0) -> 1.0 // (1)
| (x,n) -> x * power(x,n-1);; // (2)
val power : float * int -> float
The type of power is float * int -> float. The argument of power is therefore a pair
of type float * int while the value of the function is of type float.
power a;;
val it : float = 8.0
power(4.0,2);;
val it : float = 16.0
A function in F# has one argument and one value. In this case the argument is a pair (u, i)
of type float * int, while the value of the function is of type float.
The system evaluates the expression power(4.0,2) as follows:
power(4.0,2)
⇝ 4.0 * power(4.0,2-1) (Clause 2, x is 4.0, n is 2)
⇝ 4.0 * power(4.0,1)
⇝ 4.0 * (4.0 * power(4.0,1-1)) (Clause 2, x is 4.0, n is 1)
⇝ 4.0 * (4.0 * power(4.0,0))
⇝ 4.0 * (4.0 * 1.0) (Clause 1, x is 4.0)
⇝ 16.0
The first pattern (x,n) will match any pair of form (u, i) and the second clause will
consequently never come into use. The F# compiler actually discovers this and issues a warning:
The function can be applied to an argument (despite the warning), but that would give an
infinite evaluation since the base case (x, 0) -> 1.0 is never reached.
A similar remark on the order of clauses applies to the declaration of fact.
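The effect of clause order can be seen without risking an infinite evaluation by using a non-recursive function. The following is a small sketch; the names describe and describeSwapped are hypothetical and do not occur in the text.

```fsharp
// Specific pattern first: both clauses can be reached.
let describe = function
    | 0 -> "zero"
    | _ -> "non-zero"

// Same clauses in the opposite order (for illustration only).
// The identifier pattern n matches every value, so the clause for 0 is
// never reached; the compiler is expected to warn that the rule is never matched.
let describeSwapped = function
    | n -> "non-zero"
    | 0 -> "zero"

printfn "%s %s" (describe 0) (describeSwapped 0)   // prints zero non-zero
```

In a recursive declaration such as fact or power, the unreachable clause is the base case, which is why the swapped order gives an infinite evaluation there.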
One should also note that a prior binding of an identifier used in a pattern has no effect on
the pattern matching.3 Hence, the following will also not work:
let zero = 0;;
The first pattern (x,zero) will match any pair of form (u, i), binding x to u and zero
to i so the second clause will again never come into use. The F# compiler issues a warning
like in the previous example.
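If a comparison with a previously bound identifier is really wanted, one possibility is to test it in a guard instead of using it as a pattern. The sketch below assumes the when-guard construct of F#, which has not been introduced in this chapter; it is one way of doing it, not the approach taken in the text.

```fsharp
let zero = 0

// The guard 'when n = zero' compares n with the value bound to zero,
// whereas a pattern identifier named zero would simply be rebound.
let rec power = function
    | (_, n) when n = zero -> 1.0
    | (x, n) -> x * power (x, n - 1)

printfn "%f" (power (4.0, 2))   // prints 16.000000
```

Here the first clause is only chosen when the second component equals 0, so the base case is reached as intended.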
The above type consideration for function application f(e) is a special case of the general
type rule for function application:
if f has type $\tau_1$ -> $\tau_2$ and e has type $\tau_1$
then f(e) has type $\tau_2$.
3 Identifiers that are constructors are, however, treated in a special way (cf. Section 3.8).
Using the notation e : $\tau$ to assert that the expression e has type $\tau$, this rule can be
presented more succinctly as follows:
if f : $\tau_1$ -> $\tau_2$ and e : $\tau_1$
then f(e) : $\tau_2$.
Consider, for example, the function power with type float * int -> float. In this
case, $\tau_1$ is float * int and $\tau_2$ is float. Furthermore, the pair (4.0,2) has type
float * int (which is $\tau_1$). According to the above rule, the expression power(4.0,2)
hence has type float (which is $\tau_2$).
The value of an expression is always evaluated in the actual environment, that contains
the bindings of identifiers that are valid at evaluation time. When the F# system is activated,
the actual environment is the Basis Environment that gives meanings to /, +, -, sqrt,
for example. When using environments we will usually not show bindings from the Basis
Environment. We will usually also omit bindings of identifiers like System.Math.PI
from the Library.
$n = q \cdot m + r$
is then called a division with quotient q and remainder r. There are infinitely many possible
remainders (corresponding to different quotients q):
It follows that there are two possibilities concerning remainders r with $-|m| < r < |m|$:
1. The integer 0 is a remainder and any other remainder r satisfies $|r| \geq |m|$.
2. There are two remainders $r_{neg}$ and $r_{pos}$ such that $-|m| < r_{neg} < 0 < r_{pos} < |m|$.
The F# operators / and % (quotient and remainder) are defined (for $m \neq 0$) such that:
$n = (n / m) \cdot m + (n \% m)$ (1.1)
$|n \% m| < |m|$ (1.2)
$n \% m \geq 0$ when $n \geq 0$ (1.3)
$n \% m \leq 0$ when $n < 0$ (1.4)
so n % m = 0 when m is a divisor of n, otherwise $r_{pos}$ is used if n > 0 and $r_{neg}$ if n < 0.
Note that the corresponding operators in other programming languages may use different
conventions for negative integers.
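Properties (1.1) to (1.4) can be checked for concrete values. The following sketch uses a hypothetical helper satisfiesDivisionLaws and a hand-picked list of samples; it tests a few cases and is of course no proof.

```fsharp
// Checks properties (1.1)-(1.4) for one pair (n, m) with m <> 0.
// The name satisfiesDivisionLaws is hypothetical.
let satisfiesDivisionLaws (n, m) =
    n = (n / m) * m + (n % m)      // (1.1)
    && abs (n % m) < abs m         // (1.2)
    && (n < 0 || n % m >= 0)       // (1.3)
    && (n >= 0 || n % m <= 0)      // (1.4)

let samples = [ (13, 5); (13, -5); (-13, 5); (-13, -5); (12, 4) ]
for (n, m) in samples do
    printfn "%d / %d = %d, %d %% %d = %d, laws hold: %b"
        n m (n / m) n m (n % m) (satisfiesDivisionLaws (n, m))
```

For every sample the sign of the remainder follows the sign of n, as stated by (1.3) and (1.4).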
Euclid's algorithm in F#
Euclid's algorithm is now expressed in the following declaration:
let rec gcd = function
| (0,n) -> n
| (m,n) -> gcd(n % m,m);;
val gcd : int * int -> int
For example:
gcd(12,27);;
val it : int = 3
gcd(36, 116);;
val it : int = 4
The declaration of gcd contains two clauses: one with pattern (0,n) and expression n and another with pattern
(m,n) and expression gcd(n % m,m). There are hence two cases in the evaluation of an
expression gcd(x, y) corresponding to the two clauses:
1. gcd(0, y): The argument (0, y) matches the pattern (0,n) in the first clause giving the
binding n ↦ y, and the system will evaluate the corresponding right-hand side expression
n using this binding:
gcd(0, y) ⇝ (n, [n ↦ y]) ⇝ y
2. gcd(x, y) with $x \neq 0$: The argument (x, y) does not match the pattern (0,n) in the first
clause but it matches the pattern (m,n) in the second clause giving the bindings m ↦ x,
n ↦ y, and the system will evaluate the corresponding right-hand side expression
gcd(n % m,m) using these bindings:
gcd(x, y) ⇝ (gcd(n % m, m), [m ↦ x, n ↦ y]) ⇝ ...
Consider, for example, the expression gcd(36,116). The value (36,116) does not
match the pattern (0,n), so the first evaluation step is based on the second clause:
gcd(36,116)
⇝ (gcd(n % m, m), [m ↦ 36, n ↦ 116])
The expression gcd(n % m, m) will then be further evaluated using the bindings for m
and n. The next evaluation steps evaluate the argument expression (n % m, m) using the
bindings:
(gcd(n % m, m), [m ↦ 36, n ↦ 116])
⇝ gcd(116 % 36, 36)
⇝ gcd(8,36)
The evaluation continues evaluating the expression gcd(8,36) and this proceeds in the
same way, but with different values bound to m and n:
gcd(8,36)
⇝ (gcd(n % m, m), [m ↦ 8, n ↦ 36])
⇝ gcd(36 % 8, 8)
⇝ gcd(4,8)
The evaluation will in the same way reduce the expression gcd(4,8) to gcd(0,4), but
the evaluation of gcd(0,4) will use the first clause in the declaration of gcd, and the
evaluation terminates with result 4:
gcd(4,8)
⇝ ...
⇝ gcd(0,4)
⇝ (n, [n ↦ 4])
⇝ 4
Note that different bindings for m and n occur in this evaluation and that all these bindings
have disappeared when the result of the evaluation (that is, 4) is reached.
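The sequence of arguments passed to gcd in this evaluation can be reproduced by a small program. The helper gcdTrace below is hypothetical and uses a list (lists are treated in Chapter 4) to collect the argument pairs; it follows the same two clauses as gcd.

```fsharp
// Hypothetical helper: returns the successive arguments of gcd,
// mirroring the evaluation steps shown above.
let rec gcdTrace = function
    | (0, n) -> [ (0, n) ]
    | (m, n) -> (m, n) :: gcdTrace (n % m, m)

printfn "%A" (gcdTrace (36, 116))
```

The last pair in the list has first component 0, and its second component is the greatest common divisor.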
The type string[] is an array type (cf. Section 8.10) and the argument param consists
of k strings (cf. Section 2.3):
param.[0], param.[1], ..., param.[k-1]
The following is a simple, free-standing hello world program:
open System;;
[<EntryPoint>]
let main(param: string[]) =
    printf "Hello %s\n" param.[0]
    0;;
It uses the printf function (cf. Section 10.7) to make some output. The zero result signals
normal termination of the program. The program source file Hello.fsx compiles to an
exe-file using the F# batch compiler:
fsc Hello.fsx -o Hello.exe
Using the fsc command requires that the directory path of the F# compiler (with file name
fsc.exe or Fsc.exe) is included in the PATH environment variable.
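The program above indexes param.[0] and therefore fails when it is started without arguments. A more forgiving version could build the greeting in a separate function, as in the following sketch; the name greeting and the fallback text are assumptions made for illustration.

```fsharp
// Hypothetical helper: builds the greeting without assuming that
// at least one command-line argument was supplied.
let greeting (args: string[]) =
    if Array.isEmpty args then "Hello"
    else sprintf "Hello %s" args.[0]

printfn "%s" (greeting [| "World" |])   // prints Hello World
printfn "%s" (greeting [||])            // prints Hello
```

An entry point function could then print greeting param and return 0 as before.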
Summary
The main purpose of this chapter is to familiarize the reader with some of the main concepts
of F# to an extent where she/he can start experimenting with the system. To this end, we have
introduced the F# notions of values, expressions, types and declarations, including recursive
function declarations.
The main concepts needed to explain the meaning of these notions are: integers and
floating-point numbers, bindings and environments, and step by step evaluation of expres-
sions.
Exercises
1.1 Declare a function g: int -> int, where g(n) = n + 4.
1.2 Declare a function h: float * float -> float, where $h(x, y) = \sqrt{x^2 + y^2}$. Hint: Use
the function System.Math.Sqrt.
1.3 Write function expressions corresponding to the functions g and h in the exercises 1.1 and 1.2.
1.4 Declare a recursive function f: int -> int, where
$f(n) = 1 + 2 + \cdots + (n-1) + n$
for $n \geq 0$. (Hint: use two clauses with 0 and n as patterns.)
State the recursion formula corresponding to the declaration.
Give an evaluation for f(4).
1.5 The sequence $F_0, F_1, F_2, \ldots$ of Fibonacci numbers is defined by:
$F_0 = 0$
$F_1 = 1$
$F_n = F_{n-1} + F_{n-2}$
Thus, the first members of the sequence are 0, 1, 1, 2, 3, 5, 8, 13, ....
Declare an F# function to compute $F_n$. Use a declaration with three clauses, where the patterns
correspond to the three cases of the above definition.
Give an evaluation for $F_4$.
1.6 Declare a recursive function sum: int * int -> int, where
$\mathrm{sum}(m, n) = m + (m + 1) + (m + 2) + \cdots + (m + (n - 1)) + (m + n)$
for $m \geq 0$ and $n \geq 0$. (Hint: use two clauses with (m,0) and (m,n) as patterns.)
Give the recursion formula corresponding to the declaration.
1.7 Determine a type for each of the expressions:
(System.Math.PI, fact -1)
fact(fact 4)
power(System.Math.PI, fact 2)
(power, fact)
1.8 Consider the declarations:
let a = 5;;
let f a = a + 1;;
let g b = (f b) + a;;
Find the environment obtained from these declarations and write the evaluations of the expres-
sions f 3 and g 3.
2 Values, operators, expressions and functions
The purpose of this chapter is to illustrate the use of values of basic types: numbers, characters,
truth values and strings by means of some examples. The concepts of operator overloading
and type inference are explained. Furthermore, the chapter contains a gentle introduction
to higher-order functions. It is explained how to declare operators, and the concepts of equality
and ordering in F# are introduced. After reading the chapter the reader should be able to
construct simple programs using numbers, characters, strings and truth values.
0;;
val it : int = 0
0.0;;
val it : float = 0.0
0123;;
val it : int = 123
-7.235;;
val it : float = -7.235
-388890;;
val it : int = -388890
22 Values, operators, expressions and functions
1.23e-17;;
val it : float = 1.23e-17
Operators
We will use the term operator as a synonym for function and the components of the argument
of an operator will be called operands. Furthermore, a monadic operator is an operator with
one operand, while a dyadic operator has two operands. Most monadic operators are used in
prefix notation where the operator is written in front of the operand.
Examples of operators on numbers are monadic minus -, and the dyadic operators
addition +, subtraction -, multiplication * and division /. Furthermore, the relations: =, <>
(denoting inequality ≠), >, >= (denoting ≥), < and <= (denoting ≤), between numbers are
considered to be operators on numbers computing a truth value.
The symbol - is used for three purposes in F# as in mathematics. In number constants
like -2 it denotes the sign of the constant, in expressions like - 2 and -(2+1)
it denotes an application of the monadic minus operator, and in the expression 1-2 it
denotes the dyadic subtraction operator.
Consider, as a strange example:
2 - - -1;;
val it : int = 1
Starting from the right, -1 denotes the integer minus one, the expression - -1 denotes
monadic minus applied to minus one, and the full expression denotes the dyadic operation
two minus one.
Division is not defined on integers, but we have instead the operators / for quotient and %
for remainder as described in Section 1.8, for example:
13 / -5;;
val it : int = -2
13 % -5;;
val it : int = 3
Truth values
There are two values true and false of the type bool:
true;;
val it : bool = true
false;;
val it : bool = false
2.2 Operator precedence and association 23
Logical operators
not (unary) negation
&& logical and (conjunction)
|| logical or (disjunction)
Table 2.1 Operators on truth values
Functions can have truth values as results. Consider, for example, a function even
determining whether an integer n is even (i.e., n % 2 = 0). This function can be declared as
follows:
let even n = n % 2 = 0;;
val even : int -> bool
Thus, 1 = 2 && fact -1 = 0 evaluates to false without attempting to evaluate the ex-
pression fact -1 = 0, which would result in a non-terminating evaluation.
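The following is a minimal sketch of this behaviour. The declaration of fact is a hypothetical factorial function assumed here for illustration; its evaluation does not terminate normally for a negative argument, so the results below can only be obtained if the right operand is skipped:

```fsharp
// Hypothetical factorial: the recursion never reaches 0 for negative n.
let rec fact n = if n = 0 then 1 else n * fact (n - 1)

// The left operand of && is false, so fact (-1) = 0 is never evaluated.
let r1 = 1 = 2 && fact (-1) = 0

// Similarly, || skips its right operand when the left operand is true.
let r2 = 1 = 1 || fact (-1) = 0
```

Here r1 is false and r2 is true.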
+ unary plus
- unary minus
+ addition
- subtraction
* multiplication
/ division
% modulo (remainder)
** exponentiation
Table 2.2 Arithmetic operators
Usual rules for omitting brackets in mathematical expressions also apply to F# expres-
sions. These rules are governed by two concepts: operator precedence and operator associ-
ation for dyadic operators as shown in Table 2.3. The operators occurring in the same row
have the same precedence, which is higher than that of operators occurring in succeeding rows.
For example, * and / have the same precedence. This precedence is higher than that of +.
Operator Association
** Associates to the right
* / % Associates to the left
+ - Associates to the left
= <> > >= < <= No association
&& Associates to the left
|| Associates to the left
'a';;
val it : char = 'a'
' ';;
val it : char = ' '
The new line, apostrophe, quote and backslash characters are written by means of the escape
sequences shown in Table 2.4. Functions on characters are found in the System.Char
library.
Sequence Meaning
\' Apostrophe
\" Quote
\\ Backslash
\b Backspace
\n Newline
\r Carriage return
\t Horizontal tab
Table 2.4 Character escape sequences
The operators ||, && and not are convenient when declaring functions with results of
type bool, like in the following declarations of the functions isLowerCaseConsonant
and isLowerCaseVowel determining whether a character is a lower-case consonant or
vowel, respectively:
let isLowerCaseVowel ch =
    ch='a' || ch='e' || ch='i' || ch='o' || ch='u';;
val isLowerCaseVowel : char -> bool
let isLowerCaseConsonant ch =
    System.Char.IsLower ch && not (isLowerCaseVowel ch);;
val isLowerCaseConsonant : char -> bool
where we use the function IsLower from the library System.Char to check whether ch
is a lower-case letter. This library contains predicates IsDigit, IsSeparator, and so
on, expressing properties of a character.
Strings
A string is a sequence of characters. Strings are values of the type string. A string is
written inside enclosing quotes that are not part of the string. Quote, backslash or control
characters in a string are written by using the escape sequences. Comments cannot occur
inside strings as comment brackets ((* or *)) inside a string simply are interpreted as parts
of the string. Examples of values of type string are:
"abcd---";;
val it : string = "abcd---"
"\"1234\"";;
val it : string = "\"1234\""
"";;
val it : string = ""
The first one denotes the 7-character string abcd---, the second uses escape sequences
to get the 6-character string "1234" including the quotes, while the last denotes the empty
string containing no characters.
Strings can also be written using the verbatim string notation where the character @ is
placed in front of the first quote:
@"c0 c1 ... cn-1"
It denotes the string of characters c0 c1 ... cn-1 without any conversion of escape se-
quences. Hence @"\\\\" denotes a string of four backslash characters:
@"\\\\";;
val it : string = "\\\\"
while the escape sequence \\ for backslash is converted in the string "\\\\":
"\\\\";;
val it : string = "\\"
Verbatim strings are useful when making strings containing backslash characters. Note that
the escape sequence \" cannot be used to put a quote character into a verbatim string, because
it is interpreted as a backslash character followed by the terminating quote character. A quote
character inside a verbatim string is instead written as two consecutive quote characters ("").
Functions on strings
The String library contains a variety of functions on strings. In this section we will just
illustrate the use of a few of them by some examples.
The length function computes the number of characters in a string:
String.length "1234";;
val it : int = 4
String.length "\"1234\"";;
val it : int = 6
2.3 Characters and strings 27
The concatenation function + joins two strings together forming a new string by placing
the two strings one after another. The operator + is used in infix mode:
text + text;;
val it: string = "abcd---abcd---"
The last two examples show that the empty string is the neutral element for concatenation
of strings just like the number 0 is the neutral element for addition of integers.
Note that the same operator symbol + is used for integer addition and string concatenation.
This overloading of operator symbols is treated in Section 2.5.
A string s with length n is given by a sequence of n characters s = c0 c1 ... cn-1, where the
convention in F# is that the numbering starts at 0. For any such string s there is a function,
written s.[i], to extract the i'th character in s for 0 ≤ i ≤ n − 1. The integer i used in s.[i]
is called an index. For example:
"abc".[0];;
val it : char = 'a'
"abc".[2];;
val it : char = 'c'
"abc".[3];;
System.IndexOutOfRangeException: ...
Stopped due to error
where the last example shows (a part of) the error message which will occur when the index
is out of bounds.
If we want to concatenate a string and a character, we need to use the string function
to convert the character to a string, for example
as the operator + in this case denotes string concatenation, and this operator cannot concate-
nate a string with a character.
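A minimal sketch of such a conversion (the identifier s is chosen here only for illustration):

```fsharp
// The character 'd' is converted to the string "d" before concatenation.
let s = "abc" + string 'd'
```

The value of s is the string "abcd". Writing "abc" + 'd' without the conversion is rejected by the type checker.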
Conversion of integer, real or Boolean values to their string representations is done by
using the function string, for example:
string -4;;
val it : string = "-4"
string 7.89;;
val it : string = "7.89"
string true;;
val it : string = "True"
A simple application of this conversion function is the declaration of the function nameAge:
let nameAge(name,age) =
    name + " is " + (string age) + " years old";;
It converts the integer value of the age to the corresponding string of digits and builds a
string containing the string for the name and the age. For example:
nameAge("Diana",15+4);;
val it : string = "Diana is 19 years old"
nameAge("Philip",1-4);;
val it : string = "Philip is -3 years old"
The string function can actually give a string representation of every value, including
values belonging to user-defined types. We shall return to this in Section 7.7. Examples of
string representations are:
string (12, 'a');;
val it : string = "(12, a)"
string nameAge;;
val it : string = "FSI_0022+it@29-4"
where the pair (12, 'a') has a natural string representation in contrast to that of the user-
defined nameAge function.
and exp 3 will be evaluated (none of them will be evaluated if the evaluation of exp 1 does
not terminate).
An if-then-else expression is used whenever one has to express a splitting into cases
that cannot be expressed conveniently by use of patterns. As an example we may declare a
function on strings that adjusts a string to even size by putting a space character in front of
the string if the size is odd. Using the function even on Page 23 and if-then-else for
the splitting into cases gives the following declaration:
let even n = n % 2 = 0;;
val even : int -> bool
adjString "123";;
val it : string = " 123"
adjString "1234";;
val it : string = "1234"
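A declaration of adjString that is consistent with the description and the two evaluations above could look as follows. This is a sketch under the assumption that the length is obtained with String.length:

```fsharp
let even n = n % 2 = 0

// Put a space in front of the string when its length is odd.
let adjString s =
    if even (String.length s) then s else " " + s
```

With this declaration adjString has the type string -> string.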
One may, of course, use an if-then-else expression instead of splitting into clauses
by pattern matching. But pattern matching is to be preferred, as illustrated by the following
(less readable) alternative declaration of the gcd function (cf. Page 16):
let rec gcd(m,n) = if m=0 then n
                   else gcd(n % m,m);;
val gcd : int * int -> int
denotes multiplication on integers (of type int) or multiplication on real numbers (of type
float). The F# system tries to resolve these ambiguities in the following way:
If the type can be inferred from the context, then an overloaded operator symbol is inter-
preted as denoting the function on the inferred type.
If the type cannot be inferred from the context, then an overloaded operator symbol with
a default type will default to this type. The default type is int if the operator can be
applied to integers.
For example, the obvious declaration of a squaring function yields the function on inte-
gers:
let square x = x * x;;
val square : int -> int
Declaring a squaring function on reals can be done either by specifying the type of the
argument:
let square (x:float) = x * x;;
val square : float -> float
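The type annotation can be placed in other positions as well. The following sketch shows three alternatives; the names squareA, squareB and squareC are hypothetical and only serve to keep the variants apart:

```fsharp
let squareA x : float = x * x      // annotating the type of the result
let squareB x = x * x : float      // annotating the right-hand-side expression
let squareC x = (x : float) * x    // annotating one occurrence of x
```

Each of these is expected to get the type float -> float, as a single float annotation is enough to resolve the overloaded operator *.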
abs, acos, atan, atan2, ceil, cos, cosh, exp, floor, log
log10, pow, pown, round, sin, sinh, sqrt, tan, tanh
Table 2.5 Mathematical functions
There are many overloaded operators in F#, in particular mathematical functions that can
be applied to integers as well as to real numbers. Some of them can be found in Table 2.5.
The function abs, for example, computes the absolute value of a number that can be of type
int, float or any of the number types in Table 2.6, for example, float32:
abs -1;;
val it : int = 1
abs -1.0;;
val it : float = 1.0
abs -3.2f;;
val it : float32 = 3.20000000f
2.7 Functions are first-class citizens 31
Overloading is extensively used in the .NET library and typing of arguments is frequently
needed to resolve ambiguities. The user may declare overloaded operators and functions
inside a type declaration as explained in Section 7.3.
The F# system deduces that power has the type: float * int -> float. We can see how
F# is able to infer this type of power by arguing as follows:
1. The keyword function indicates that the type of power is a function type τ -> τ', for
some types τ and τ'.
2. Since power is applied to a pair (x,n) in the declaration, the type τ must have the form
τ1 * τ2 for some types τ1 and τ2.
3. We have τ2 = int, since the pattern of the first clause is (x,0), and 0 has type int.
4. We have that τ' = float, since the expression for the function value in the first clause:
1.0 has type float.
5. We know that power(x,n-1) has the type float since τ' = float. Thus, the over-
loaded operator symbol * in x * power(x,n-1) resolves to float multiplication and
x must be of type float. We hence get τ1 = float.
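A declaration matching the five steps above (keyword function, first clause (x,0) with value 1.0, and the expression x * power(x,n-1)) can be sketched as:

```fsharp
let rec power = function
    | (x, 0) -> 1.0                      // first clause: gives τ2 = int and τ' = float
    | (x, n) -> x * power (x, n - 1)     // float multiplication: gives τ1 = float
```

For this sketch the system infers the type float * int -> float without any type annotation.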
The above declaration of the power function has been used for illustrating the declaration
of recursive functions and the type inference performed by the system. As described above
there is already a power operator ** in F# and this should of course be used in programs.
In general we recommend inspecting the F# and .NET libraries and using available library
functions when appropriate.
plusThree 5;;
val it : int = 8
plusThree -7;;
val it : int = -4
The sum of two integers m and n can be computed as ((+) m) n. The brackets can be
omitted because function application associates to the left. For example:
(+) 1 3;;
val it : int = 4
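A declaration of plusThree that is consistent with the evaluations above is obtained by applying (+) to one argument only. This is a sketch; the identifier four is introduced here for illustration:

```fsharp
// Partial application: (+) 3 is the function adding 3 to its argument.
let plusThree = (+) 3

// Function application associates to the left: (+) 1 3 means ((+) 1) 3.
let four = ((+) 1) 3
```

Here plusThree has the type int -> int and four has the value 4.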
waterWeight 1.0;;
val it : float = 1000.0
waterWeight 2.0;;
val it : float = 8000.0
methanolWeight 1.0;;
val it : float = 786.5
methanolWeight 2.0;;
val it : float = 6292.0
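The four results above are consistent with a higher-order function weight that takes a density ro and returns a function of the side length s of a cube. The following sketch assumes the expression ro*s**3.0 for the weight and the densities 1000.0 and 786.5:

```fsharp
// weight ro is itself a function: it maps a side length s to ro * s^3.
let weight ro = fun s -> ro * s ** 3.0

let waterWeight = weight 1000.0
let methanolWeight = weight 786.5
```

Under these assumptions weight has the type float -> float -> float.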
2.8 Closures
A closure gives the means of explaining a value that is a function. A closure is a triple:
(x, exp, env)
where x is an argument identifier, exp is the expression to evaluate to get a function value,
while env is an environment (cf. Section 1.7) giving bindings to be used in such an evalua-
tion.
Consider as an example the evaluation of weight 786.5 in the previous example. The
result is the closure:
(s, ro*s**3.0, [ro ↦ 786.5, * ↦ "the product function", ** ↦ "the power function"])
The environment contains bindings of all identifiers in the expression ro*s**3.0 except
the argument s.
The following simple example illustrates the role of the environment in the closure:
let pi = System.Math.PI;;
let circleArea r = pi * r * r;;
val circleArea : float -> float
These declarations bind the identifier pi to a float value and circleArea to a closure:
pi ↦ 3.14159...
circleArea ↦ (r, pi*r*r, [pi ↦ 3.14159...])
A fresh binding of pi does not affect the meaning of circleArea that uses the binding
of pi in the closure:
let pi = 0;;
circleArea 1.0;;
val it : float = 3.141592654
The bracket notation converts from infix or prefix operator to (prefix) function:
The corresponding (prefix) function for an infix operator op is denoted by (op).
The corresponding (prefix) function for a prefix operator op is denoted by (~op).
An infix operator is declared using the bracket notation as in the following declaration of
an infix exclusive-or operator .||. on truth values:
let (.||.) p q = (p || q) && not(p && q);;
val ( .||. ) : bool -> bool -> bool
The system determines the precedence and association of declared operators on the basis
of the characters in the operator. In the case of .||. the periods have no influence on this,
so the precedence and association of .||. will be the same as those of ||. Therefore,
true .||. false && true;;
is equivalent to
true .||. (false && true);;
%% 0.5;;
val it : float = 2.0
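A prefix operator is declared using the bracket notation with a leading tilde. The evaluation above is consistent with a reciprocal operator; the following sketch assumes that meaning for %%, and the identifier r is introduced for illustration:

```fsharp
// The tilde marks %% as a prefix operator; it is not written when the operator is used.
let (~%%) x = 1.0 / x

let r = %% 0.5
```

Under this assumption r has the value 2.0.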
1 This description of legal operators in F# is incomplete. The precise rules are complicated.
No type containing a function type can support equality as F# has no means to decide
whether two functions are equal: It is a fundamental fact of theoretical computer science
that there exists no (always terminating) algorithm to determine whether two arbitrary pro-
grams f and g (i.e., two closures) denote the same function.
The equality function is automatically extended by F# whenever the user defines a new type
in so far as the type does not contain function types.
The type of the function eqText declared by:
let eqText x y =
    if x = y then "equal" else "not equal";;
val eqText : 'a -> 'a -> string when 'a : equality
Ordering
The ordering operators: >, >=, <, and <= are defined on values of basic types and on strings.
They correspond to the usual ordering of numbers. The ordering of characters is given by
the ordering of the Unicode values, while true > false in the ordering of truth values.
2.10 Equality and ordering 37
Strings are ordered in the lexicographical ordering. That is, for two strings s1 and s2 we
have that s1 < s2 if s1 would occur before s2 in a lexicon. For example:
Thus, the empty string precedes the string containing a space character, and the empty string
precedes any other string in the lexicographical ordering. Ordering is automatically extended
by F# whenever the user defines a new type, in so far as the type does not contain functions.
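A few comparisons illustrating the ordering of strings are sketched below; the identifiers c1, c2 and c3 are chosen for illustration:

```fsharp
let c1 = "" < " "          // the empty string precedes every other string
let c2 = "abc" < "abd"     // decided by the first position where the strings differ
let c3 = "ab" < "abc"      // a proper prefix precedes the longer string
```

All three comparisons evaluate to true.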
Using the comparison operators one may declare functions on values of an arbitrary type
equipped with an ordering:
when 'a : comparison
where the precise value of compare x y depends on the structure of the values x and y.
It may be convenient to use pattern matching with guards when declaring functions using
the compare function, for instance:
The guard when t > 0 restricts the matching, while the pattern t would otherwise
match any value.
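A function of this kind can be sketched as follows. The name ordText and the result strings are assumptions made for illustration; only the guard when t > 0 is taken from the text:

```fsharp
let ordText x y =
    match compare x y with
    | t when t > 0 -> "greater"    // the guard restricts t to positive values
    | 0            -> "equal"
    | _            -> "less"
```

The function applies to values of any type equipped with an ordering, for example ordText "abc" "abd" and ordText 3 2.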
Hence 33e-8 is a constant of type float and -0x1as is a constant of type int16 while
32f is not accepted by F#.
Each type name denotes an overloaded conversion function converting to a value of the
type in question (in so far as this is possible).
Summary
In this chapter we have described values and functions belonging to the basic F# types: inte-
gers, reals, characters, truth values and strings. Furthermore, we have discussed evaluation
of infix operators with precedences, and the typing of arithmetic expressions where some
operators may be overloaded. The concept of higher-order functions was introduced and the
concept of a closure was used to explain the meaning of a function in F#. It was explained
how to declare operators, and finally, the concepts of equality and ordering were explained.
Exercises
2.1 Declare a function f: int -> bool such that f(n) = true exactly when n is divisible by 2
or divisible by 3 but not divisible by 5. Write down the expected values of f(24), f(27), f(29)
and f(30) and compare with the result. Hint: n is divisible by q when n%q = 0.
2.2 Declare an F# function pow: string * int -> string, where:
pow(s, n) = s · s · ... · s (n occurrences of s)
2.6 Declare the F# function notDivisible: int * int -> bool where
2.7 1. Declare the F# function test: int * int * int -> bool. The value of test(a, b, c),
for a ≤ b, is the truth value of:
notDivisible(a, c)
and notDivisible(a + 1, c)
...
and notDivisible(b, c)
2. Declare an F# function prime: int -> bool, where prime(n) = true, if and only if n
is a prime number.
3. Declare an F# function nextPrime: int -> int, where nextPrime(n) is the smallest
prime number > n.
2.8 The following figure gives the first part of Pascal's triangle:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
and
$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$ if n ≠ 0, k ≠ 0, and n > k.
Declare an F# function bin: int * int -> int to compute binomial coefficients.
2.9 Consider the declaration:
2.11 Declare a function VAT: int -> float -> float such that the value VAT n x is obtained
by increasing x by n percent.
Declare a function unVAT: int -> float -> float such that
unVAT n (VAT n x) = x
Hint: Use the conversion function float to convert an int value to a float value.
2.12 Declare a function min of type (int -> int) -> int. The value of min(f ) is the smallest
natural number n where f (n) = 0 (if it exists).
2.13 The functions curry and uncurry of types
curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c
uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c
are defined in the following way:
uncurry g is the function f where f (x, y) is the value h y for the function h = g x.
Tuples, records and tagged values are compound values obtained by combining values of
other types. Tuples are used in expressing functions of several variables where the argu-
ment is a tuple, and in expressing functions where the result is a tuple. The components in
a record are identified by special identifiers called labels. Tagged values are used when we
group together values of different kinds to form a single set of values. Tuples, records and
tagged values are treated as first-class citizens in F#: They can enter into expressions and
the value of an expression can be a tuple, a record or a tagged value. Functions on tuples,
records or tagged values can be defined by use of patterns.
3.1 Tuples
An ordered collection of n values (v1 , v2 , . . . , vn ), where n > 1, is called an n-tuple.
Examples of n-tuples are:
(10, true);;
val it : int * bool = (10, true)
(("abc",1),-3);;
val it : (string * int) * int = (("abc", 1), -3)
A 2-tuple like (10,true) is also called a pair. The last example shows that a pair, for
example, (("abc",1),-3), can have a component that is again a pair ("abc",1). In
general, tuples can have arbitrary values as components. A 3-tuple is called a triple and a 4-
tuple is called a quadruple. An expression like (true) is not a tuple but just the expression
true enclosed in brackets, so there is no concept of 1-tuple. The symbol () denotes the
only value of type unit (cf. Page 23).
The n-tuple (v1 , v2 , . . . , vn ) represents the graph:
[Figure: a root node with n branches leading to the components v1, v2, ..., vn]
44 Tuples, records and tagged values
[Figure: graphs for the tuples (true,"abc",1,-3) and ((true,"abc"),1,-3)]
while the 3-tuple ((true,"abc"),1,-3) represents the graph with three branches and
a sub-graph with two branches.
Tuple expressions
A tuple expression (expr1, expr2, ..., exprn) is obtained by enclosing n expressions
expr1, expr2, ..., exprn in parentheses. It has the type τ1 * τ2 * ... * τn when expr1, expr2,
..., exprn have types τ1, τ2, ..., τn. For example:
(1<2,"abc",1,1-4) has type bool * string * int * int
(true,"abc") has type bool * string
((2>1,"abc"),3-2,-3) has type (bool * string) * int * int
Remark: The tuple type τ1 * τ2 * ... * τn corresponds to the Cartesian Product
A = A1 × A2 × ... × An
of n sets A1, A2, ..., An in mathematics. An element a of the set A is a tuple a =
(a1, a2, ..., an) of elements a1 ∈ A1, a2 ∈ A2, ..., an ∈ An.
A tuple expression (expr 1 , expr 2 , . . . , expr n ) is evaluated from left to right, that is, by
first evaluating expr 1 , then expr 2 , and so on. Tuple expressions can be used in declarations
whereby an identifier is bound to a tuple value, for example:
let tp1 = ((1<2, "abc"), 1, 1-4);;
val tp1 : (bool * string) * int * int = ((true, "abc"), 1, -3)
Figure 3.2 Graphs for tuples (true,"abc") and ((true, "abc"), 1, -3)
let t2 = (t1,1,-3);;
val t2 : (bool * string) * int * int = ((true, "abc"), 1, -3)
The value bound to the identifier t1 is then found as a subcomponent of the value bound to
t2 as shown in Figure 3.2. A fresh binding of t1 is, however, not going to affect the value
of t2:
let t1 = -7 > 2;;
val t1 : bool = false
t2;;
val it : (bool * string) * int * int = ((true, "abc"), 1, -3)
The subcomponent (true,"abc") is a value in its own right and it depends in no way on possi-
ble future bindings of t1 once the value of the expression (t1,1,-3) has been evaluated.
Equality
Equality is defined for n-tuples of the same type, provided that equality is defined for the
components. The equality is defined componentwise, that is, (v1, v2, ..., vn) is equal to
(v1', v2', ..., vn') if vi is equal to vi' for 1 ≤ i ≤ n. This corresponds to equality of the
graphs represented by the tuples. For example:
("abc", 2, 4, 9) = ("ABC", 2, 4, 9);;
val it : bool = false
Ordering
The ordering operators: >, >=, <, and <=, and the compare function are defined on n-
tuples of the same type, provided ordering is defined for the components. Tuples are ordered
lexicographically:
(x1 , x2 , . . . , xn ) < (y1 , y2 , . . . , yn )
exactly when, for some k, where 1 ≤ k ≤ n, we have:
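In the usual lexicographic reading, the condition is that xi = yi for every i < k and that xk < yk, so the comparison is decided by the first position where the two tuples differ. A small sketch (the identifiers t1, t2 and t3 are chosen for illustration):

```fsharp
let t1 = (1, "a") < (1, "ab")     // first components are equal, "a" < "ab" decides
let t2 = (2, "a") > (1, "z")      // decided by the first components alone
let t3 = compare ("abcd", 3, true) ("abcd", 3, false)   // decided by true > false
```

Here t1 and t2 are true, and t3 is a positive integer.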
Tuple patterns
A tuple pattern represents a graph. For example, the pattern (x,n) is a tuple pattern. It
represents the graph shown to the left containing the identifiers x and n:
[Figure: the graph for the pattern (x,n) to the left and the graph for the value (3,2) to the right]
The graph represented by the value (3,2) (shown to the right) matches the graph for the
pattern in the sense that the graph for the value is obtained from the graph for the pattern
by substituting suitable values for the identifiers in the pattern, in this case the value 3 for
the identifier x and the value 2 for the identifier n. Hence, the pattern matching gives the
bindings x ↦ 3 and n ↦ 2.
Patterns can be used on the left-hand side in a let declaration which binds the identifiers
in the pattern to the values obtained by the pattern matching, for example:
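A minimal sketch of such a declaration, using the pattern (x,n) and the value (3,2) discussed above:

```fsharp
// Matching the pattern (x,n) against the value (3,2) binds both identifiers.
let (x, n) = (3, 2)
```

After this declaration x is bound to 3 and n is bound to 2.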
Patterns may contain constants like the pattern (x,0), for example, containing the con-
stant 0. It matches any pair (v1, v2) where v2 = 0, and the binding x ↦ v1 is then obtained:
This example also illustrates that the pattern matching may bind an identifier (here: x) to a
value which is a tuple.
The pattern (x,0) is incomplete in the sense that it just matches pairs where the second
component is 0 and there are other pairs of type τ*int that do not match the pattern. The
system gives a warning when an incomplete pattern is used:
The warning can be ignored since the second component of ((3,"a"),0) is, in fact, 0.
By contrast the declaration:
generates an error message because the constant 0 in the pattern does not match the cor-
responding value 2 on the right-hand side. The system cannot generate any binding in this
case.
The wildcard pattern _ can be used in tuple patterns. Every value matches this pattern, but
the matching provides no bindings. For example:
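A sketch of a declaration using wildcards, where the tuple on the right-hand side is chosen for illustration:

```fsharp
// Only x and z are bound; the components matched by _ are ignored.
let ((_, x), _, z) = ((1, true), (1, 2, 3), false)
```

This gives the bindings x ↦ true and z ↦ false, while no identifier is bound to 1 or to (1, 2, 3).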
A pattern cannot contain multiple occurrences of the same identifier, so (x,x), for ex-
ample, is an illegal pattern:
3.2 Polymorphism
Consider the function swap interchanging the components of a pair:
let swap (x,y) = (y,x);;
val swap : 'a * 'b -> 'b * 'a
swap ('a',"ab");;
val it : string * char = ("ab", 'a')
swap ((1,3),("ab",true));;
val it : (string*bool) * (int*int) = (("ab", true), (1, 3))
The examples show that the function applies to all kinds of pairs. This is reflected in the type
of the function: 'a * 'b -> 'b * 'a.
The type of swap expresses that the argument (type 'a * 'b) must be a pair, and that
the value will be a pair (type 'b * 'a) such that the first/second component of the value is
of the same type as the second/first component of the argument.
The type of swap contains two type variables 'a and 'b. A type containing type vari-
ables is called a polymorphic type and a function with polymorphic type like swap is called
a polymorphic function. Polymorphic means "of many forms": In our case the F# compiler
is able to generate a single F# function swap working on any kind of pairs and which is
hence capable of handling data of many forms.
Polymorphism is related to overloading (cf. Section 2.5) as we in both cases can apply the
same function name or operator to arguments of different types, but an overloaded operator
denotes different F# functions for different argument types (like + denoting integer addition
when applied to ints and floating-point addition when applied to floats).
There are two predefined, polymorphic functions
fst: 'a * 'b -> 'a and snd: 'a * 'b -> 'b
on pairs, that select the first and second component, respectively. For example:
fst((1,"a",true), "xyz");;
val it : int * string * bool = (1, "a", true)
In the following we will just consider the Cartesian coordinate representation, where a
vector in the plane will be represented by a value of type float * float.
We will consider the following operators on vectors:
Vector addition: (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)
Vector reversal: −(x, y) = (−x, −y)
Vector subtraction: (x1, y1) − (x2, y2) = (x1 − x2, y1 − y2)
= (x1, y1) + −(x2, y2)
Multiplication by a scalar: λ(x1, y1) = (λx1, λy1)
Dot product: (x1, y1) · (x2, y2) = x1x2 + y1y2
Norm (length): ‖(x1, y1)‖ = √(x1² + y1²)
We cannot use the operator symbols +,-,*, and so on, to denote the operations on vec-
tors, as this would overwrite their initial meaning. But using +. for vector addition, -. for
vector reversal and subtraction, *. for product by a scalar and &. for dot product, we obtain
operators having a direct resemblance to the mathematical vector operators and having the
associations and precedences that we would expect.
The prefix operator for vector reversal is declared by (cf. Section 2.9):
let (~-.) (x:float,y:float) = (-x,-y);;
val ( ~-. ) : float * float -> float * float
and the infix operators are declared by:
let (+.) (x1, y1) (x2,y2) = (x1+x2,y1+y2): float*float;;
val ( +. ) : float * float -> float * float -> float * float
The norm function is declared using the sqrt function (cf. Table 2.5) by:
let norm(x1:float,y1:float) = sqrt(x1*x1+y1*y1);;
val norm : float * float -> float
These functions allow us to write vector expressions in a form resembling the mathematical
notation for vectors. For example:
let a = (1.0,-2.0);;
val a : float * float = (1.0, -2.0)
let b = (3.0,4.0);;
val b : float * float = (3.0, 4.0)
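The remaining operators can be sketched from the mathematical definitions given earlier in this section. The declarations of -., *. and &. below are assumptions derived from those definitions, and the identifiers c and d are chosen for illustration:

```fsharp
let (~-.) (x: float, y: float) = (-x, -y)                          // vector reversal
let (+.) (x1, y1) (x2, y2) = (x1 + x2, y1 + y2) : float * float    // vector addition
let (-.) v1 v2 = v1 +. (-. v2)                                     // subtraction: v1 + (-v2)
let ( *.) x (x1, y1) = (x * x1, x * y1) : float * float            // product by a scalar
let (&.) (x1, y1) (x2, y2) = x1 * x2 + y1 * y2 : float             // dot product
let norm (x1: float, y1: float) = sqrt (x1 * x1 + y1 * y1)

let a = (1.0, -2.0)
let b = (3.0, 4.0)
let c = 2.0 *. a -. b     // *. binds tighter than -., as for * and -
let d = a &. b
```

Note the space in ( *.), which prevents (* from being read as the start of a comment. With these declarations c is (-1.0, -8.0), d is -5.0 and norm b is 5.0.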
3.4 Records
A record is a generalized tuple where each component is identified by a label instead of the
position in the tuple.
The record type must be declared before a record can be made. We may for example
declare a type Person as follows:
The keyword type indicates that this is a type declaration and the braces { and } indicate
a record type. The (distinct) identifiers age, birthday, name and sex are called record
labels and they are considered part of the type.
A value of type Person is entered as follows:
This record contains the following fields: The string "John" with label name, the integer
29 with label age, the string "M" with label sex, and the integer pair (2,11) with label
birthday.
The declaration creates the following binding of the identifier john:
john.birthday;;
val it : int * int = (2, 11)
john.sex;;
val it : string = "M"
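A type declaration and a record value that are consistent with the labels and field values described above can be sketched as follows. The order of the fields in the declaration is an assumption:

```fsharp
type Person = { age: int; birthday: int * int; name: string; sex: string }

// The fields may be given in any order, as they are identified by their labels.
let john = { name = "John"; age = 29; sex = "M"; birthday = (2, 11) }
```

A field is then selected by writing the label after the record, as in john.birthday and john.sex.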
Record patterns
A record pattern is used to decompose a record into its fields. The pattern
{name = x; age = y; sex = s; birthday =(d,m)}
denotes the graph shown in Figure 3.3. It generates bindings of the identifiers x, y, s, d and
m when matched with a person record:
[Figure 3.3: the graph for the record pattern (age y, birthday (d,m), name x, sex s) and the graph for a record with age 19, birthday (24,12), name "Sue" and sex "F"]
age john;;
val it : int = 29
isYoungLady john;;
val it : bool = false
isYoungLady sue;;
val it : bool = true
The type of the above functions can be inferred from the context since name, age, and so
on are labels of the record type Person only.
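Functions like these can be sketched with record patterns that mention only the labels needed. The age limit 25 in isYoungLady is a hypothetical choice that agrees with the two results above, and the field values for sue are those of the record in Figure 3.3:

```fsharp
type Person = { age: int; birthday: int * int; name: string; sex: string }

let john = { name = "John"; age = 29; sex = "M"; birthday = (2, 11) }
let sue  = { name = "Sue"; age = 19; sex = "F"; birthday = (24, 12) }

// A record pattern need not mention every label of the type.
let age { age = a } = a

// Hypothetical criterion: female and younger than 25.
let isYoungLady { age = a; sex = s } = a < 25 && s = "F"
```

Both functions get the argument type Person because the labels age and sex belong to that record type only.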
A function:
solve: Equation -> Solution
for computing the solutions of the equation should then have the indicated type. Note that
type declarations like the ones above are useful in program documentation as they commu-
nicate the intention of the program in a succinct way. The system does, however, just treat
the identifiers Equation and Solution as shorthand for the corresponding types.
Error handling
The function solve must give an error message when b² − 4ac < 0 or a = 0 as there is no
solution in these cases. Such an error message can be signalled by using an exception. An
exception is named by an exception declaration. We may, for example, name an exception
Solve by the declaration:
exception Solve;;
exception Solve
The then branch of this declaration contains the expression: raise Solve. An evalu-
ation of this expression terminates with an error message. For example:
solve(1.0, 0.0, 1.0);;
FSI_0015+Solve: Exception of type FSI_0015+Solve was thrown.
at FSI_0016.solve(Double a, Double b, Double c)
at <StartupCode$FSI_0017>.$FSI_0017.main@()
Stopped due to error
We say that the exception Solve is raised. Note that the use of the exception does not
influence the type of solve.
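A declaration of solve raising this exception could look as follows. This is a sketch assuming the standard formula for the solutions of a quadratic equation:

```fsharp
exception Solve

let solve (a, b, c) =
    if b * b - 4.0 * a * c < 0.0 || a = 0.0
    then raise Solve      // no solution: negative discriminant or a = 0.0
    else ((-b + sqrt (b * b - 4.0 * a * c)) / (2.0 * a),
          (-b - sqrt (b * b - 4.0 * a * c)) / (2.0 * a))
```

The inferred type is float * float * float -> float * float, which makes no mention of the exception.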
Other examples of the use of solve are:
solve(1.0, 1.0, -2.0);;
val it : float * float = (1.0, -2.0)
solve(0.0,1.0,2.0);;
System.Exception: discriminant is negative or a=0.0
at FSI_0037.solve(Double a, Double b, Double c)
at <StartupCode$FSI_0038>.$FSI_0038.main@()
Stopped due to error
We shall on Page 63 study how raised exceptions can be caught.
obtained by binding the parameters a, b and c to the actual values 1.0, 1.0 and -2.0:
1. The binding of sqrtD can only be established when the value of the subexpression has
been evaluated.
2. The evaluation of the subexpression starts with the declaration let d = . . . .
3. The expression b*b-4.0*a*c evaluates to 9.0 using the bindings in env. A binding of
d to this value is added to the environment.
4. The evaluation of if d < 0.0 . . . else sqrt d gives the value 3.0 using the bindings
in the environment env plus d ↦ 9.0.
5. The evaluation of the subexpression is completed and the binding of d is removed from
the environment.
6. A binding of sqrtD to the value 3.0 is added to the environment, and the expression
((-b + sqrtD . . . is evaluated in this environment.
7. The bindings of a, b, c and sqrtD are removed and the evaluation terminates with result
(1.0, -2.0).
let d = b*b-4.0*a*c
if d < 0.0 || a = 0.0
then failwith "discriminant is negative or a=0.0"
else sqrt d
and this also ends the lifetime of the binding of d. One says that the let-expression consti-
tutes the scope of the declaration of d.
    let sqrtD =
        let d = b*b-4.0*a*c
        if d < 0.0 || a = 0.0
        then failwith "discriminant is negative or a=0.0"
        else sqrt d
    ((-b + sqrtD)/(2.0*a),(-b - sqrtD)/(2.0*a))
is terminated by the double semicolon. Note that the expression ((-b + . . . must be on the
same indentation level as let sqrtD =.
A let-expression may contain more than one local declaration as shown in yet another
version of solve (probably the most readable):
let solve(a,b,c) =
    let d = b*b-4.0*a*c
    if d < 0.0 || a = 0.0
    then failwith "discriminant is negative or a=0.0"
    else let sqrtD = sqrt d
         ((-b + sqrtD)/(2.0*a),(-b - sqrtD)/(2.0*a));;
val solve : float * float * float -> float * float
The evaluation of solve(1.0,1.0,-2.0) in this version of the function will add the
binding of d to the environment env. Later the binding of sqrtD is further added with-
out removing the binding of d. Finally the expression in the last line is evaluated and the
bindings of a, b, c, d and sqrtD are all removed at the same time.
Representation. Invariant
We use the representation (a, b), where b > 0 and where the fraction a/b is irreducible, that is,
gcd(a, b) = 1, to represent the rational number a/b. Thus, a value (a, b) of type int * int
represents a rational number if b > 0 and gcd(a, b) = 1, and we name this condition the
invariant for pairs representing rational numbers. Any rational number has a unique normal
form of this kind. This leads to the type declaration:
type Qnum = int*int;; // (a,b) where b > 0 and gcd(a,b) = 1
where the invariant is stated as a comment to the declaration. (The declaration of gcd is
found on Page 15.)
Operators
It is convenient to declare a function canc that cancels common divisors and thereby re-
duces any fraction with non-zero denominator to the normal form satisfying the invariant:
let canc(p,q) =
let sign = if p*q < 0 then -1 else 1
let ap = abs p
let aq = abs q
let d = gcd(ap,aq)
(sign * (ap / d), aq / d);;
In the declarations below for the other functions, canc is applied to guarantee that the
resulting values satisfy the invariant.
When a rational number is generated from a pair of integers, we must check for division
by zero and enforce that the invariant is established for the result. The function mkQ does
that by the use of canc:
let mkQ = function
| (_,0) -> failwith "Division by zero"
| pr -> canc pr;;
The operators on rational numbers are declared below. These declarations follow the rules
(3.1) for rational numbers. We assume that the arguments are legal representations of rational
numbers, that is, they respect the invariant. Under this assumption, the result of any of the
functions must respect the invariant. This is enforced by the use of canc and mkQ:
let (.+.) (a,b) (c,d) = canc(a*d + b*c, b*d);; // Addition
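The remaining operators can be sketched along the same lines. The block below is a minimal, self-contained sketch: gcd, canc and mkQ are repeated so that it runs on its own, and the declarations of subtraction, multiplication, division and equality are assumptions based on the usual rules for fractions rather than declarations taken from the text.

```fsharp
// Repeated helper declarations so that the sketch is self-contained.
let rec gcd = function
    | (0, n) -> n
    | (m, n) -> gcd (n % m, m)

let canc (p, q) =
    let sign = if p * q < 0 then -1 else 1
    let ap = abs p
    let aq = abs q
    let d = gcd (ap, aq)
    (sign * (ap / d), aq / d)

let mkQ = function
    | (_, 0) -> failwith "Division by zero"
    | pr -> canc pr

let (.+.) (a, b) (c, d) = canc (a * d + b * c, b * d)   // Addition (as in the text)
let (.-.) (a, b) (c, d) = canc (a * d - b * c, b * d)   // Subtraction (assumed)
let (.*.) (a, b) (c, d) = canc (a * c, b * d)           // Multiplication (assumed)
let (./.) (a, b) (c, d) = (a, b) .*. mkQ (d, c)         // Division (assumed); mkQ rejects c = 0
let (.=.) (a, b) (c, d) = (a, b) = (c, d)               // Equality; relies on the invariant
```

Division is expressed through mkQ so that a zero numerator in the divisor is reported as a division by zero and a negative numerator is moved back to the sign of the result.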
Note that the definition of equality assumes the invariant. Equality should be declared by
a*d=b*c if we allow integer pairs not satisfying the invariant as there would then be many
different integer pairs representing the same rational number.
It is straightforward to convert a rational number representation to a string:
let toString(p:int,q:int) = (string p) + "/" + (string q);;
as the representation is unique. We can operate on rational numbers in a familiar manner:
let q1 = mkQ(2,-3);;
val q1 : int * int = (-2, 3)
let q2 = mkQ(5,10);;
val q2 : int * int = (1, 2)
let q3 = q1 .+. q2;;
val q3 : int * int = (-1, 6)
In F#, a collection of tagged values is declared by a type declaration. For example, a type
for shapes is declared by:
type Shape = | Circle of float
| Square of float
| Triangle of float*float*float;;
type Shape =
| Circle of float
| Square of float
| Triangle of float * float * float
Constructors in patterns
Constructors can be used in patterns. For example, an area function for shapes is declared
by:
let area = function
| Circle r -> System.Math.PI * r * r
| Square a -> a * a
| Triangle(a,b,c) ->
let s = (a + b + c)/2.0
sqrt(s*(s-a)*(s-b)*(s-c));;
val area : Shape -> float
For example, the value Circle 1.2 will match the pattern Circle r, but not the other
patterns in the function declaration. The matching binds the identifier r to the value 1.2,
and the expression System.Math.PI * r * r is evaluated using this binding:
area (Circle 1.2)
⇝ (System.Math.PI * r * r, [r ↦ 1.2])
⇝ ...
The value Triangle(3.0,4.0,5.0) will in a similar way only match the pattern in
the third clause in the declaration, and we get bindings of a, b and c to 3.0, 4.0 and 5.0,
and the let expression is evaluated using these bindings:
area (Triangle(3.0,4.0,5.0))
⇝ (let s = ..., [a ↦ 3.0, b ↦ 4.0, c ↦ 5.0])
⇝ ...
does not represent a triangle, as 7.5 > 3.0 + 4.0 and, therefore, one of the triangle inequal-
ities is not satisfied.
Therefore, there is an invariant for this representation of shapes: the real numbers have to
be positive, and the triangle inequalities must be satisfied. This invariant can be declared as
a predicate isShape:
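One way to write this predicate is sketched below. The Shape type is repeated so that the block is self-contained, and the exact form of the conditions is an assumption derived from the invariant stated above (positive numbers, strict triangle inequalities).

```fsharp
type Shape =
    | Circle of float
    | Square of float
    | Triangle of float * float * float

// True exactly when the value satisfies the invariant for shapes.
let isShape = function
    | Circle r -> r > 0.0
    | Square a -> a > 0.0
    | Triangle (a, b, c) ->
        a > 0.0 && b > 0.0 && c > 0.0 &&
        a < b + c && b < c + a && c < a + b
```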
We consider now the declaration of an area function for geometric shapes that raises an
exception when the argument of the function does not satisfy the invariant. If we try to
modify the above area function:
then the else-branch must have means to select the right area-expression depending on the
form of x. This is done using a match ... with ... expression:
let area x =
if not (isShape x)
then failwith "not a legal shape"
else match x with
| Circle r -> System.Math.PI * r * r
| Square a -> a * a
| Triangle(a,b,c) ->
let s = (a + b + c)/2.0
sqrt(s*(s-a)*(s-b)*(s-c));;
val area : Shape -> float
The modified area function computes the area of legal values of the type Shape and
terminates the evaluation raising an exception for illegal values:
area (Triangle(3.0,4.0,5.0));;
val it : float = 6.0
area (Triangle(3.0,4.0,7.5));;
System.Exception: not a legal shape
...
Types like Colour are called enumeration types, as the declaration of Colour just enu-
merates five constructors:
niceColour Purple;;
val it : bool = false
The days in a month example on Page 4 can be nicely expressed using an enumeration
type:
type Month = January | February | March | April
| May | June | July | August | September
| October | November | December;;
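With this type, a days-in-a-month function might be sketched as follows. The function name daysOfMonth and the choice to ignore leap years are assumptions made for illustration.

```fsharp
type Month =
    | January | February | March | April
    | May | June | July | August | September
    | October | November | December

// Number of days in a month of a non-leap year.
let daysOfMonth = function
    | February -> 28
    | April | June | September | November -> 30
    | _ -> 31
```

The or-pattern in the second clause groups the four 30-day months, and the wildcard covers the remaining seven constructors.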
where the order of the constructors reflects that false < true. Notice that user-defined
constructors must start with uppercase letters.
3.10 Exceptions
Exceptions have already been used in several examples earlier in this chapter. In this section
we give a systematic account of this subject.
Raising an exception terminates the evaluation of a call of a function as we have seen for
the solve function on Page 53 that raises the exception Solve when an error situation is
encountered. In the examples presented so far the exception propagates all the way to top
level where an error message is issued.
It is possible to catch an exception using a try...with expression as in the following
solveText function:
let solveText eq =
try
string(solve eq)
with
| Solve -> "No solutions";;
val solveText : float * float * float -> string
It calls solve with a float triple eq representing a quadratic equation and returns the string
representation of the solutions of the equation:
solveText (1.0,1.0,-2.0);;
val it : string = "(1, -2)"
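To try this outside the context of the earlier pages, the following self-contained sketch declares an exception Solve without arguments and a version of solve that raises it. Both declarations are assumptions modelled on the description above, not copies of the declarations on Page 53.

```fsharp
exception Solve   // assumed: an exception without an argument

// Roots of a*x^2 + b*x + c = 0; raises Solve when there are no real roots or a = 0.0.
let solve (a, b, c) =
    let d = b * b - 4.0 * a * c
    if d < 0.0 || a = 0.0 then raise Solve
    else
        let sqrtD = sqrt d
        ((-b + sqrtD) / (2.0 * a), (-b - sqrtD) / (2.0 * a))

// Catches Solve and turns it into an ordinary string result.
let solveText eq =
    try
        string (solve eq)
    with
    | Solve -> "No solutions"
```

Because the handler only matches Solve, any other exception raised inside the try part would still propagate to the caller.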
An application of the function failwith s will raise the exception Failure s and this
exception can also be caught. Application of the function mkQ (see Page 57), for example,
will call failwith in the case of a division by zero:
try
toString(mkQ(2,0))
with
| Failure s -> s;;
val it : string = "Division by zero"
where None is used as result for arguments where the function is undefined while Some v
is used when the function has value v .
The constructor Some is polymorphic and can be applied to values of any type:
Some false;;
val it : bool option = Some false
optFact -2;;
val it : int option = None
The declaration of optFact presumes that fact has already been declared. An inde-
pendent declaration of optFact is achieved using the Option.get function:
let rec optFact = function
| 0 -> Some 1
| n when n > 0 -> Some(n * Option.get(optFact(n-1)))
| _ -> None;;
val optFact : int -> int option
Note the use of guarded patterns in this declaration (cf. Section 2.10).
Summary
This chapter introduces the notions of tuples and tuple types, the notions of records and
record types, and the notions of tagged values and tagged-value types. Tuples and records are
composite values, and we have introduced the notion of patterns that is used to decompose
a composite value into its parts. Tagged values are used to express disjoint unions.
An operator can be given infix mode and precedence, and this feature was exploited in
writing the operators on geometric vectors in the same way as they are written in mathemat-
ical notation.
The notion of exceptions was introduced for handling errors and let expressions were
introduced for having locally declared identifiers.
Exercises
3.1 A time of day can be represented as a triple (hours, minutes, f ) where f is either AM or PM
or as a record. Declare a function to test whether one time of day comes before another. For
example, (11,59,"AM") comes before (1,15,"PM"). Make solutions with triples as well
as with records. Declare the functions in infix notation.
3.2 The former British currency had 12 pence to a shilling and 20 shillings to a pound. Declare
functions to add and subtract two amounts, represented by triples (pounds, shillings, pence) of
integers, and declare the functions when a representation by records is used. Declare the func-
tions in infix notation with proper precedences, and use patterns to obtain readable declarations.
3.3 The set of complex numbers is the set of pairs of real numbers. Complex numbers behave almost
like real numbers if addition and multiplication are defined by:
(a, b) + (c, d) = (a + c, b + d)
(a, b) · (c, d) = (ac − bd, bc + ad)
1. Declare suitable infix functions for addition and multiplication of complex numbers.
2. The inverse of (a, b) with regard to addition, that is, −(a, b), is (−a, −b), and the inverse of
(a, b) with regard to multiplication, that is, 1/(a, b), is (a/(a² + b²), −b/(a² + b²)) (provided
that a and b are not both zero). Declare infix functions for subtraction and division of complex
numbers.
3. Use let-expressions in the declaration of the division of complex numbers in order to avoid
repeated evaluation of identical subexpressions.
3.4 A straight line y = ax + b in the plane can be represented by the pair (a, b) of real numbers.
1. Declare a type StraightLine for straight lines.
2. Declare functions to mirror straight lines around the x- and y-axes.
3. Declare a function to give a string representation for the equation of a straight line.
3.5 Make a type Solution capturing the three capabilities for roots in a quadratic equation: two
roots, one root and no root (cf. Section 3.5). Declare a corresponding solve function.
3.6 Solve Exercise 3.1 using tagged values to represent AM and PM.
3.7 Give a declaration for the area function on Page 61 using guarded patterns rather than an
if...then...else expression.
4
Lists
Lists are at the core of functional programming. A large number of applications can be mod-
elled and implemented using lists. In this chapter we introduce the list concept, including list
values, patterns and basic operations, and we study a collection of recursion schemas over
lists. We end the chapter introducing a model-based approach to functional programming
on the basis of two examples. The concept of a list is a special case of a collection. In the
next chapter, when we consider collections more generally, we shall see that the F# library
comprises a rich collection of powerful functions on lists.
[Figure: graphs for the lists [2; 3; 2] and [2], drawn as trees of :: nodes ending in []]
hence a tagged pair with tag :: where the first component, the head of the list, is the integer
2, while the second component, the tail of the list, is the list [3; 2] with just two elements.
This list is again a tagged pair with tag ::, head 3 and tail [2]. Finally the head of the list
[2] is the integer 2, while the tail is the empty list [].
List constants in F#
Lists can be entered as values:
let xs = [2;3;2];;
val xs : int list = [2; 3; 2]
The types int list and string list, containing the type constructor list, indi-
cate that the value of xs is a list of integers and that the value of ys is a list of strings.
We may have lists with any element type, so we can, for example, build lists of pairs:
[("b",2);("c",3);("e",5)];;
val it : (string * int) list = [("b", 2);("c", 3);("e", 5)]
lists of records:
type P = {name: string; age: int}
[{name = "Brown"; age = 25}; {name = "Cook"; age = 45}];;
val it : P list =
[{name = "Brown"; age = 25}; {name = "Cook"; age = 45}]
lists of functions:
[sin; cos];;
val it : (float -> float) list = [<fun:it@7>; <fun:it@7-1>]
Furthermore, lists can be components of other values. We can, for example, have pairs
containing lists:
("bce", [2;3;5]);;
val it : string * int list = ("bce", [2; 3; 5])
int list list means (int list) list. Note that int (list list) would not
make sense.
All elements in a list must have the same type. For example, the following is not a legal
value in F#:
["a";1];;
-----
stdin(8,6): error FS0001:
This expression was expected to have type
string
but here has type
int
Equality of lists
Two lists [x_0; x_1; ...; x_{m-1}] and [y_0; y_1; ...; y_{n-1}] (of the same type) are equal when
m = n and x_i = y_i, for all i such that 0 ≤ i < m. This corresponds to equality of the
graphs represented by the lists. Hence, the order of the elements as well as repetitions of the
same value are significant in a list.
The equality operator = of F# can be used to test equality of two lists provided that the
elements of the lists are of the same type and provided that the equality operator can be used
on values of that element type.
For example:
[2;3;2] = [2;3];;
val it : bool = false
[2;3;2] = [2;3;3];;
val it : bool = false
The differences are easily recognized from the graphs representing [2; 3; 2], [2; 3]
and [2; 3; 3].
Lists containing functions cannot be compared because F# equality is not defined for func-
tions.
For example:
Ordering of lists
Lists of the same type are ordered lexicographically, provided there is an ordering defined
on the elements:
[x_0; x_1; ...; x_{m-1}] < [y_0; y_1; ...; y_{n-1}]
exactly when
[x_0; x_1; ...; x_k] = [y_0; y_1; ...; y_k]
and
(k = m-1 < n-1 or k < min{m-1, n-1} and x_{k+1} < y_{k+1})
for some k, where 0 ≤ k < min{m-1, n-1}.
There are two cases in this definition of xs < ys :
1. The list xs is a proper prefix of ys :
[1; 2; 3] < [1; 2; 3; 4];;
val it : bool = true
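Both situations covered by the definition can be checked directly with the built-in comparison operators. The following lines are a small sketch; the identifier names are chosen for illustration only.

```fsharp
// A proper prefix is smaller than the longer list.
let prefixCase = [1; 2; 3] < [1; 2; 3; 4]

// Otherwise the first position where the elements differ decides,
// regardless of the lengths of the lists.
let firstDifference = [1; 2; 3; 0] < [1; 2; 4]

// The same ordering applies to any element type with an ordering, e.g. strings.
let onStrings = ["b"; "a"] < ["b"; "b"]

// compare gives a negative, zero or positive integer for the same ordering.
let viaCompare = compare [1; 1; 1; 2] [1; 1; 2]
```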
let y = ""::[];;
val y : string list = [""]
[Figure: graph for the list x::xs]
The operator associates to the right, so x0::x1::xs means x0::(x1::xs) where x0
and x1 have the same type and xs is a list with elements of that same type (cf. Figure 4.3)
so we get, for example:
let z = 2::3::[4;5];;
val z : int list = [2; 3; 4; 5]
[Figure 4.3: graph for x0::(x1::xs)]
List patterns
While the cons operator can be used to construct a list from a (head) element and a (tail)
list, it is also used in list patterns. List patterns and pattern matching for lists are used in the
subsequent sections to declare functions on lists by using the bindings of identifiers in a list
pattern obtained by matching a list to the pattern.
There is the list pattern [] for the empty list while patterns for non-empty lists are con-
structed using the cons operator, that is, x::xs matches a non-empty list.
[Figure 4.4: graphs for the patterns [] and x::xs]
The patterns [] and x::xs denote the graphs in Figure 4.4. The pattern [] matches the
empty list only, while the pattern x::xs matches any non-empty list [x_0; x_1; ...; x_{n-1}].
The latter matching gives the bindings x ↦ x_0 and xs ↦ [x_1; ...; x_{n-1}] of the identifiers
x and xs, as the list [x_0; x_1; ...; x_{n-1}] denotes the graph in Figure 4.5.
[Figure 4.5: graph for the list [x_0; x_1; ...; x_{n-1}] with head x_0 and tail [x_1; ...; x_{n-1}]]
will simultaneously bind x to the value 1 and xs to the value [2;3] by matching the value
[1;2;3] to the pattern x::xs.
A list pattern for a list with a fixed number of elements, for example, three, may be written
as x0::x1::x2::[] or in the shorter form [x0;x1;x2]. This pattern will match any
list with precisely three elements [x_0; x_1; x_2], and the matching binds the identifiers x0, x1
and x2 to the elements x_0, x_1 and x_2, respectively. For example:
let [x0;x1;x2] = [(1,true); (2,false); (3, false)];;
let [x0;x1;x2] = [(1,true); (2,false); (3, false)];;
----
stdin(1,5): warning FS0025: Incomplete pattern matches on this
expression. For example, the value [_;_;_;_] may indicate a
case not covered by the pattern(s).
val x2 : int * bool = (3, false)
val x1 : int * bool = (2, false)
val x0 : int * bool = (1, true)
This generalizes to any fixed number of elements. (The F# compiler issues a warning be-
cause list patterns with a fixed number of elements are in general not recommended, but the
bindings are, nevertheless, made.)
List patterns may have more structure than illustrated above. For example, we can con-
struct list patterns that match lists with two or more elements (e.g., x0::x1::xs), and
list patterns matching only non-empty lists of pairs (e.g., (y1,y2)::ys), and so on. For
example:
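The two kinds of pattern mentioned here can be illustrated by the following sketch. The function names sumFirstTwo and headPair are hypothetical and serve only to show the patterns in use.

```fsharp
// x0::x1::_ matches lists with two or more elements.
let sumFirstTwo = function
    | x0 :: x1 :: _ -> Some (x0 + x1)
    | _ -> None

// (y1,y2)::ys matches non-empty lists of pairs and decomposes the first pair.
let headPair = function
    | (y1, y2) :: ys -> Some (y1, y2, List.length ys)
    | [] -> None
```

Both functions use a final clause for the lists that the structured pattern does not match, so the compiler issues no warning about incomplete matches.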
We shall see examples of more involved patterns in this chapter and throughout the book.
Note the different roles of the operator symbol :: in patterns and expressions. It denotes
decomposing a list into smaller parts when used in a pattern like x0::x1::xs, and it
denotes building a list from smaller parts in an expression like 0::[1; 2].
[b .. e] [b .. s .. e]
[b; b + 1; b + 2; ...; b + n]
where n is chosen such that b + n ≤ e < b + n + 1. The range expression generates the
empty list when e < b.
For example, the list of integers from -3 to 5 is generated by:
[ -3 .. 5 ];;
val it : int list = [-3; -2; -1; 0; 1; 2; 3; 4; 5]
and the float-based representation of the list consisting of 0, π/2, π, 3π/2, 2π is generated by:
[0.0 .. System.Math.PI/2.0 .. 2.0*System.Math.PI];;
val it : float list =
[0.0; 1.570796327; 3.141592654; 4.71238898; 6.283185307]
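A few integer ranges illustrate the two forms of range expression, including a negative step and an empty result. The identifier names are chosen for illustration only.

```fsharp
let upTo5      = [1 .. 5]          // [1; 2; 3; 4; 5]
let evens      = [0 .. 2 .. 10]    // [0; 2; 4; 6; 8; 10]
let countdown  = [10 .. -3 .. 0]   // [10; 7; 4; 1]
let emptyRange = [5 .. 1]          // [] because the end value is below the start value
```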
In evaluating a function value for suml xs , F# scans the clauses and selects the first clause
where the argument matches the pattern. Hence, the evaluation of suml [1;2] proceeds as
follows:
suml [1;2]
⇝ 1 + suml [2] (x::xs matches [1;2] with x ↦ 1 and xs ↦ [2])
⇝ 1 + (2 + suml []) (x::xs matches [2] with x ↦ 2 and xs ↦ [])
⇝ 1 + (2 + 0) (the pattern [] matches the value [])
⇝ 1 + 2
⇝ 3
This example shows that patterns are convenient in order to split up a function declaration
into clauses covering different forms of the argument. In this example, one clause of the
declaration gives the function value for the empty list, and the other clause reduces the
computation of the function value for a non-empty list suml(x::xs) to a simple operation
(addition) on the head x and the value of suml on the tail xs (i.e., suml xs ), where the
length of the argument list has been reduced by one.
It is easy to see that an evaluation for suml [x_0; ...; x_{n-1}] will terminate, as it contains
precisely n + 1 recursive calls of suml.
The above declaration is an example of a typical recursion schema for the declaration of
functions on lists.
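A declaration that is consistent with the evaluation of suml [1;2] traced above is sketched here: one clause for the empty list and one clause that reduces the non-empty case to the tail.

```fsharp
// Sum of the elements of an integer list.
let rec suml = function
    | [] -> 0
    | x :: xs -> x + suml xs
```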
[Figure 4.6: graphs for the patterns [], [x] and x0::x1::xs]
These cases are covered by the patterns in Figure 4.6. Thus, the function can be declared by:
let rec altsum = function
| [] -> 0
| [x] -> x
| x0::x1::xs -> x0 - x1 + altsum xs;;
val altsum : int list -> int
It is left as an exercise to give a declaration for altsum containing only two clauses.
Layered patterns
We want to define a function succPairs such that:
succPairs [] = []
succPairs [x] = []
succPairs [x_0; x_1; ...; x_{n-1}] = [(x_0, x_1); (x_1, x_2); ...; (x_{n-2}, x_{n-1})]
Using the pattern x0::x1::xs as in the above example we get the declaration
let rec succPairs = function
| x0 :: x1 :: xs -> (x0,x1) :: succPairs(x1::xs)
| _ -> [];;
val succPairs : 'a list -> ('a * 'a) list
This works OK, but we may get a smarter declaration avoiding the cons expression x1::xs
in the recursive call in the following way:
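A sketch of such a declaration, using the pattern x0::(x1::_ as xs) so that the tail starting at x1 gets its own name:

```fsharp
// xs is bound to the whole tail x1::..., so no new cons cell is built for the recursive call.
let rec succPairs = function
    | x0 :: (x1 :: _ as xs) -> (x0, x1) :: succPairs xs
    | _ -> []
```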
[Figure 4.7: graph for the pattern x0::(x1::_ as xs)]
succPairs [1;2;3];;
val it : (int * int) list = [(1, 2); (2, 3)]
The pattern x1::_ as xs is an example of a layered pattern. It is part of the pattern shown
in Figure 4.7. A layered pattern has the general form:
pat as id
with pattern pat and identifier id. A value val matches this pattern exactly when the value
matches the pattern pat. The matching binds identifiers in the pattern pat as usual with the
addition that the identifier id is bound to val. Matching the list [x_0; x_1; ...] with the
pattern x0::(x1::_ as xs) will hence give the following bindings:
x0 ↦ x_0
x1 ↦ x_1
xs ↦ [x_1; ...]
which is exactly what is needed in this case.
where
(rSum,rProd) = sumProd [x_1; ...; x_{n-1}]
sumProd [2;5];;
val it : int * int = (7, 10)
Another example is the unzip function that maps a list of pairs to a pair of lists:
unzip [(1,"a");(2,"b")];;
val it : int list * string list = ([1; 2], ["a"; "b"])
mix ([1;2;3],[4;5;6]);;
val it : int list = [1; 4; 2; 5; 3; 6]
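Declarations that reproduce the three results shown above can be sketched as follows. They are assumptions reconstructed from the sample evaluations; in particular, raising an exception in mix for lists of different lengths is one possible choice, not necessarily the one intended in the text.

```fsharp
// Sum and product of an integer list, computed in one traversal.
let rec sumProd = function
    | [] -> (0, 1)
    | x :: rest ->
        let (rSum, rProd) = sumProd rest
        (x + rSum, x * rProd)

// A list of pairs is mapped to a pair of lists.
let rec unzip = function
    | [] -> ([], [])
    | (x, y) :: rest ->
        let (xs, ys) = unzip rest
        (x :: xs, y :: ys)

// Elements of two lists of equal length are interleaved.
let rec mix = function
    | (x :: xs, y :: ys) -> x :: y :: mix (xs, ys)
    | ([], []) -> []
    | _ -> failwith "mix: lists of different lengths"
```

In sumProd and unzip a let-declaration with a pair pattern decomposes the result of the recursive call, which is the technique the where-clause above describes.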
4.4 Polymorphism
In this section we will study some general kinds of polymorphism, appearing frequently in
connection with lists. We will do that on the basis of three useful list functions that all can
be declared using the same structure of recursion as shown in Section 4.3.
List membership
The member function for lists determines whether a value x is equal to one of the elements
in a list [y_0; y_1; ...; y_{n-1}], that is:
The function isMember can be useful in certain cases, but it was not included in the F#
library when this text was written (later versions of the core library provide List.contains).
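A declaration of isMember along the lines described can be sketched as follows; it follows the same recursion structure as the functions in Section 4.3.

```fsharp
// True when x is equal to one of the elements of the list.
let rec isMember x = function
    | y :: ys -> x = y || isMember x ys
    | [] -> false
```

The use of = on x and y is what restricts the element type to equality types.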
The annotation 'a : equality indicates that 'a is an equality type variable; see
Section 2.10. The equality type is inferred from the expression x=y. It implies that the function
isMember will only allow an argument x where the equality operator = is defined for
values of the type of x. A type such as int * (bool * string) list * int list is an
equality type, and the function can be applied to elements of this type.
[x_0; x_1; ...; x_{m-1}] @ [y_0; y_1; ...; y_{n-1}] = [x_0; x_1; ...; x_{m-1}; y_0; y_1; ...; y_{n-1}]
These functions are predefined in F#, but their declarations reveal important issues and are
therefore discussed here. The operator @ is actually the infix operator corresponding to the
library function List.append.
The declaration of the (infix) function @ is based on the recursion formula:
[] @ ys = ys
[x_0; x_1; ...; x_{m-1}] @ ys = x_0::([x_1; ...; x_{m-1}] @ ys)
This leads to the declaration:
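A minimal sketch of a declaration following the recursion formula is given below. It is written as an ordinary prefix function named append, which is an assumption made here to avoid redefining the built-in operator @.

```fsharp
// append xs ys behaves like xs @ ys: recursion over the left-hand list only.
let rec append xs ys =
    match xs with
    | [] -> ys
    | x :: rest -> x :: append rest ys
```

The right-hand list is never decomposed, so it is shared between the argument and the result.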
The evaluation of append decomposes the left-hand list into its elements, that are after-
wards consed onto the right-hand list:
[1;2]@[3;4]
⇝ 1::([2]@[3;4])
⇝ 1::(2::([]@[3;4]))
⇝ 1::(2::[3;4])
⇝ 1::[2;3;4]
⇝ [1;2;3;4]
The evaluation of xs @ ys comprises m + 1 pattern matches plus m conses where m is the
length of xs .
The notion of polymorphism is very convenient for the programmer because one need not
write a special function for appending, for example, integer lists and another function for
appending lists of integer lists, as the polymorphic append function is capable of both:
[1;2] @ [3;4];;
val it : int list = [1; 2; 3; 4]
[[1];[2;3]] @ [[4]];;
val it : int list list = [[1]; [2; 3]; [4]]
The operators :: and @ have the same precedence (5) and both associate to the right. A
mixture of these operators also associates to the right, so [1]@2::[3], for example, is
interpreted as [1]@(2::[3]), while 1::[2]@[3] is interpreted as 1::([2]@[3]):
[1] @ 2 :: [3];;
val it : int list = [1; 2; 3]
1 :: [2] @ [3];;
val it : int list = [1; 2; 3]
This declaration corresponds directly to the recursion formula for rev: the tail list xs is
reversed and the head element x is inserted at the end of the resulting list but it may be
considered naive as it gives a very inefficient evaluation of the reversed list:
naiveRev[1;2;3]
⇝ naiveRev[2;3] @ [1]
⇝ (naiveRev[3] @ [2]) @ [1]
⇝ ((naiveRev[] @ [3]) @ [2]) @ [1]
⇝ (([] @ [3]) @ [2]) @ [1]
⇝ ([3] @ [2]) @ [1]
⇝ (3::([] @ [2])) @ [1]
⇝ (3::[2]) @ [1]
⇝ [3;2] @ [1]
⇝ 3::([2] @ [1])
⇝ 3 :: (2 :: ([] @ [1]))
⇝ 3 :: (2 :: [1])
⇝ 3 :: [2;1]
⇝ [3;2;1]
We will make a much more efficient declaration of the reverse function in a later chapter
(Page 208). The library function List.rev is, of course, implemented using an efficient
declaration.
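For reference, a declaration that matches the evaluation of naiveRev[1;2;3] traced above can be sketched as follows. Each recursive step appends a one-element list at the end, and since @ traverses its left argument, the total work grows quadratically with the length of the list.

```fsharp
// Reverse the tail, then append the head at the end.
let rec naiveRev = function
    | [] -> []
    | x :: xs -> naiveRev xs @ [x]
```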
while
List.rev [] [] @ []
do not qualify as a value expression as they can be further evaluated. Note that a function
expression (a closure) is considered a value expression because it is only evaluated further
when applied to an argument.
The restriction applies to the expression exp in declarations
let id = exp
and states the following
At top level, polymorphic expressions are allowed only if they are value expres-
sions. Polymorphic expressions can be used freely for intermediate results.
Hence F# allows values of polymorphic types, such as the empty list [], the pair (5,[[]])
or the function (fun x -> [x]):
let z = [];;
val z : 'a list
(5,[[]]);;
val it : int * 'a list list = (5, [[]])
List.rev [];;
stdin(86,1): error FS0030: Value restriction.
The value it has been inferred to have generic type
The task is to construct a program that makes a bill of a purchase. For each item the bill
must contain the name of the article, the number of pieces, and the total price, and the bill
must also contain the grand total of the entire purchase.
Article code and article name are central concepts that are named and associated with a
type:
type ArticleCode = string;;
type ArticleName = string;;
where the choice of the string type for ArticleCode is somewhat arbitrary. An alter-
native choice could be the int type.
The register associates article name and article price with each article code, and we model
a register by a list of pairs. Each pair has the form:
(ac, (aname, aprice))
where ac is an article code, aname is an article name, and aprice is an article price. We
choose (non-negative) integers to represent prices (in the smallest currency unit):
type Price = int;; // pr where pr >= 0
and we get the following type for a register:
type Register = (ArticleCode * (ArticleName*Price)) list;;
([(3,"herring",12); (1,"cheese",25)],37)
The function makeBill computes a bill given a purchase and a register and it has the type:
to find the article name and price in the register for a given article code. This will make the
declaration for the function makeBill easier to comprehend. An exception is raised when
no article with the given code occurs in the register:
The declaration of makeBill uses the pattern introduced in Section 4.3 to decompose
the value of the recursive call.
Note that the F# system infers a more general type for the makeBill function than the
type given in our model. This is, however, no problem as the specified type is an instance of
the inferred type, so makeBill has the specified type (among others).
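The pieces of the cash register example can be put together as in the following sketch. The type names NoPieces, Purchase and Bill, the helper name findArticle and the sample register are assumptions made for illustration; the sample data is chosen so that the resulting bill agrees with the bill value shown earlier.

```fsharp
type ArticleCode = string
type ArticleName = string
type Price = int                                   // pr where pr >= 0
type Register = (ArticleCode * (ArticleName * Price)) list
type NoPieces = int                                // assumed name; np where np >= 0
type Purchase = (NoPieces * ArticleCode) list      // assumed name
type Bill = (NoPieces * ArticleName * Price) list * Price   // assumed name

// Find name and price for an article code; raise an exception if the code is unknown.
let rec findArticle ac = function
    | (ac', adesc) :: _ when ac = ac' -> adesc
    | _ :: reg -> findArticle ac reg
    | [] -> failwith (ac + " is an unknown article code")

// Build the bill items and the grand total in one recursion over the purchase.
let rec makeBill reg = function
    | [] -> ([], 0)
    | (np, ac) :: pur ->
        let (aname, aprice) = findArticle ac reg
        let tprice = np * aprice
        let (billtl, sumtl) = makeBill reg pur
        ((np, aname, tprice) :: billtl, tprice + sumtl)

// Hypothetical sample data.
let reg : Register =
    [("a1", ("cheese", 25)); ("a2", ("herring", 4)); ("a3", ("soft drink", 5))]
let pur : Purchase = [(3, "a2"); (1, "a1")]
let bill : Bill = makeBill reg pur
```

The pair pattern (billtl, sumtl) decomposes the value of the recursive call, so the item list and the sum are extended in the same step.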
Consider the map in Figure 4.8 with four countries a, b, c, and d, where the coun-
try a has the neighbouring countries b and d, the country b has the neighbouring
country a, and so on. The F# value for this map is given by the declaration of exMap.
[Figure 4.8: map with the four countries "a", "b", "c" and "d"]
A colour on a map is represented by the set of countries having this colour, and a colouring
is described by a list of mutually disjoint colours:
The countries of the map in Figure 4.8 may hence be coloured by the colouring:
This colouring has two colours ["a";"c"] and ["b"; "d"], where the countries a
and c get one colour, while the countries b and d get another colour.
An overview of the model is shown in Figure 4.9 together with sample values. This figure
also contains meta symbols used for the various types, as this helps in achieving a consistent
naming convention throughout the program.
Meta symbol: Type   Definition               Sample value
c: Country          string                   "a"
m: Map              (Country*Country) list   [("a","b"); ("c","d"); ("d","a")]
col: Colour         Country list             ["a";"c"]
cols: Colouring     Colour list              [["a";"c"];["b";"d"]]
that can generate a colouring of a given map. We will express this function as a composition
of simple functions, each with a well-understood meaning. These simple functions arise
from the algorithmic idea behind the solutions to the problem. The idea we will pursue here
is the following: We start with the empty colouring, that is, the empty list containing no
colours. Then we will gradually extend the actual colouring by adding one country at a time.
We illustrate this algorithmic idea on the map in Figure 4.8, with the four countries: a,
b, c and d. The four main algorithmic steps (one for each country) are shown in
Figure 4.10. We give a brief comment to each step:
The task is now to make a program where the main concepts of this algorithmic idea are
directly represented. The concepts emphasized in the above discussion are:
We now give a declaration for each of the functions specified in Figure 4.11.
1. First we declare a predicate (i.e., a truth-valued function) areNb to determine for a given
map whether two countries are neighbours:
let areNb m c1 c2 =
isMember (c1,c2) m || isMember (c2,c1) m;;
This declaration makes use of the isMember-function declared in Section 4.4.
2. Next we declare a predicate to determine for a given map whether a colour can be ex-
tended by a country:
let rec canBeExtBy m col c =
match col with
| [] -> true
| c'::col' -> not(areNb m c' c) && canBeExtBy m col' c;;
colMap exMap;;
val it : string list list = [["c"; "a"]; ["b"; "d"]]
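One way to complete the program so that it yields the colouring shown above is sketched here. The functions areNb and canBeExtBy follow the declarations in the text; the names and declarations of extColouring, addElem, countries and colCntrs, as well as the value of exMap (taken from the sample value in the model overview), are assumptions.

```fsharp
type Country = string
type Map = (Country * Country) list
type Colour = Country list
type Colouring = Colour list

let rec isMember x = function
    | y :: ys -> x = y || isMember x ys
    | [] -> false

let areNb m c1 c2 = isMember (c1, c2) m || isMember (c2, c1) m

let rec canBeExtBy m col c =
    match col with
    | [] -> true
    | c' :: col' -> not (areNb m c' c) && canBeExtBy m col' c

// Put c into the first colour that can be extended by it, or open a new colour.
let rec extColouring m cols c =
    match cols with
    | [] -> [[c]]
    | col :: cols' ->
        if canBeExtBy m col c then (c :: col) :: cols'
        else col :: extColouring m cols' c

// Collect the countries of a map without repetitions.
let addElem x ys = if isMember x ys then ys else x :: ys

let rec countries = function
    | [] -> []
    | (c1, c2) :: m -> addElem c1 (addElem c2 (countries m))

// Colour a list of countries, one country at a time.
let rec colCntrs m = function
    | [] -> []
    | c :: cs -> extColouring m (colCntrs m cs) c

let colMap m = colCntrs m (countries m)

let exMap : Map = [("a", "b"); ("c", "d"); ("d", "a")]
```

The result depends on the order in which countries are visited, so a different traversal could give another, equally valid colouring.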
Comments
In these two examples we have just used types introduced previously in this book, and some
comments could be made concerning the adequacy of the solutions. For example, modelling
a data register by a list of pairs does not capture that each article has a unique description
in the register, and modelling a colour by a list of countries does not capture the property
that the sequence in which countries occur in the list is irrelevant. The same applies to the
property that repeated occurrences of a country in a colour are irrelevant.
In Chapter 5 we shall introduce maps and sets and we shall give more suitable models
and solutions for the two examples above.
Summary
In this chapter we have introduced the notions of lists and list types, and the notion of list
patterns. A selection of typical recursive functions on lists were presented, and the notions
of polymorphic types and values were studied. Furthermore, we have introduced a model-
based approach to functional programming, where important concepts are named and types
are associated with the names.
Exercises
4.1 Declare function upto: int -> int list such that upto n = [1; 2; ...; n].
4.2 Declare function downto1: int -> int list such that the value of downto1 n is the list
[n; n-1; ...; 1].
4.3 Declare function evenN: int -> int list such that evenN n generates the list of the first
n non-negative even numbers.
4.4 Give a declaration for altsum (see Page 76) containing just two clauses.
4.5 Declare an F# function rmodd removing the odd-numbered elements from a list:
rmodd [x_0; x_1; x_2; x_3; ...] = [x_0; x_2; ...]
4.6 Declare an F# function to remove even numbers occurring in an integer list.
4.7 Declare an F# function multiplicity x xs to find the number of times the value x occurs
in the list xs.
4.8 Declare an F# function split such that:
split [x_0; x_1; x_2; x_3; ...; x_{n-1}] = ([x_0; x_2; ...], [x_1; x_3; ...])
4.9 Declare an F# function zip such that:
zip([x_0; x_1; ...; x_{n-1}], [y_0; y_1; ...; y_{n-1}])
= [(x_0, y_0); (x_1, y_1); ...; (x_{n-1}, y_{n-1})]
The function should raise an exception if the two lists are not of equal length.
4.10 Declare an F# function prefix: 'a list -> 'a list -> bool when 'a : equality.
The value of the expression prefix [x_0; x_1; ...; x_m] [y_0; y_1; ...; y_n] is true if m ≤ n
and x_i = y_i for 0 ≤ i ≤ m, and false otherwise.
4.11 A list of integers [x_0; x_1; ...; x_{n-1}] is weakly ascending if the elements satisfy:
x_0 ≤ x_1 ≤ x_2 ≤ ... ≤ x_{n-2} ≤ x_{n-1}
or if the list is empty. The problem is now to declare functions on weakly ascending lists.
1. Declare an F# function count: int list * int -> int, where count(xs, x) is the
number of occurrences of the integer x in the weakly ascending list xs.
2. Declare an F# function insert: int list * int -> int list, where the value of
insert(xs, x) is a weakly ascending list obtained by inserting the number x into the weakly
ascending list xs.
3. Declare an F# function intersect: int list * int list -> int list, where the
value of intersect(xs, xs') is a weakly ascending list containing the common elements
of the weakly ascending lists xs and xs'. For instance:
intersect([1;1;1;2;2], [1;1;2;4]) = [1;1;2]
4. Declare an F# function plus: int list * int list -> int list, where the value of
plus(xs, xs') is a weakly ascending list that is the union of the weakly ascending lists xs
and xs'. For instance:
plus([1;1;2],[1;2;4]) = [1;1;1;2;2;4]
5. Declare an F# function minus: int list * int list -> int list, where the value
of minus(xs, xs') is a weakly ascending list obtained from the weakly ascending list xs by
removing those elements that are also found in the weakly ascending list xs'. For instance:
minus([1;1;1;2;2],[1;1;2;3]) = [1;2]
minus([1;1;2;3],[1;1;1;2;2]) = [3]
4.12 Declare a function sum(p, xs) where p is a predicate of type int -> bool and xs is a list of
integers. The value of sum(p, xs) is the sum of the elements in xs satisfying the predicate p.
Test the function on different predicates (e.g., p(x) = x > 0).
4.13 Naive sort function:
1. Declare an F# function finding the smallest element in a non-empty integer list.
2. Declare an F# function delete: int * int list -> int list, where the value of
delete(a, xs) is the list obtained by deleting one occurrence of a in xs (if there is one).
3. Declare an F# function that sorts an integer list so that the elements are placed in weakly
ascending order.
Note that there is a much more efficient sort function List.sort in the library.
4.14 Declare a function of type int list -> int option for finding the smallest element in an
integer list.
4.15 Declare an F# function revrev working on a list of lists, that maps a list to the reversed list of
the reversed elements, for example:
Find the types for f, g and h and explain the value of the expressions:
1. f(x, [y_0, y_1, ..., y_{n-1}]), $n \ge 0$
2. g[(x_0, y_0), (x_1, y_1), ..., (x_{n-1}, y_{n-1})], $n \ge 0$
3. h[x_0, x_1, ..., x_{n-1}], $n \ge 0$
4.17 Consider the declaration:
Find the type for p and explain the value of the expression:
p q [x_0; x_1; x_2; ...; x_{n-1}]
Functional languages make it easy to express standard recursion patterns in the form of
higher-order functions. A collection of such higher-order functions on lists, for example,
provides a powerful library where many recursive functions can be obtained directly by
application of higher-order library functions. This has two important consequences:
1. The functions in the library correspond to natural abstract concepts and conscious use of
them supports high-level program design, and
2. these functions support code reuse because you can make many functions simply by
applying library functions.
In this chapter we shall study libraries for lists, sets and maps, which are parts of the
collection library of F#. This part of the collection library is studied together since:
• It constitutes the immutable part of the collection library. The list, set and map collections
are finite collections programmed in a functional style.
• There are many similarities in the corresponding library functions.
This chapter is a natural extension of Chapter 4 since many of the patterns introduced in
that chapter correspond to higher-order functions for lists and since more natural program
designs can be given for the two examples in Section 4.6 using sets and maps.
We will focus on the main concepts and applications in this book, and will deliberately
not cover the complete collection library of F#. The functions of the collection library
also apply to (mutable) arrays. We address this part in Section 8.10.
5.1 Lists
This section describes the library functions map, various library functions using a predicate
on list elements plus the functions fold and foldBack. Each description aims to provide
the following:
1. An intuitive understanding of the objective of the function.
2. Examples of use of the function.
The actual declarations of the library functions are not considered as we want to concentrate
on how to use these functions in problem solving. Declarations of fold and foldBack
are, however, of considerable theoretical interest and are therefore studied in the last part of
the section. An overview of the List-library functions considered in this section is found
in Table 5.1.
94 Collections: Lists, maps and sets
Operation and meaning
map: ('a -> 'b) -> 'a list -> 'b list, where
    map f xs = [f(x_0); f(x_1); ...; f(x_{n-1})]
exists: ('a -> bool) -> 'a list -> bool, where
    exists p xs = $\exists x \in xs.\ p(x)$
forall: ('a -> bool) -> 'a list -> bool, where
    forall p xs = $\forall x \in xs.\ p(x)$
tryFind: ('a -> bool) -> 'a list -> 'a option, where
    tryFind p xs is Some x for some $x \in xs$ with p(x) = true, or None if no such x exists
filter: ('a -> bool) -> 'a list -> 'a list, where
    filter p xs = ys, where ys is obtained from xs by deletion of the elements x_i with p(x_i) = false
fold: ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a, where
    fold f a [b_0; b_1; ...; b_{n-2}; b_{n-1}] = f(f(f(... f(f(a, b_0), b_1), ...), b_{n-2}), b_{n-1})
foldBack: ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b, where
    foldBack f [a_0; a_1; ...; a_{n-2}; a_{n-1}] b = f(a_0, f(a_1, f(..., f(a_{n-2}, f(a_{n-1}, b)) ...)))
collect: ('a -> 'b list) -> 'a list -> 'b list, where
    collect f [a_0; a_1; ...; a_{n-1}] = (f a_0) @ (f a_1) @ ... @ (f a_{n-1})
These operations are found under the names: List.map, List.exists, and so on.
We assume that xs = [x_0; x_1; ...; x_{n-2}; x_{n-1}].
Table 5.1 A selection of functions from the List library
works as follows:
List.map f [x_0; x_1; ...; x_{n-1}] = [f x_0; f x_1; ...; f x_{n-1}]
In words:
The function application List.map f is the function that applies the function f
to each element x_0, x_1, ..., x_{n-1} in a list [x_0; x_1; ...; x_{n-1}].
It is easy to use List.map:
• The function addFsExt adds the F# file extension .fs to every string in a list of file
names.
• The function intPairToRational converts every integer pair in a list to the string of
a rational number on the basis of the declarations in Section 3.7.
• The function areaList computes the area of every shape in a list on the basis of the
declarations in Section 3.8.
let addFsExt = List.map (fun s -> s + ".fs");;
val addFsExt : (string list -> string list)
let intPairToRational ps =
    List.map (fun p -> toString(mkQ p)) ps;;
where fun p -> toString(mkQ p) expands the function composition toString << mkQ,
and ps is used as an explicit list argument in the last declaration.
Explicit list arguments could also be used in declarations of addFsExt and areaList.
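The two declaration styles can be compared in a small sketch. The helper functions double and show, and the names addFsExt' and doubledStrings, are hypothetical and serve only to illustrate the composition operator; they are not taken from Sections 3.7 or 3.8.

```fsharp
// Point-free style: List.map is applied to the function only.
let addFsExt = List.map (fun s -> s + ".fs")

// The same function with an explicit list argument.
let addFsExt' fileNames = List.map (fun s -> s + ".fs") fileNames

// Hypothetical helpers used to illustrate function composition.
let double x = 2 * x
let show (n: int) = string n

// show << double is the function fun x -> show (double x).
let doubledStrings = List.map (show << double)
```

Both versions of addFsExt have the type string list -> string list; the choice between them is a matter of style.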
is true, if p(x_k) = true holds for all list elements x_k, and false otherwise.
is Some x_k for a list element x_k with p(x_k) = true, or None if no such element exists.
does not terminate if the evaluation of the expression p(x_k) does not terminate for some k,
where $0 \le k \le n-1$, and if p(x_j) = false for all j where $0 \le j < k$. A similar remark
will apply to the other functions using a predicate on list elements.
The function isMember (cf. Section 4.4) can be declared using List.exists:
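A minimal sketch of such a declaration, assuming that isMember x xs should be true exactly when x occurs in xs:

```fsharp
// x is a member of xs if some element y of xs is equal to x.
let isMember x xs = List.exists (fun y -> y = x) xs
```

The inferred type requires equality on the element type, because the predicate compares elements with =.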
[Figure: a cheese and a package]
Cheeses and packages are considered elements of type cheese and package. A package
may contain zero or more cheeses.
The function
packCheese: package -> cheese -> package
packs an extra cheese into a package:
[Figure: packCheese applied to a package and a cheese gives the package with the cheese packed into it]
The function List.fold can be applied to the function packCheese, a start package
and a list of cheeses. It uses packCheese to pack the elements of the list (the cheeses) into
the package one after the other starting with the given start package:
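List.fold packCheese (start package) [cheese 0; cheese 1; cheese 2] = (package holding cheeses 2, 1, 0)
with
f = packCheese, e = the start package, x_0 = cheese 0, x_1 = cheese 1, x_2 = cheese 2
because we can identify the sub-expressions on the right-hand side of the general formula in
our special case:
f e x_0 = (package holding cheese 0)
f (f e x_0) x_1 = (package holding cheeses 1, 0)
and
f (f (f e x_0) x_1) x_2 = (package holding cheeses 2, 1, 0)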
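The packing illustration can be mimicked in code. This is only a sketch under an assumed model: a cheese is represented by a number and a package by the list of cheeses it holds, with the most recently packed cheese first. The type names Cheese and Package and this representation are assumptions made for the example.

```fsharp
// Assumed model: a cheese is a number, a package is a list of cheeses.
type Cheese = int
type Package = Cheese list

// Packs one extra cheese into a package.
let packCheese (p: Package) (c: Cheese) : Package = c :: p

// Fold the cheeses 0, 1, 2 into an empty start package, one after the other.
let packed = List.fold packCheese [] [0; 1; 2]
// packed is [2; 1; 0]: cheese 0 was packed first, cheese 2 last.
```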
let sumOfNorms vs =
    List.fold (fun s (x,y) -> s + norm(x,y)) 0.0 vs;;
val sumOfNorms : (float * float) list -> float
sumOfNorms vs;;
val it : float = 10.32448591
length [[1;2];[];[3;5;8];[-2]];;
val it : int = 4
rev [1;2;3];;
val it : int list = [3; 2; 1]
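The two evaluations above are consistent with declarations of length and rev in terms of List.fold. The declarations below are a sketch of how this could look, not necessarily the exact text of the original declarations.

```fsharp
// The accumulator counts the elements; the elements themselves are ignored.
let length lst = List.fold (fun n _ -> n + 1) 0 lst

// The accumulator is the reversed list of the elements seen so far.
let rev xs = List.fold (fun rs x -> x :: rs) [] xs
```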
The function List.foldBack is similar to List.fold but the list elements are accumulated
in the opposite order. The type of List.foldBack is:
List.foldBack: ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b
where
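[Figure: cheesePack applied to a cheese and a package gives the package with the cheese packed into it]
List.foldBack cheesePack [cheese 0; cheese 1; cheese 2] (start package) = (package holding cheeses 0, 1, 2)
with
g = cheesePack, x_0 = cheese 0, x_1 = cheese 1, x_2 = cheese 2, e = the start package
because we can identify the sub-expressions in the right-hand side of the general formula in
our special case:
g x_2 e = (package holding cheese 2)
g x_1 (g x_2 e) = (package holding cheeses 1, 2)
and
g x_0 (g x_1 (g x_2 e)) = (package holding cheeses 0, 1, 2)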
The unzip function on Page 77 can be obtained using foldBack with the following data:
List element type:     'a * 'b
Accumulator type:      'a list * 'b list
Accumulator function:  fun (x,y) (xs,ys) -> (x::xs,y::ys)
Start value:           ([],[])
This gives the declaration
let unzip zs = List.foldBack
                   (fun (x,y) (xs,ys) -> (x::xs,y::ys))
                   zs
                   ([],[]);;
val unzip : ('a * 'b) list -> 'a list * 'b list
unzip [(1,"a");(2,"b")];;
val it : int list * string list = ([1; 2], ["a"; "b"])
A similar construction using List.fold gives a revUnzip function where the resulting
lists are reversed:
let revUnzip zs =
    List.fold (fun (xs,ys) (x,y) -> (x::xs,y::ys)) ([],[]) zs;;
val revUnzip : ('a * 'b) list -> 'a list * 'b list
revUnzip [(1,"a");(2,"b")];;
val it : int list * string list = ([2; 1], ["b"; "a"])
The prefix version of an infix operator can be used as argument in fold and foldBack:
List.fold (+) 0 [1; 2; 3];;
val it : int = 6
and we get
List.fold (-) 0 [1;2;3] = ((0 - 1) - 2) - 3 = -6
List.foldBack (-) [1;2;3] 0 = 1 - (2 - (3 - 0)) = 2
Remark
A function declared by means of fold or foldBack will always scan the whole list. Thus,
the following declaration for the exists function
let existsF p =
    List.fold (fun b -> (fun x -> p x || b)) false;;
val existsF : ('a -> bool) -> ('a list -> bool)
will not behave like the function List.exists with regard to non-termination: it will
give a non-terminating evaluation if the list contains any element where the evaluation of the
predicate p does not terminate, while the library function List.exists may terminate in
this case as it does not scan the list further when an element satisfying the predicate has been
found. So it is not considered a good idea to use fold or foldBack to declare functions
like exists or find (cf. Page 95) as these functions need not scan the whole list in all
cases.
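The difference in scanning behaviour can be observed by counting how often the predicate is applied. The helper counting and the test data below are hypothetical and introduced only for this illustration.

```fsharp
// The fold-based declaration from the remark above.
let existsF p = List.fold (fun b x -> p x || b) false

// Hypothetical helper: wraps a predicate so that its applications are counted.
let counting p =
    let calls = ref 0
    let q x =
        calls.Value <- calls.Value + 1
        p x
    (q, calls)

let isEven x = x % 2 = 0
let xs = [2; 3; 5; 7]

let (q1, n1) = counting isEven
let r1 = List.exists q1 xs   // stops after the first element, which is even

let (q2, n2) = counting isEven
let r2 = existsF q2 xs       // applies the predicate to every element
```

Both r1 and r2 are true, but n1.Value is 1 while n2.Value is 4.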
where it is possible to decide whether a given value is in the set. For example, Alice is not
in the set {Bob, Bill, Ben} and 7 is in the set {1, 3, 5, 7, 9}, also written:
Alice $\notin$ {Bob, Bill, Ben} and 7 $\in$ {1, 3, 5, 7, 9}
The above examples are all finite sets; but sets may be infinite and examples are the set of
all natural numbers $\mathbb{N} = \{0, 1, 2, \ldots\}$ and the set of all real numbers $\mathbb{R}$.
A set A is a subset of a set B, written $A \subseteq B$, if all the elements of A are also elements
of B, for example
$\{1, 3\} \subseteq \{1, 3, 5, 7, 9\}$
Furthermore, two sets A and B are equal, if they are both subsets of each other:
$A = B$ if and only if $A \subseteq B$ and $B \subseteq A$
that is, two sets are equal if they contain exactly the same elements.
The subset of a set A that consists of those elements satisfying a predicate p can be
expressed using a set-comprehension $\{x \in A \mid p(x)\}$. For example, the set {1, 3, 5, 7, 9}
consists of the odd natural numbers that are smaller than 11:
$\{1, 3, 5, 7, 9\} = \{x \in \mathbb{N} \mid x \text{ is odd and } x < 11\}$
If it is clear from the context from which set A the elements of the set-comprehension
originate, then we use the simplified notation: $\{x \mid p(x)\}$.
5.2 Finite sets 105
Figure 5.1 Venn diagrams for (a) union, (b) intersection and (c) difference
Some of the standard operations on sets are union $A \cup B$, intersection $A \cap B$ and
difference $A \setminus B$:
$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
$A \setminus B = \{x \in A \mid x \notin B\}$
that is, $A \cup B$ is the set of elements that are in at least one of the sets A and B, $A \cap B$ is
the set of elements that are in both A and B, and $A \setminus B$ is the subset of the elements from
A that are not in B. These operations are illustrated using Venn diagrams in Figure 5.1. For
example:
{Bob, Bill, Ben} $\cup$ {Alice, Bill, Ann} = {Alice, Ann, Bob, Bill, Ben}
{Bob, Bill, Ben} $\cap$ {Alice, Bill, Ann} = {Bill}
{Bob, Bill, Ben} $\setminus$ {Alice, Bill, Ann} = {Bob, Ben}
Sets in F#
The Set library of F# supports finite sets of elements of a type where ordering is defined,
and provides efficient implementations for a rich collection of set operations. The
implementation is based on a balanced binary tree representation of a set and this is why an
ordering of the elements is required (but we will not consider such implementation details in
this section).
Consider the following example of a set in F#:
Hence, a set can be given in a manner similar to a list using the set-builder function set.
The resulting value is of type Set<string>, that is, a set of strings, and we can see from
the F# answer that the elements occur according to a lexicographical ordering. A standard
number ordering is used for sets of integers, for example:
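A small sketch of the set-builder function in use. The name males matches the later evaluations in this section, but the exact declaration shown here, including the repeated element, is an assumption; the set numbers is a made-up example.

```fsharp
// Repeated elements are ignored; the elements are kept in order.
let males = set ["Bob"; "Bill"; "Ben"; "Bill"]
// Set.toList males is ["Ben"; "Bill"; "Bob"]

// Integers are ordered by the standard number ordering.
let numbers = set [3; 1; 9; 5; 7; 9; 1]
// Set.toList numbers is [1; 3; 5; 7; 9]
```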
Operation and meaning
ofList: 'a list -> Set<'a>, where ofList [a_0; ...; a_{n-1}] = set [a_0; ...; a_{n-1}]
toList: Set<'a> -> 'a list, where toList {a_0, ..., a_{n-1}} = [a_0; ...; a_{n-1}]
add: 'a -> Set<'a> -> Set<'a>, where add a A = $\{a\} \cup A$
remove: 'a -> Set<'a> -> Set<'a>, where remove a A = $A \setminus \{a\}$
contains: 'a -> Set<'a> -> bool, where contains a A = $a \in A$
isSubset: Set<'a> -> Set<'a> -> bool, where isSubset A B = $A \subseteq B$
minElement: Set<'a> -> 'a, where
    minElement {a_0, a_1, ..., a_{n-2}, a_{n-1}} = a_0 when n > 0
maxElement: Set<'a> -> 'a, where
    maxElement {a_0, a_1, ..., a_{n-2}, a_{n-1}} = a_{n-1} when n > 0
count: Set<'a> -> int, where
    count {a_0, a_1, ..., a_{n-2}, a_{n-1}} = n
These operations are found under the names: Set.add, Set.contains, and so on.
It is assumed that the enumeration {a_0, a_1, ..., a_{n-2}, a_{n-1}} respects the ordering of elements.
Table 5.2 A selection of basic operations from the Set library
Set.toList males;;
val it : string list = ["Ben"; "Bill"; "Bob"]
Note that the resulting list is ordered and contains no repeated elements.
An element can be inserted in a set with the function Set.add:
Set.add "Barry" males;;
val it : Set<string> = set ["Barry"; "Ben"; "Bill"; "Bob"]
and removed from a set with the function Set.remove:
Operation and meaning
union: Set<'a> -> Set<'a> -> Set<'a>, where union A B = $A \cup B$
intersect: Set<'a> -> Set<'a> -> Set<'a>, where intersect A B = $A \cap B$
difference: Set<'a> -> Set<'a> -> Set<'a>, where difference A B = $A \setminus B$
filter: ('a -> bool) -> Set<'a> -> Set<'a>, where filter p A = $\{x \in A \mid p(x)\}$
exists: ('a -> bool) -> Set<'a> -> bool, where exists p A = $\exists x \in A.\ p(x)$
forall: ('a -> bool) -> Set<'a> -> bool, where forall p A = $\forall x \in A.\ p(x)$
map: ('a -> 'b) -> Set<'a> -> Set<'b>, where map f A = $\{f(x) \mid x \in A\}$
fold: ('a -> 'b -> 'a) -> 'a -> Set<'b> -> 'a, where
    fold f a {b_0, b_1, ..., b_{n-2}, b_{n-1}} = f(f(f(... f(f(a, b_0), b_1), ...), b_{n-2}), b_{n-1})
foldBack: ('a -> 'b -> 'b) -> Set<'a> -> 'b -> 'b, where
    foldBack f {a_0, a_1, ..., a_{n-2}, a_{n-1}} b = f(a_0, f(a_1, f(..., f(a_{n-2}, f(a_{n-1}, b)) ...)))
It is assumed that the enumerations in the sets {a_0, a_1, ..., a_{n-2}, a_{n-1}} and
{b_0, b_1, ..., b_{n-2}, b_{n-1}} respect the ordering of the respective types.
Table 5.3 A selection of operations from the Set library
setOfCounts ss;;
val it : Set<int> = set [2; 3]
The functions Set.fold and Set.foldBack also correspond to their list siblings.
This is illustrated in the following evaluations:
Set.fold (-) 0 (set [1;2;3]) = ((0 - 1) - 2) - 3 = -6
Set.foldBack (-) (set [1;2;3]) 0 = 1 - (2 - (3 - 0)) = 2
where the ordering on the set elements is exploited.
The functions sumSet and setOfCounts can be succinctly declared using foldBack:
let sumSet s = Set.foldBack (+) s 0;;
val sumSet : Set<int> -> int
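A corresponding sketch for setOfCounts, assuming that setOfCounts ss is the set of the sizes of the sets in ss (this reading agrees with the evaluation of setOfCounts ss shown earlier, but the declaration itself is a reconstruction):

```fsharp
let sumSet s = Set.foldBack (+) s 0

// Each set s in ss contributes its number of elements to the result.
let setOfCounts ss =
    Set.foldBack (fun s cs -> Set.add (Set.count s) cs) ss Set.empty

let ss = set [set [1; 3; 5]; set [2; 4]; set [7; 8; 9]]
// setOfCounts ss is set [2; 3], since two of the sets have three elements.
```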
This function can be declared by repeated extraction of the minimal element from a set until
an element satisfying the predicate is found:
let rec tryFind p s =
    if Set.isEmpty s then None
    else let minE = Set.minElement s
         if p minE then Some minE
         else tryFind p (Set.remove minE s);;
For example, the least three-element set from a set of sets is extracted as follows:
let ss = set [set [1;3;5]; set [2;4]; set [7;8;9] ];;
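A sketch of the extraction, repeating the declaration of tryFind so that the example is self-contained. The name leastThree is introduced only for this illustration.

```fsharp
let rec tryFind p s =
    if Set.isEmpty s then None
    else
        let minE = Set.minElement s
        if p minE then Some minE
        else tryFind p (Set.remove minE s)

let ss = set [set [1; 3; 5]; set [2; 4]; set [7; 8; 9]]

// The predicate selects sets with exactly three elements.
let leastThree = tryFind (fun s -> Set.count s = 3) ss
// leastThree is Some (set [1; 3; 5])
```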
A declaration of this function that is based on Set.fold will always traverse the entire
set, leading to a linear best-case running time, while the function declared above
terminates as soon as an element satisfying the predicate is found. Its best-case execution
time is dominated by the time required for finding the minimal element in a set, which is
logarithmic in the size of the set when it is represented by a balanced binary tree.
Note, however, that the worst-case execution time of traversing a set S using
Set.fold or Set.foldBack is O(|S|), that is, linear in the size |S| of the set, while it
is O(|S| log(|S|)) for a function based on a recursion schema like that for tryFind, due
to the logarithmic operations for finding and removing the minimal element of a set.
A more efficient implementation of the function tryFind using an enumerator is given
on Page 191, and the efficiency of different methods for traversal of collections is analyzed
in Exercise 9.14. Enumerators for collections (to be introduced in Section 8.12) provide a
far more efficient method than the recursion schema used above for tryFind.
let areNb c1 c2 m =
    Set.contains (c1,c2) m || Set.contains (c2,c1) m;;
A colour col can be extended by a country c for a given map m, if for every country c'
in col, we have that c and c' are not neighbours in m. This can be directly expressed using
Set.forall:
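A sketch of such a test. The function name canBeExtBy, the argument order and the example map exampleMap are assumptions made for this illustration; areNb is repeated from above so that the example is self-contained.

```fsharp
// A map is modelled as a set of pairs of neighbouring countries.
let areNb c1 c2 m =
    Set.contains (c1, c2) m || Set.contains (c2, c1) m

// col can be extended by c if no country c' in col is a neighbour of c.
let canBeExtBy m col c =
    Set.forall (fun c' -> not (areNb c' c m)) col

// Hypothetical example map with four countries.
let exampleMap = set [("a", "b"); ("c", "d"); ("d", "a")]
```

For instance, canBeExtBy exampleMap (set ["a"]) "c" is true, while canBeExtBy exampleMap (set ["a"]) "b" is false.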
The function
The declaration of this function is based on repeated insertion (using Set.fold) of the
countries in the map into a set:
let countries m =
    Set.fold
        (fun set (c1,c2) -> Set.add c1 (Set.add c2 set))
        Set.empty
        m;;
The function
that creates a colouring for a set of countries in a given map, can be declared by repeated
insertion of countries in colourings using the extColouring function:
The function that creates a colouring from a map is declared using function composition
and used as follows:
colMap exMap;;
val it: Set<Set<string>> = set [set ["a";"c"]; set ["b";"d"]]
Comparing this set-based solution with the list-based one in Section 4.6 we can first
observe that the set-based model is more natural, due to the facts that a map is a binary relation
of countries and a colouring is a partitioning of the set of countries in a map. For most of the
functions there is even an efficiency advantage with the set-based functions. This advantage
is due to the following:
• the worst-case execution time for testing for membership of a set (represented by a
balanced binary tree) is logarithmic in the size of the set, while this operation is linear when
the set is represented by a list, and
• the worst-case execution time for inserting an element into a set (represented by a
balanced binary tree) is logarithmic in the size of the set, while this operation is linear when
the set is represented by a list without duplicated elements.
The use of lists has an advantage in the case of the recursive function extColouring
since the pattern matching for lists yields a more readable declaration and since the
worst-case execution time of this list-based version is linear in the size |S| of the colouring S,
while it is O(|S| log(|S|)) for the set-based one. (See remark on Page 110.)
An improved version is therefore based on the following type declaration:
Just two functions extColouring and colCntrs are affected by this change of the type
for colouring while the remaining functions are as above. The new declarations are:
colMap exMap;;
val it : Set<string> list = [set ["a"; "c"]; set ["b"; "d"]]
5.3 Maps
In the modelling and solution for many problems it is often convenient to use finite functions
to uniquely associate values with keys. Such finite functions from keys to values are called
maps. This section introduces the map concept and some of the main operations on maps in
the F# Map library. Please consult the on-line documentation in [9] for an overview of the
complete Map library.
$a_0 \mapsto b_0$
$a_1 \mapsto b_1$
$\vdots$
$a_{n-1} \mapsto b_{n-1}$
An element a_i in the set A is called a key for the map m. A pair (a_i, b_i) is called an entry,
and b_i is called the value for the key a_i. Note that the order of the entries is of no significance,
as the map only expresses an association of values to keys. Note also that any two keys a_i
and a_j in different entries are different, as there is only one value for each key. Thus, a map
may be represented as a finite set of its entries. We use
Operation and meaning
ofList: ('a*'b) list -> Map<'a,'b>, where
    ofList [(a_0, b_0); ...; (a_{n-1}, b_{n-1})] = m
toList: Map<'a,'b> -> ('a*'b) list, where
    toList m = [(a_0, b_0); ...; (a_{n-1}, b_{n-1})]
add: 'a -> 'b -> Map<'a,'b> -> Map<'a,'b>, where
    add a b m = m', where m' is obtained by overriding m with the entry (a, b)
containsKey: 'a -> Map<'a,'b> -> bool, where containsKey a m = $a \in \mathrm{dom}\ m$
find: 'a -> Map<'a,'b> -> 'b, where
    find a m = m(a), if $a \in \mathrm{dom}\ m$; otherwise an exception is raised
tryFind: 'a -> Map<'a,'b> -> 'b option, where
    tryFind a m = Some (m(a)), if $a \in \mathrm{dom}\ m$; None otherwise
filter: ('a -> 'b -> bool) -> Map<'a,'b> -> Map<'a,'b>, where filter p m
    is obtained from m by deletion of entries (a_i, b_i) where p a_i b_i = false
exists: ('a -> 'b -> bool) -> Map<'a,'b> -> bool, where
    exists p m = $\exists (a, b) \in \mathrm{entriesOf}(m).\ p\ a\ b$
forall: ('a -> 'b -> bool) -> Map<'a,'b> -> bool, where
    forall p m = $\forall (a, b) \in \mathrm{entriesOf}(m).\ p\ a\ b$
map: ('a -> 'b -> 'c) -> Map<'a,'b> -> Map<'a,'c>, where
    map f m = ofList [(a_0, f a_0 b_0); ...; (a_{n-1}, f a_{n-1} b_{n-1})]
fold: ('a -> 'b -> 'c -> 'a) -> 'a -> Map<'b,'c> -> 'a, where
    fold f a mbc = f (... (f (f a b_0 c_0) b_1 c_1) ...) b_{n-1} c_{n-1}
foldBack: ('a -> 'b -> 'c -> 'c) -> Map<'a,'b> -> 'c -> 'c, where
    foldBack f m c = f a_0 b_0 (f a_1 b_1 (f ... (f a_{n-1} b_{n-1} c) ...))
It is assumed that m and mbc are maps with types Map<'a,'b> and Map<'b,'c>, that
entriesOf(m) = {(a_0, b_0), ..., (a_{n-1}, b_{n-1})}
entriesOf(mbc) = {(b_0, c_0), ..., (b_{n-1}, c_{n-1})}
and that the enumerations {a_0, a_1, ..., a_{n-2}, a_{n-1}} and {b_0, b_1, ..., b_{n-2}, b_{n-1}} respect the
ordering of the respective types.
Table 5.4 A selection of operations from the Map library
Maps in F#
The Map library of F# supports maps of polymorphic types Map<'a,'b>, where 'a and
'b are the types of the keys and values, respectively, of the map. The Map is implemented
using balanced binary trees, and therefore requires that an ordering is defined on the type
'a of keys. Some of the functions of the Map library are specified in Table 5.4.
A map in F# can be generated from a list of its entries. For example:
is an F# map for the register reg1 , where keys are strings and values are pairs of the type
string*int. If the list contains multiple entries for the same key, then the last occurring
entry is the significant one:
Map.toList reg1;;
val it : (string * (string * int)) list =
[("a1", ("cheese", 25)); ("a2", ("herring", 4));
("a3", ("soft drink", 5))]
An entry can be added to a map using add while the value for a key in a map is retrieved
using either find or tryFind:
where find raises an exception if the key is not in the domain of the map and tryFind
returns None in that case.
The old entry is overridden if you add an entry for an already existing key. The entry for
a given key can be deleted using the remove function:
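The basic map operations can be sketched as follows. The register reg1 is reconstructed from the result of Map.toList reg1 shown above; the article "a4", its prices and the names reg2, reg2', reg1' and dup are made up for this illustration.

```fsharp
// Register reconstructed from the Map.toList reg1 output.
let reg1 =
    Map.ofList [("a1", ("cheese", 25)); ("a2", ("herring", 4)); ("a3", ("soft drink", 5))]

// Adding an entry gives a new map; reg1 itself is unchanged.
let reg2 = Map.add "a4" ("bread", 6) reg1

// Adding an entry for an existing key overrides the old entry.
let reg2' = Map.add "a4" ("bread", 8) reg2

let found = Map.find "a2" reg1        // ("herring", 4)
let missing = Map.tryFind "a5" reg1   // None, since "a5" is not a key

// Removing the entry for "a4" gives back a map equal to reg1.
let reg1' = Map.remove "a4" reg2

// With several entries for the same key, the last one is significant.
let dup = Map.ofList [("a1", 1); ("a1", 2)]
```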
The Map functions exists, forall, map, fold and foldBack are similar to their
List and Set siblings. These functions are specified with type and meaning in Table 5.4,
so we just give some illustrative examples below.
The following expression tests whether there are expensive articles, for which the price
exceeds 100, in a register:
Map.exists (fun _ (_,p) -> p > 100) reg1;;
val it : bool = false
The natural requirement that every price occurring in a register must be positive is expressed
by:
Map.forall (fun _ (_,p) -> p > 0) reg1;;
val it : bool = true
The part of a register with articles having a price smaller than 7 is extracted as follows:
Map.filter (fun _ (_,p) -> p < 7) reg3;;
val it : Map<string,(string * int)> =
map [("a2", ("herring", 4)); ("a3", ("soft drink", 5))]
A new register, where a 15% discount is given on all articles, can be computed as follows:
Map.map
    (fun ac (an,p) -> (an,int(round(0.85*(float p)))))
    reg3;;
val it : Map<string,(string * int)> =
map [("a1", ("cheese", 21)); ("a2", ("herring", 3));
("a3", ("soft drink", 4)); ("a4", ("bread", 7))]
We can extract the list of article codes and prices for a given register using the fold functions
for maps:
Map.foldBack (fun ac (_,p) cps -> (ac,p)::cps) reg1 [];;
val it: (string*int) list = [("a1",25); ("a2",4); ("a3",5)]
where these two examples show that the entries of a map are ordered according to the keys.
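For comparison, a sketch of the same extraction with both fold functions, assuming reg1 is the register reconstructed from the Map.toList reg1 output. The names viaFoldBack and viaFold are introduced only here.

```fsharp
let reg1 =
    Map.ofList [("a1", ("cheese", 25)); ("a2", ("herring", 4)); ("a3", ("soft drink", 5))]

// foldBack visits the entries from the largest key to the smallest,
// so consing gives a list in ascending key order.
let viaFoldBack = Map.foldBack (fun ac (_, p) cps -> (ac, p) :: cps) reg1 []

// fold visits the entries from the smallest key to the largest,
// so consing gives a list in descending key order.
let viaFold = Map.fold (fun cps ac (_, p) -> (ac, p) :: cps) [] reg1
```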
The natural model of a register, associating article name and price with each article code,
is using a map:
type Register = Map<ArticleCode, ArticleName*Price>;;
Version 1
In the first version we model a purchase just as in Section 4.6:
type Item = NoPieces * ArticleCode;;
type Purchase = Item list;;
The function makeBill1: Register -> Purchase -> Bill makes the bill for a
given register and purchase and it can be defined by a recursion following the structure of a
purchase:
let rec makeBill1 reg = function
    | [] -> ([],0)
    | (np,ac)::pur ->
        match Map.tryFind ac reg with
        | Some(aname,aprice) ->
            let tprice = np*aprice
            let (infos,sumbill) = makeBill1 reg pur
            ((np,aname,tprice)::infos, tprice+sumbill)
        | None ->
            failwith(ac + " is an unknown article code");;
where an exception signals an undefined article code in a register. We use the function
Map.tryFind in order to detect when this exception should be raised. A simple appli-
cation of the program is:
let pur = [(3,"a2"); (1,"a1")];;
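A self-contained sketch of the application, assuming the register reg1 reconstructed from the Map.toList reg1 output earlier in this section. The name bill is introduced only for the illustration.

```fsharp
let reg1 =
    Map.ofList [("a1", ("cheese", 25)); ("a2", ("herring", 4)); ("a3", ("soft drink", 5))]

let rec makeBill1 reg = function
    | [] -> ([], 0)
    | (np, ac) :: pur ->
        match Map.tryFind ac reg with
        | Some (aname, aprice) ->
            let tprice = np * aprice
            let (infos, sumbill) = makeBill1 reg pur
            ((np, aname, tprice) :: infos, tprice + sumbill)
        | None -> failwith (ac + " is an unknown article code")

let pur = [(3, "a2"); (1, "a1")]

// Three herrings at 4 each and one cheese at 25.
let bill = makeBill1 reg1 pur
// bill is ([(3, "herring", 12); (1, "cheese", 25)], 37)
```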
Version 2
The recursion pattern of makeBill1 is the same as that of List.foldBack, and the
explicit recursion can be replaced by application of that function. Furthermore, it may be
acceptable to use the exception from the Map library instead of using failwith. This
leads to the following declaration:
let makeBill2 reg pur =
    let f (np,ac) (infos,billprice) =
        let (aname, aprice) = Map.find ac reg
        let tprice = np*aprice
        ((np,aname,tprice)::infos, tprice+billprice)
    List.foldBack f pur ([],0);;
Version 3
A purchase is so far just modelled as a list of items, each item consisting of a count and an
article code. The order of appearance in the list may represent the sequence in which items are
placed on the counter in the shop. One may, however, argue that a purchase of the following
three items: three herrings, one piece of cheese, and two herrings, is the same as a purchase
of one piece of cheese and five herrings. Furthermore, the latter form is more convenient if
we have to model a discount on five herrings, as the discount applies independently of the
order in which the items are placed on the counter. Thus one could model a purchase as a
map, where article codes are keys and numbers of pieces are values.
type Purchase = Map<ArticleCode,NoPieces>;;
With this model, the makeBill3: Register -> Purchase -> Bill function is
declared and used as follows:
let makeBill3 reg pur =
    let f ac np (infos,billprice) =
        let (aname, aprice) = Map.find ac reg
        let tprice = np*aprice
        ((np,aname,tprice)::infos, tprice+billprice)
    Map.foldBack f pur ([],0);;
where we use Map.foldBack to fold the function f over a purchase.
An example showing the use of this function is:
let purMap = Map.ofList [("a2",3); ("a1",1)];;
val purMap : Map<string,int> = map [("a1", 1); ("a2", 3)]
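A self-contained sketch of applying makeBill3 to purMap, again assuming the register reg1 reconstructed from the Map.toList reg1 output. Note that the items on the bill follow the key order of the purchase map, not the order in which the entries were listed. The name bill3 is introduced only here.

```fsharp
let reg1 =
    Map.ofList [("a1", ("cheese", 25)); ("a2", ("herring", 4)); ("a3", ("soft drink", 5))]

let makeBill3 reg pur =
    let f ac np (infos, billprice) =
        let (aname, aprice) = Map.find ac reg
        let tprice = np * aprice
        ((np, aname, tprice) :: infos, tprice + billprice)
    Map.foldBack f pur ([], 0)

let purMap = Map.ofList [("a2", 3); ("a1", 1)]

let bill3 = makeBill3 reg1 purMap
// bill3 is ([(1, "cheese", 25); (3, "herring", 12)], 37)
```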
We leave the generation of a map for a purchase on the basis of a list of items for
Exercise 5.9. Furthermore, it is left for Exercise 5.10 to take discounts for certain articles into
account.
Summary
In this chapter we have introduced the list, set and map parts of the collection library of
F#. These three libraries are efficient implementations of finite, immutable collections.
Notice that this chapter covers just a small part of the libraries. Furthermore, in many
applications these collections provide a natural data model and we strongly encourage the use
of these libraries whenever it is appropriate.
In Chapter 11 we introduce sequences, which are another part of the collection library.
Sequences are (possibly infinite) list-like structures, where just a finite part of the sequence
is computed at any stage of a computation.
Exercises
5.1 Give a declaration for List.filter using List.foldBack.
5.2 Solve Exercise 4.15 using List.fold or List.foldBack.
5.3 Solve Exercise 4.12 using List.fold or List.foldBack.
5.4 Declare a function downto1 such that:
downto1 f n e = f(1, f(2, ..., f(n-1, f(n, e)) ...)) for $n > 0$
downto1 f n e = e for $n \le 0$
5. The relation composition $r \circ s$ of a relation r from a set A to a set B and a relation s from
B to a set C is a relation from A to C. It is defined as the set of pairs (a, c) where there exists
an element b in B such that $(a, b) \in r$ and $(b, c) \in s$. Declare an F# function to compute the
relational composition.
6. A relation r from a set A to the same set A is said to be transitive if $(a_1, a_2) \in r$ and
$(a_2, a_3) \in r$ implies $(a_1, a_3) \in r$ for any elements a_1, a_2 and a_3 in A. The transitive closure
of a relation r is the smallest transitive relation containing r. If r contains n elements, then
the transitive closure can be computed as the union of the following n relations:
$r \cup (r \circ r) \cup (r \circ r \circ r) \cup \cdots \cup (r \circ r \circ \cdots \circ r)$
Finite trees
This chapter is about trees, which are structures that may contain subcomponents of the
same type. A list is an example of a tree. The list 1::[2;3;4], for example, contains a
subcomponent [2;3;4] that is also a list. In this chapter we will introduce the concept of
a tree through a variety of examples.
In F# we use a recursive type declaration to represent a set of values which are trees. The
constructors of the type correspond to the rules for building trees, and patterns containing
constructors are used when declaring functions on trees.
We motivate the use of finite trees and recursive types by a number of examples: Chi-
nese boxes, symbolic differentiation, expression trees, search trees, file systems, trees with
different kinds of nodes and electrical circuits.
[Figure: the tree with root Cube and sub-trees r, c and cb]
is also in Cbox.
Rule 3: The set Cbox contains no other values than the trees generated by repeated use of
Rule 1 and Rule 2.
The following example shows how this definition can be used to generate elements of
Cbox.
[Figure: the trees generated step by step from the rules]
Type declaration
Using the following type Colour:
The declaration is recursive, because the declared type Cbox occurs in the argument type of
the constructor Cube. The constructors Nothing and Cube correspond to the above rules
1 and 2 for generating trees, so we can redo the above steps a through d with values of type
Cbox:
Cube(2.0,Yellow,Cube(1.0,Green,Cube(0.5,Red,Nothing)))
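A minimal sketch of type declarations that are consistent with the values used in this section. Only the colours Red, Green and Yellow occur in the text, so the Colour type below is an assumption and may be smaller than the original one; the names cb1, cb2 and cb3 are likewise assumptions, chosen to match the later use of cb2 and cb3.

```fsharp
// Assumed colour type: only the colours occurring in the text.
type Colour = Red | Green | Yellow

// A Chinese box is either empty or a cube with a side length,
// a colour and an inner Chinese box.
type Cbox =
    | Nothing
    | Cube of float * Colour * Cbox

// Boxes with one, two and three cubes.
let cb1 = Cube(0.5, Red, Nothing)
let cb2 = Cube(1.0, Green, cb1)
let cb3 = Cube(2.0, Yellow, cb2)
```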
Patterns
In Section 3.8 we have seen declarations containing patterns for tagged values. Constructors
for trees can occur in patterns just like constructors for tagged values. An example of a tree
pattern is Cube(r,c,cb), containing identifiers r, c and cb for the components. This
pattern denotes the tree in Figure 6.1.
This pattern will, for example, match the tree shown in Figure 6.2 corresponding to the
value Cube(1.0,Green,Cube(0.5,Red,Nothing)) with bindings
r $\mapsto$ 1.0
c $\mapsto$ Green
cb $\mapsto$ Cube(0.5,Red,Nothing)
where cb is bound to a value of type Cbox corresponding to the tree shown in Step b on
Page 122.
The inductive definition of the trees implies that any tree will either match the empty tree
corresponding to the pattern:
Nothing
according to Rule 1 in the definition of trees, or the tree pattern for a cube in Figure 6.1
corresponding to:
Cube(r,c,cb)
Function declarations
We give a declaration of the function:
count: Cbox -> int
such that the value of the expression: count(cb) is the number of cubes of the Chinese
box cb:
let rec count = function
    | Nothing -> 0
    | Cube(r,c,cb) -> 1 + count cb;;
val count : Cbox -> int
The declaration divides into two cases, one with pattern Nothing and the other with pattern
Cube(r,c,cb). Thus, the declaration follows the inductive definition of Chinese boxes.
6.1 Chinese boxes 125
This function can be applied to the above values cb2 and cb3:
count cb2 + count cb3;;
val it : int = 5
When declaring a function on Chinese boxes by the use of the type Cbox we must ensure
that the function respects the invariant, that is, the function will only compute values of type
Cbox satisfying the invariant when applied to values satisfying the invariant.
Insertion function
We can declare an insertion function on Chinese boxes:
insert: float * Colour * Cbox -> Cbox
The value of the expression insert(r,c,cb) is the Chinese box obtained from cb by
inserting an extra cube with side length r and colour c at the proper place among the cubes
in the box. The function insert is a partial function, that raises an exception in case the
insertion would violate the invariant for Chinese boxes:
let rec insert(r,c,cb) =
    if r <= 0.0 then failwith "ChineseBox"
    else match cb with
         | Nothing -> Cube(r,c,Nothing)
         | Cube(r1,c1,cb1) ->
             match compare r r1 with
             | t when t > 0 -> Cube(r,c,cb)
             | 0 -> failwith "ChineseBox"
             | _ -> Cube(r1,c1,insert(r,c,cb1));;
insert(2.0,Yellow,insert(1.0,Green,Nothing));;
val it : Cbox = Cube (2.0,Yellow,Cube (1.0,Green,Nothing))
insert(1.0,Green,insert(2.0,Yellow,Nothing));;
val it : Cbox = Cube (2.0,Yellow,Cube (1.0,Green,Nothing))
insert(1.0,Green,Cube(2.0,Yellow,Cube(1.0,Green,Nothing)));;
System.Exception: ChineseBox
Stopped due to error
Note that any legal Chinese box can be generated from the box Nothing by repeated use
of insert.
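The remark about repeated use of insert can be illustrated by folding insert over a list of cubes. The following is a minimal sketch: the Colour type is assumed (its declaration is not part of this excerpt) and the helper ofList is hypothetical, introduced only for this illustration.

```fsharp
// Assumed declaration of colours (not shown in this excerpt).
type Colour = Red | Blue | Green | Yellow | Purple

type Cbox =
    | Nothing
    | Cube of float * Colour * Cbox

// insert as declared in the text.
let rec insert (r, c, cb) =
    if r <= 0.0 then failwith "ChineseBox"
    else
        match cb with
        | Nothing -> Cube(r, c, Nothing)
        | Cube(r1, c1, cb1) ->
            match compare r r1 with
            | t when t > 0 -> Cube(r, c, cb)
            | 0 -> failwith "ChineseBox"
            | _ -> Cube(r1, c1, insert(r, c, cb1))

// Hypothetical helper: build a box from (side length, colour) pairs,
// starting from Nothing and inserting one cube at a time.
let ofList cubes =
    List.fold (fun cb (r, c) -> insert(r, c, cb)) Nothing cubes

let box = ofList [(1.0, Green); (2.0, Yellow); (0.5, Red)]
```

Whatever the order of the list, the cubes end up sorted by decreasing side length, because insert places each cube at its proper position.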
This is, however, essentially the same as the above Cbox type of trees, as the list type is
a special case of the general concept of recursive types (cf. Section 6.3).
One may also argue that it is strange to have a constructor Nothing denoting a non-
existing Chinese box, and one might rather discard the empty box and divide the Chinese
boxes into those consisting of a single cube and those consisting of multiple cubes, as ex-
pressed in the following declaration:
type Cbox1 = | Single of float * Colour
             | Multiple of float * Colour * Cbox1;;
Using this type, we get the following declarations of the functions count and insert:
let rec count1 = function
    | Single _ -> 1
    | Multiple(_,_,cb) -> 1 + count1 cb;;
We have now suggested several representations for Chinese boxes. The preferable choice
of representation will in general depend on which functions we have to define. The clumsy
declaration of the insert1 function contains repeated sub-expressions and this indicates
that the first model for Chinese boxes with a Nothing value is to be preferred.
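The declaration of insert1 is not reproduced in this excerpt. One possible version, given here only as a sketch and not necessarily the authors' declaration, shows where the repeated sub-expressions come from: the comparison logic must be written once for Single and once for Multiple. The Colour type is an assumption.

```fsharp
// Assumed declaration of colours (not shown in this excerpt).
type Colour = Red | Blue | Green | Yellow | Purple

type Cbox1 =
    | Single of float * Colour
    | Multiple of float * Colour * Cbox1

// Sketch of an insertion function for the Cbox1 representation.
// The three-way comparison is duplicated in both clauses.
let rec insert1 (r, c, cb) =
    if r <= 0.0 then failwith "ChineseBox"
    else
        match cb with
        | Single(r1, c1) ->
            match compare r r1 with
            | t when t > 0 -> Multiple(r, c, cb)
            | 0 -> failwith "ChineseBox"
            | _ -> Multiple(r1, c1, Single(r, c))
        | Multiple(r1, c1, cb1) ->
            match compare r r1 with
            | t when t > 0 -> Multiple(r, c, cb)
            | 0 -> failwith "ChineseBox"
            | _ -> Multiple(r1, c1, insert1(r, c, cb1))
```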
f 2.0;;
val it : float = -0.7568024953
[Figure: expression trees for sin(x * x), with root Sin, and for (sin x) * x, with root Mul]
The different order of the operators in these expressions is reflected in the trees: the tree
for sin(x * x) contains a sub-tree for the sub-expression x * x, which again contains two
sub-trees for the sub-expressions x and x, while the tree for (sin x) * x contains sub-trees
for the sub-expressions sin x and x.
The set of finite expression trees Fexpr is generated inductively by the following rules:
Rule 1: For every float number r, the tree for the constant r shown in Figure 6.5 is a member
of Fexpr.
[Figure 6.5: Tree for the constant r and trees for dyadic operators]
Rule 5: The set Fexpr contains no other values than the trees generated by Rules 1 to 4.
Type declaration
Expression trees can be represented in F# by values of a recursively defined type. We get the
following declaration of the type Fexpr:
For instance, the expression trees for sin(x * x) and (sin x) * x are represented by the
values Sin(Mul(X,X)) and Mul(Sin X,X) of type Fexpr.
Patterns
The following patterns correspond to values of type Fexpr:
Const r X
Add(fe1,fe2) Sub(fe1,fe2) Mul(fe1,fe2) Div(fe1,fe2)
Sin fe Cos fe Log fe Exp fe
These patterns can be used in function declarations with a division into clauses according to
the structure of expression trees.
Function declaration
We are now in a position to declare a function
D: Fexpr -> Fexpr
such that D(fe) is a representation of the derivative with respect to x of the function
represented by fe. The declaration for D has a clause for each constructor generating a value
of type Fexpr, and each clause is a direct translation of the corresponding mathematical
differentiation rule (see Tables 6.1 and 6.2):
D(Sin(Mul(X, X)));;
val it : Fexpr =
Mul (Cos (Mul (X,X)),
Add (Mul (Const 1.0,X),Mul (X,Const 1.0)))
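Neither the type declaration nor the declaration of D is reproduced in this excerpt. The sketch below assumes the constructors listed under Patterns above and applies the standard differentiation rules, one clause per constructor. It reproduces the result shown for Sin(Mul(X, X)), but it should be read as an illustration rather than as the authors' exact declaration.

```fsharp
// Constructors taken from the patterns listed in the text.
type Fexpr =
    | Const of float
    | X
    | Add of Fexpr * Fexpr
    | Sub of Fexpr * Fexpr
    | Mul of Fexpr * Fexpr
    | Div of Fexpr * Fexpr
    | Sin of Fexpr
    | Cos of Fexpr
    | Log of Fexpr
    | Exp of Fexpr

// Symbolic differentiation with respect to x; no simplification is done.
let rec D = function
    | Const _ -> Const 0.0
    | X -> Const 1.0
    | Add(fe, ge) -> Add(D fe, D ge)
    | Sub(fe, ge) -> Sub(D fe, D ge)
    | Mul(fe, ge) -> Add(Mul(D fe, ge), Mul(fe, D ge))            // product rule
    | Div(fe, ge) -> Div(Sub(Mul(D fe, ge), Mul(fe, D ge)), Mul(ge, ge))
    | Sin fe -> Mul(Cos fe, D fe)                                  // chain rule
    | Cos fe -> Mul(Const(-1.0), Mul(Sin fe, D fe))
    | Log fe -> Div(D fe, fe)
    | Exp fe -> Mul(Exp fe, D fe)
```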
Note that these examples show results which can be reduced. For example, the above
value of D(Mul(Const 3.0, Exp X)) could be reduced to Mul(Const 3.0, Exp X)
if a product with a zero factor was reduced to zero, and if adding zero or multiplying by
one was absorbed. It is an interesting, non-trivial task to declare a function that reduces
expressions to a particular, simple form.
toString(Mul(Cos(Mul(X, X)),
Add(Mul(Const 1.0, X), Mul(X, Const 1.0))));;
val it : string =
"(cos((x) * (x))) * (((1) * (x)) + ((x) * (1)))"
The function toString puts brackets around every operand of an operator and every ar-
gument of a function. It is possible to declare a better toString function that avoids
unnecessary brackets. See Exercise 6.3.
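The declaration of toString is not part of this excerpt. A minimal sketch that brackets every operand and every function argument, and that yields the string shown above, could look as follows. The helper binary is hypothetical and the Fexpr constructors are assumed from the patterns listed earlier.

```fsharp
type Fexpr =
    | Const of float
    | X
    | Add of Fexpr * Fexpr | Sub of Fexpr * Fexpr
    | Mul of Fexpr * Fexpr | Div of Fexpr * Fexpr
    | Sin of Fexpr | Cos of Fexpr | Log of Fexpr | Exp of Fexpr

// Hypothetical helper: put brackets around both operands of a dyadic operator.
let binary op s1 s2 = "(" + s1 + ") " + op + " (" + s2 + ")"

let rec toString = function
    | Const r -> string r
    | X -> "x"
    | Add(fe1, fe2) -> binary "+" (toString fe1) (toString fe2)
    | Sub(fe1, fe2) -> binary "-" (toString fe1) (toString fe2)
    | Mul(fe1, fe2) -> binary "*" (toString fe1) (toString fe2)
    | Div(fe1, fe2) -> binary "/" (toString fe1) (toString fe2)
    | Sin fe -> "sin(" + toString fe + ")"
    | Cos fe -> "cos(" + toString fe + ")"
    | Log fe -> "log(" + toString fe + ")"
    | Exp fe -> "exp(" + toString fe + ")"

let s = toString (Mul(Cos(Mul(X, X)), Add(Mul(Const 1.0, X), Mul(X, Const 1.0))))
```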
[Figures: the binary tree t1 with node values "ab" and "cd" and leaf values 1, 2 and 3, and the tree patterns for leaves and nodes]
Leaf x
Node(t1,x,t2)
corresponding to the pattern trees in Figure 6.9. Using this we may, for example, declare a
function depth computing the depth of a binary tree:
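The declaration of depth is not reproduced in this excerpt. A minimal sketch, assuming the two-parameter type BinTree<'a,'b> with values of type 'a in the leaves and values of type 'b in the nodes (as used in Exercise 6.4), and assuming that t1 is the tree with node values "ab" and "cd" and leaf values 1, 2 and 3:

```fsharp
// Assumed declaration of the tree type from Section 6.3.
type BinTree<'a,'b> =
    | Leaf of 'a
    | Node of BinTree<'a,'b> * 'b * BinTree<'a,'b>

// A leaf has depth 0; a node is one level above its deepest sub-tree.
let rec depth = function
    | Leaf _ -> 0
    | Node(t1, _, t2) -> 1 + max (depth t1) (depth t2)

// Assumed value of the example tree t1.
let t1 = Node(Node(Leaf 1, "cd", Leaf 2), "ab", Leaf 3)
```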
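depth t1;;
val it : int = 2
[Figure 6.10: simplified drawing of the tree t1 with "ab" at the root, "cd" below it to the left, and the leaf values 1, 2 and 3]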
In the following we will often use simplified drawings of trees where the constructors
have been left out and replaced by the value attached to the node. Such a simplified drawing
of the tree t1 in Figure 6.7 is shown in Figure 6.10.
[Figure 6.11: simplified drawings of the trees t3 and t4]
Simplified drawings of the corresponding trees without constructors are shown in Figure 6.11.
We will use such simplified drawings in the following.
inOrder t4;;
val it : int list = [-3; 0; 2; 5; 7]
postOrder t4;;
val it : int list = [-3; 2; 0; 7; 5]
The reader should compare these lists with the figure and the verbal descriptions of the traversal
functions.
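The traversal functions themselves are not reproduced in this excerpt. The following sketch assumes the one-parameter type BinTree<'a> with values in the nodes only, and a value t4 reconstructed from the tree t5 printed later in this section. It gives the in-order and post-order lists shown above.

```fsharp
type BinTree<'a> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>

// Root first, then the left sub-tree, then the right sub-tree.
let rec preOrder = function
    | Leaf -> []
    | Node(tl, x, tr) -> x :: (preOrder tl @ preOrder tr)

// Left sub-tree first, then the root, then the right sub-tree.
let rec inOrder = function
    | Leaf -> []
    | Node(tl, x, tr) -> inOrder tl @ [x] @ inOrder tr

// Both sub-trees first, the root last.
let rec postOrder = function
    | Leaf -> []
    | Node(tl, x, tr) -> postOrder tl @ postOrder tr @ [x]

// Assumed values of the example trees.
let t3 = Node(Node(Leaf, -3, Leaf), 0, Node(Leaf, 2, Leaf))
let t4 = Node(t3, 5, Node(Leaf, 7, Leaf))
```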
Traversal of binary trees can more generally be described by fold and foldBack func-
tions defined such that the following holds:
preFold f e t = List.fold f e (preOrder t)
preFoldBack f t e = List.foldBack f (preOrder t) e
and similarly for in-order and post-order traversals. These functions should be declared to
accumulate the values in the nodes while traversing the tree without actually building the
list. We show one of the declarations:
let rec postFoldBack f t e =
    match t with
    | Leaf -> e
    | Node(tl,x,tr) ->
        let ex = f x e
        let er = postFoldBack f tr ex
        postFoldBack f tl er;;
val postFoldBack : ('a -> 'b -> 'b) -> BinTree<'a> -> 'b -> 'b
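As a companion to postFoldBack, here is a sketch of preFold that satisfies the stated equation preFold f e t = List.fold f e (preOrder t) without building the list. The declaration is an illustration written for this purpose; preOrder and the example tree t4 are assumptions included only to check the equation.

```fsharp
type BinTree<'a> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>

// Accumulate the root first, then the left sub-tree, then the right sub-tree.
let rec preFold f e t =
    match t with
    | Leaf -> e
    | Node(tl, x, tr) ->
        let ex = f e x
        let el = preFold f ex tl
        preFold f el tr

// Reference traversal, used only to compare against List.fold.
let rec preOrder = function
    | Leaf -> []
    | Node(tl, x, tr) -> x :: (preOrder tl @ preOrder tr)

let t4 =
    Node(Node(Node(Leaf, -3, Leaf), 0, Node(Leaf, 2, Leaf)), 5, Node(Leaf, 7, Leaf))

let viaTree = preFold (fun acc x -> x :: acc) [] t4
let viaList = List.fold (fun acc x -> x :: acc) [] (preOrder t4)
```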
Search trees
We restrict the type variable 'a in our BinTree type to types with an ordering:
type BinTree<'a when 'a : comparison> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>;;
A value of type BinTree<'a> is then called a search tree if it satisfies the following
condition:
Every node Node(t_left, a, t_right) satisfies:
a′ < a for every value a′ occurring in t_left and
a′ > a for every value a′ occurring in t_right.
This condition is called the search tree invariant. The trees t3 and t4 defined above and
shown in Figure 6.11 satisfy this invariant and are hence search trees.
A search tree can be used to represent a finite set {a_0, a_1, ..., a_{n-1}}. This representation
is particularly efficient when the tree is balanced (see discussion on Page 136).
A function add for adding a value to a search tree can be defined as follows:
let rec add x t =
    match t with
    | Leaf -> Node(Leaf,x,Leaf)
    | Node(tl,a,tr) when x<a -> Node(add x tl,a,tr)
    | Node(tl,a,tr) when x>a -> Node(tl,a,add x tr)
    | _ -> t;;
val add: 'a -> BinTree<'a> -> BinTree<'a> when 'a: comparison
It builds a single-node tree when adding a value x to an empty tree. When adding to a non-
empty tree with root a the value is added to the left sub-tree if x < a and to the right sub-tree
if x > a. The tree is left unchanged if x = a because the value x is then already a member
of the represented set.
Adding the value 4 to the search tree t4
let t5 = add 4 t4;;
val t5 : BinTree<int> =
Node
(Node(Node(Leaf,-3,Leaf),0,Node(Leaf,2,Node(Leaf,4,Leaf))),
5,Node(Leaf,7,Leaf))
[Figure 6.12: the search tree t5]
It follows by an inductive argument that an in-order traversal of a search tree will visit
the elements in ascending order because the elements in the left sub-tree are smaller than the
root element while the elements in the right sub-tree are larger and this applies inductively
to any sub-tree. We get for instance:
inOrder t5;;
val it : int list = [-3; 0; 2; 4; 5; 7]
An in-order traversal of a search tree will hence give a list where the elements in the nodes
occur in ascending order.
A function contains for testing set membership can be declared by:
contains 4 t5;;
val it : bool = true
It uses the search tree property by testing only the left sub-tree if x is smaller than the root node
value and only the right sub-tree if x is larger than the root node value. The number of comparisons
made when evaluating contains x t is hence less than or equal to the depth of the tree t.
It follows that the tree t5 in Figure 6.12 is not an optimal representation of the set, because
the set can be represented by the tree of depth 2 in Figure 6.13. The tree t5 was created by
the above add function, and it would hence require a more sophisticated add function to
get the balanced tree in Figure 6.13 instead.
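The declaration of contains is not reproduced in this excerpt. A sketch that follows the description above, descending into only one sub-tree at each node, is given below together with the add function from the text. Building the tree by folding add over a list is an assumption made for this illustration; the order of the list is chosen so that the result has the same shape as t5.

```fsharp
type BinTree<'a when 'a : comparison> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>

// add as declared in the text.
let rec add x t =
    match t with
    | Leaf -> Node(Leaf, x, Leaf)
    | Node(tl, a, tr) when x < a -> Node(add x tl, a, tr)
    | Node(tl, a, tr) when x > a -> Node(tl, a, add x tr)
    | _ -> t

// Membership test: follow a single path from the root.
let rec contains x = function
    | Leaf -> false
    | Node(tl, a, _) when x < a -> contains x tl
    | Node(_, a, tr) when x > a -> contains x tr
    | _ -> true

// Inserting in this order gives the same shape as the tree t5 in the text.
let t5 = List.fold (fun t x -> add x t) Leaf [5; 0; 7; -3; 2; 4]
```

The shape of the tree depends on the order of insertion: inserting the same values in ascending order would give a degenerate tree where every left sub-tree is empty.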
[Figure 6.13: a balanced search tree of depth 2 representing the same set]
The number of nodes in a balanced tree with depth k is approximately 2^k and the depth of
a balanced tree with n nodes is hence approximately log2 n. The Set and Map collections
in the F# library use balanced search trees to get efficient implementations. A function like
Set.contains will hence require circa log2 n comparisons when used on a set with n
elements. Searching for a value (e.g., using List.exists) in a list of length n may require
up to n comparisons. That makes a big difference for large n (e.g., log2 n ≈ 20 when
n = 1000000).
can now be defined recursively by dividing into cases according to the structure of the tree:
We may, for example, evaluate the above representation et of an expression in the environ-
ment env where the identifier "a" is bound to the value -7:
let env = Map.add "a" -7 Map.empty;;
eval et env;;
val it : int = 35
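Neither the type of expression trees nor the declarations of eval and et are part of this excerpt. The sketch below is therefore built on assumptions: a type ExprTree with constructors for constants, identifiers, unary minus, sums, differences, products and local declarations, and a hypothetical expression et chosen so that it evaluates to 35 when "a" is bound to -7.

```fsharp
// Assumed type of expression trees (not shown in this excerpt).
type ExprTree =
    | Const of int
    | Ident of string
    | Minus of ExprTree
    | Sum of ExprTree * ExprTree
    | Diff of ExprTree * ExprTree
    | Prod of ExprTree * ExprTree
    | Let of string * ExprTree * ExprTree

// Evaluate t in the environment env, a map from identifiers to integers.
let rec eval t env =
    match t with
    | Const n -> n
    | Ident s -> Map.find s env
    | Minus t1 -> -(eval t1 env)
    | Sum(t1, t2) -> eval t1 env + eval t2 env
    | Diff(t1, t2) -> eval t1 env - eval t2 env
    | Prod(t1, t2) -> eval t1 env * eval t2 env
    | Let(s, t1, t2) ->
        let v1 = eval t1 env
        eval t2 (Map.add s v1 env)   // the binding of s is local to t2

// Hypothetical expression: a * (-3 + (let x = 5 in x + a)).
let et =
    Prod(Ident "a",
         Sum(Minus(Const 3),
             Let("x", Const 5, Sum(Ident "x", Ident "a"))))

let env = Map.add "a" (-7) Map.empty
let result = eval et env
```

With "a" bound to -7 the inner sum is 5 + (-7) = -2, the outer sum is -3 + (-2) = -5, and the product is (-7) * (-5) = 35.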
[Figure 6.14: a tree with root 1, sub-trees with roots 2, 3 and 4, and the nodes 5, 6 and 7 below them]
The reader should appreciate this short and elegant combination of library functions.
The declaration of depthFirstFoldBack is left as an exercise to the reader (cf. Ex-
ercise 6.12).
In the breadth-first order we should first visit the root and then the roots of the immediate
sub-trees and so on. This view of the problem does, unfortunately, not lead to any useful
recursion because the remaining part becomes organized in an inconvenient list of lists of
sub-trees.
A nice recursive pattern is instead obtained by constantly keeping track of the list rest of
sub-trees where the nodes still remain to be visited. Using this idea on the tree in Figure 6.14
we get:
Visit rest
1 [t2; t3; t4]
2 [t3; t4; t5]
3 [t4; t5]
4 [t5; t6; t7]
... ...
let breadthFirstFoldBack f t e =
    breadthFirstFoldBackList f [t] e;;
val breadthFirstFoldBack :
    ('a -> 'b -> 'b) -> ListTree<'a> -> 'b -> 'b
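The auxiliary function breadthFirstFoldBackList is not reproduced in this excerpt. The sketch below follows the idea described above: the list rest of sub-trees still to be visited is extended at the end with the sub-trees of the node just visited. The type ListTree and the example tree are assumptions; the tree is chosen to match the Visit/rest table.

```fsharp
// Assumed type of trees with a variable number of sub-trees.
type ListTree<'a> = Node of 'a * (ListTree<'a> list)

// Visit the first tree in the list and queue its sub-trees after the rest.
let rec breadthFirstFoldBackList f ts e =
    match ts with
    | [] -> e
    | Node(x, ts1) :: rest ->
        f x (breadthFirstFoldBackList f (rest @ ts1) e)

let breadthFirstFoldBack f t e =
    breadthFirstFoldBackList f [t] e

// Example tree: 1 has sub-trees 2, 3 and 4; 2 has sub-tree 5; 4 has sub-trees 6 and 7.
let t2 = Node(2, [Node(5, [])])
let t3 = Node(3, [])
let t4 = Node(4, [Node(6, []); Node(7, [])])
let t1 = Node(1, [t2; t3; t4])

let order = breadthFirstFoldBack (fun x a -> x :: a) t1 []
```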
The directory d1 contains two files a1 and a4 and two directories d2 and d3 . The directory
d2 contains a file a2 and a directory d3 , and so on. Note that the same name may occur in
different directories. This structure is an example of a tree with variable number of sub-trees.
[Figure 6.15: the directory d1 with its files and sub-directories]
Discarding the contents of files we represent a file system and its contents by two decla-
rations:
type FileSys = Element list
and Element = | File of string
              | Dir of string * FileSys;;
The first declaration refers to a type Element which is declared in the second declara-
tion. This forward reference to the type Element is allowed by the F# system because
Element is declared in the second declaration using the keyword and. These two decla-
rations constitute an example of mutually recursive type declarations, as the type Element
occurs in the declaration of FileSys and the type FileSys occurs in the declaration of
Element.
The directory shown in Figure 6.15 is represented by the value:
let d1 =
    Dir("d1",[File "a1";
              Dir("d2", [File "a2"; Dir("d3", [File "a3"])]);
              File "a4";
              Dir("d3", [File "a5"])]);;
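The declarations of namesFileSys and namesElement are not reproduced in this excerpt. A sketch of two mutually recursive functions with these names, which yields the list of names shown further below for d1, could look as follows; it is an illustration, not necessarily the authors' declaration.

```fsharp
type FileSys = Element list
and Element =
    | File of string
    | Dir of string * FileSys

// Names in a file system: the names of each element, in order.
let rec namesFileSys = function
    | [] -> []
    | e :: es -> namesElement e @ namesFileSys es
// Names in an element: a file name, or a directory name followed by its contents.
and namesElement = function
    | File s -> [s]
    | Dir(s, fs) -> s :: namesFileSys fs

let d1 =
    Dir("d1", [File "a1";
               Dir("d2", [File "a2"; Dir("d3", [File "a3"])]);
               File "a4";
               Dir("d3", [File "a5"])])

let names = namesElement d1
```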
The above function declarations are mutually recursive as the identifier namesElement
occurs in the declaration of namesFileSys while the identifier namesFileSys occurs
in the declaration of namesElement. Mutually recursive functions are declared using the
keyword and to combine the individual function declarations.
The names of files and directories in the directory d1 may now be extracted:
namesElement d1;;
val it : string list = ["d1"; "a1"; "d2"; "a2";
"d3"; "a3"; "a4"; "d3"; "a5"]
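[Figure: the circuit cmp, a serial combination of a parallel pair of components (values 0.25 and 1.0) and a component with value 1.5, and its tree with root Ser, sub-tree Par and three Comp leaves]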
For example:
count cmp;;
val it : int = 3
We consider now circuits consisting of resistances where the attached values are the re-
sistances of the individual components. Suppose c1 and c2 are two circuits with resistances
r1 and r2 , respectively. The resistance of a serial combination of c1 and c2 is r1 + r2 , and
the resistance of a parallel combination of c1 and c2 is given by the formula:
1 / (1/r1 + 1/r2)
Thus, a function resistance computing the resistance of a circuit can be declared by:
let rec resistance = function
    | Comp r -> r
    | Ser(c1,c2) -> resistance c1 + resistance c2
    | Par(c1,c2) ->
        1.0 / (1.0/resistance c1 + 1.0/resistance c2);;
val resistance : Circuit<float> -> float
For example:
resistance cmp;;
val it : float = 1.7
Tree recursion
The functions count and resistance on circuits can be expressed using a generic
higher-order function circRec for traversing circuits. This function must be parameter-
ized with three functions c, s and p, where
c : 'a -> 'b          The value for a single component.
s : 'b -> 'b -> 'b    The combined value for two circuits connected in series.
p : 'b -> 'b -> 'b    The combined value for two circuits connected in parallel.
Note that s and p have the type 'b -> 'b -> 'b because they operate on the values for
two circuits. Thus, a general higher-order recursion function for circuits will have the type:
('a -> 'b) * ('b -> 'b -> 'b) * ('b -> 'b -> 'b)
    -> Circuit<'a> -> 'b
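The declaration of circRec is not reproduced in this excerpt. A sketch with the stated type, assuming a type Circuit<'a> with the constructors Comp, Ser and Par used above, is shown here together with the two instances discussed in the text. The value cmp is an assumption chosen to agree with the results 3 and 1.7 shown earlier.

```fsharp
// Assumed declaration of circuits (not shown in this excerpt).
type Circuit<'a> =
    | Comp of 'a
    | Ser of Circuit<'a> * Circuit<'a>
    | Par of Circuit<'a> * Circuit<'a>

// Generic recursion over circuits: c handles components,
// s combines serial parts and p combines parallel parts.
let rec circRec (c, s, p) = function
    | Comp x -> c x
    | Ser(c1, c2) -> s (circRec (c, s, p) c1) (circRec (c, s, p) c2)
    | Par(c1, c2) -> p (circRec (c, s, p) c1) (circRec (c, s, p) c2)

let count circ : int =
    circRec ((fun _ -> 1), (+), (+)) circ

let resistance circ : float =
    circRec ((fun r -> r), (+), (fun r1 r2 -> 1.0 / (1.0 / r1 + 1.0 / r2))) circ

// Assumed example circuit.
let cmp = Ser(Par(Comp 0.25, Comp 1.0), Comp 1.5)
```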
The function circRec can, for example, be used to compute the number of components
in a circuit by use of the following functions c, s, and p:
let count circ = circRec((fun _ -> 1), (+), (+)) circ : int;;
val count : Circuit<'a> -> int
Suppose again that the value attached to every component in a circuit is the resistance
of the component. Then the function circRec can be used to compute the resistance of a
circuit by use of the following functions c, s, and p:
c is fun r -> r
The attached value is the resistance.
s is (+)
The resistance of a serial composition is the sum of the resistances.
let resistance =
    circRec(
        (fun r -> r),
        (+),
        (fun r1 r2 -> 1.0/(1.0/r1 + 1.0/r2)));;
val resistance : (Circuit<float> -> float)
Summary
We have introduced the notion of finite trees and motivated this concept through a variety
of examples. In F# a recursive type declaration is used to represent a set of values which
are trees. The constructors of the type correspond to the rules for building trees, and patterns
containing constructors are used when declaring functions on trees. We have also introduced
the notions of parameterized types, and mutually recursive type and function declarations.
Exercises
6.1 Declare a function red of type Fexpr -> Fexpr to reduce expressions generated from the
differentiation program in Section 6.2. For example, sub-expressions of form Const 1.0 * e
can be reduced to e. (A solution is satisfactory if the expression becomes nicer. It is difficult
to design a reduce function so that all trivial sub-expressions are eliminated.)
6.2 Postfix form is a particular representation of arithmetic expressions where each operator is
preceded by its operand(s), for example:
(x + 7.0) has postfix form x 7.0 +
(x + 7.0) * (x - 5.0) has postfix form x 7.0 + x 5.0 - *
Declare an F# function with type Fexpr -> string computing the textual, postfix form of
expression trees from Section 6.2.
6.3 Make a refined version of the toString function on Page 130 using the following conven-
tions: A subtrahend, factor or dividend must be in brackets if it is an addition or subtraction.
A divisor must be in brackets if it is an addition, subtraction, multiplication or division. The
argument of a function must be in brackets unless it is a constant or the variable x. (Hint: use a
set of mutually recursive declarations.)
6.4 Consider binary trees of type BinTree<'a,'b> as defined in Section 6.3. Declare functions
1. leafVals: BinTree<'a,'b> -> Set<'a> such that leafVals t is the set of values
occurring in the leaves of t,
2. nodeVals: BinTree<'a,'b> -> Set<'b> such that nodeVals t is the set of values
occurring in the nodes of t, and
3. vals: BinTree<'a,'b> -> Set<'a>*Set<'b> such that vals t = (ls, ns), where
ls is the set of values occurring in the leaves of t and ns is the set of values occurring in the nodes
of t.
6.5 An ancestor tree contains the name of a person and of some of the ancestors of this person. We
define the type AncTree by:
¬(p ∧ q) ⇔ (¬p) ∨ (¬q)
¬(p ∨ q) ⇔ (¬p) ∧ (¬q)
p ∨ (q ∧ r) ⇔ (p ∨ q) ∧ (p ∨ r)
(p ∧ q) ∨ r ⇔ (p ∨ r) ∧ (q ∨ r)
4. A proposition is a tautology if it has truth value true for any assignment of truth values to the
atoms. A disjunction of literals is a tautology exactly when it contains the atom as well as
the negated atom for some name occurring in the disjunction. A conjunction is a tautology
precisely when each conjunct is a tautology. Write a tautology checker in F#, that is, an F#
function which determines whether a proposition is a tautology or not.
6.8 We consider a simple calculator with instructions for addition, subtraction, multiplication and
division of floats, and the functions: sin, cos, log and exp.
The execution of ADD with stack a b c yields a new stack: (b + a) c , where the top
two elements a and b on the stack have been replaced by the single element (b + a). Similarly
with regard to the instructions, SUB, MULT and DIV, which all work on the top two elements
of the stack.
The execution of one of the instructions SIN, COS, LOG and EXP applies the correspond-
ing function to the top element of the stack. For example, the execution of LOG with stack
a b c yields the new stack: log(a) b c .
The execution of PUSH r with the stack a b c pushes r on top of the stack, that is, the
new stack is: r a b c .
1. Declare a type Stack for representing the stack, and declare an F# function to interpret the
execution of a single instruction:
3. Declare an F# function
trans: Fexpr * float -> Instruction list
where Fexpr is the type for expression trees declared in Section 6.2. The value of the ex-
pression trans(fe, x) is a program prg such that intpProg(prg) gives the float value of
fe when X has the value x. Hint: The instruction list can be obtained from the postfix form of
the expression. (See Exercise 6.2.)
6.9 A company consists of departments with sub-departments, which again can have sub-departments,
and so on. (The company can also be considered as a department.)
1. Assume that each department has a name and a (possibly empty) list of sub-departments.
Declare an F# type Department.
2. Extend this type so that each department has its own gross income.
3. Declare a function to extract a list of pairs (department name, gross income), for all depart-
ments.
4. Declare a function to extract the total income for a given department by adding up its gross
income, including the income of its sub-departments.
5. Declare a function to extract a list of pairs (department name, total income) for all depart-
ments.
6. Declare a function format of type Department -> string, which can be used to get a
textual form of a department such that names of sub-departments will occur suitably indented
(e.g., with four spaces) on separate lines. (Use printf to print out the result. Do not use
printf in the declaration of format.)
6.10 Consider expression trees of type ExprTree declared in Section 6.5. Extend the type with an
if-then-else expression of the form: if b then e1 else e2 , where b is a boolean expression
and e1 and e2 are expressions. An example could be:
Modules
Throughout the book we have used programs from the F# core library and from the .NET
library, and we have seen that programs from these libraries are reused in many different
applications. In this chapter we show how users can make their own libraries by means of
modules consisting of signature and implementation files. The implementation file contains
the declarations of the entities in the library while the signature file specifies the user's
interface to the library.
Overloaded operators are defined by adding augmentations to type definitions. Type aug-
mentations can also be used to customize the equality and comparison operators and the
string conversion function. Libraries with polymorphic types are obtained by using sig-
natures containing type variables.
These features of the module system are illustrated by small examples: plane geometric
vectors and queues of values with arbitrary type. The last part of the chapter illustrates the
module system by a larger example of piecewise linear plane curves. The curve library is
used to describe the recursively defined family of Hilbert curves, and these curves are shown
in a window using the .NET library. The theme of an exercise is a picture library used to
describe families of recursively defined pictures like Escher's fishes.
7.1 Abstractions
A key concept in designing a good program library is abstraction: the library must provide
a service where a user can get a general understanding of what a library function is doing
without being forced to learn details about how this function is implemented. The interface
to any standard library, like, for example, the Set library, is based on useful abstractions,
in this case the (mathematical) concept of a set. Based on a general understanding of this
concept you may use the Set library while still being able to focus your main attention on
other aspects of your program.
Modules are the technical means of dividing a programming problem into smaller parts,
but this process must be guided by useful abstractions. Creation of useful abstractions is the
basis for obtaining a successful modular program design.
Abstractions are described on the semantical level by:
The above example of the Set library comes with a natural language explanation of the
concept of sets and of operations on sets plus certain special sets like the empty set and
singleton sets (cf. Section 5.2). Throughout the book we follow this style in describing the
semantics, as seen from a user, of the library in two parts:
type TypeName . . .
val Name : type
They are part of an F# program and found in the signature file of a module. They specify
entities to be represented and implemented in the library.
A type specification without type expression
type TypeName
hides the structure of the type from the user and this structure is then only found in the
implementation of the library. The user's access to the library is restricted to using values and
functions according to the specifications in the signature, and the user cannot look into details
of the representation of values of this type or make such values by hand. This feature of
the interface to a library is called data hiding. It gives the means for protecting the integrity
of representations of values such that, for example, invariants are not violated.
type Vector
together with specifications of the functions. This gives the following signature of a module
with module name Vector:
The implementation must contain a definition of the type Vector and declarations of all
values specified in the signature. The type Vector specified as hidden in the signature must
be a tagged type or a record type, so we have to add a tag, say V, in the type definition:
open Vector;;
let a = make(1.0,-2.0);;
val a : Vector
let b = make(3.0,4.0);;
val b : Vector
coord c;;
val it : float * float = (-1.0, -8.0)
module Vector
[<Sealed>]
type Vector =
    static member ( ~- ) : Vector -> Vector
    static member ( + ) : Vector * Vector -> Vector
    static member ( - ) : Vector * Vector -> Vector
    static member ( * ) : float * Vector -> Vector
    static member ( * ) : Vector * Vector -> float
val make : float * float -> Vector
val coord: Vector -> float * float
val norm : Vector -> float
module Vector
type Vector =
    | V of float * float
    static member (~-) (V(x,y)) = V(-x,-y)
    static member (+) (V(x1,y1),V(x2,y2)) = V(x1+x2,y1+y2)
    static member (-) (V(x1,y1),V(x2,y2)) = V(x1-x2,y1-y2)
    static member (*) (a, V(x,y)) = V(a*x,a*y)
    static member (*) (V(x1,y1),V(x2,y2)) = x1*x2 + y1*y2
let make(x,y) = V(x,y)
let coord(V(x,y)) = (x,y)
let norm(V(x,y)) = sqrt(x*x + y*y)
The member declarations cannot be intermixed with let declarations (but local let declarations
are, of course, allowed inside the expressions in the declarations).
The functions make, coord and norm are specified and implemented as usual F# functions;
the OO-features should only be used to obtain an effect (here: operators) that cannot
be obtained using normal F# style.
Note that the implementation file in Table 7.2 compiles without signature file if the
module declaration is commented out. This is often convenient during the implementa-
tion and test of a module. Furthermore, the output from the interactive F# compiler in such
a compilation can be useful in getting details in the signature correct.
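For quick experiments the implementation can also be tried in a single script. The sketch below wraps the declarations in a nested module (module Vector = ...) instead of using separate signature and implementation files, so nothing is hidden from the user; that is a deliberate simplification. The type annotation on a and the expression chosen for c are assumptions; c is picked so that coord c and c * a agree with the values (-1.0, -8.0) and 15.0 shown in the examples.

```fsharp
module Vector =
    type Vector =
        | V of float * float
        static member (~-) (V(x, y)) = V(-x, -y)                        // unary minus
        static member (+) (V(x1, y1), V(x2, y2)) = V(x1 + x2, y1 + y2)
        static member (-) (V(x1, y1), V(x2, y2)) = V(x1 - x2, y1 - y2)
        static member ( * ) (a: float, V(x, y)) = V(a * x, a * y)       // scaling
        static member ( * ) (V(x1, y1), V(x2, y2)) = x1 * x2 + y1 * y2  // dot product

    let make (x, y) = V(x, y)
    let coord (V(x, y)) = (x, y)
    let norm (V(x, y)) = sqrt (x * x + y * y)

let a = Vector.make(1.0, -2.0)
let b = Vector.make(3.0, 4.0)
let c = 2.0 * a - b      // assumed definition of c
let d = c * a            // dot product of c and a
let negA = -a            // uses the prefix operator ~-
```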
Note the following:
The following are examples of use of the Vector library specified in Table 7.1:
let a = Vector.make(1.0,-2.0);;
val a : Vector.Vector
let b = Vector.make(3.0,4.0);;
val b : Vector.Vector
Vector.coord c;;
val it : float * float = (-1.0, -8.0)
let d = c * a;;
val d : float = 15.0
Vector.coord g;;
val it : float * float = (4.0, 2.0)
as shown in Table 7.3. This implementation compiles with the signature in Table 7.1 and has
the same effect as the implementation in Table 7.2, but it offers the possibility of inserting
usual function declarations between the type definition and the member declarations like
make and coord in Table 7.3. Such functions can be used in the member declarations
and that may sometimes allow simplifications. This possibility is used later in the example of
plane curves in Section 7.9.
module Vector
type Vector = V of float * float
let make(x,y) = V(x,y)
let coord(V(x,y)) = (x,y)
type Vector with
    static member (~-) (V(x,y)) = V(-x,-y)
    static member (+) (V(x1,y1),V(x2,y2)) = V(x1+x2,y1+y2)
    static member (-) (V(x1,y1),V(x2,y2)) = V(x1-x2,y1-y2)
    static member (*) (a, V(x,y)) = V(a*x,a*y)
    static member (*) (V(x1,y1),V(x2,y2)) = x1*x2 + y1*y2
let norm(V(x,y)) = sqrt(x*x + y*y)
The constructor ObjVector initializes the members x and y using the parameter values X
and Y. The following show some uses of the class:
let a = ObjVector(1.0,-2.0);;
val a : ObjVector
b.coord();;
val it : float * float = (3.0, 4.0)
c.coord();;
val it : float * float = (-1.0, -8.0)
b.x;;
val it : float = 3.0
let d = c * a;;
val d : float = 15.0
let e = b.norm();;
val e : float = 5.0
g.coord();;
val it : float * float = (4.0,2.0)
illustrates the use of named arguments where arguments in a function call are identified by
name instead of position in the argument list. Named arguments can make calls of functions
from the .NET library more readable as the meaning of each argument is visible from the
context, while the meaning of an argument can otherwise only be found by studying the
documentation of the function in question.
The example of plane curves uses a similar feature called optional property setting
(cf. Section 7.9).
Note that members coord, norm and x are written as a suffix to the values c and b. They
are in the OO-world considered as belonging to the values c and b. Using fun-expressions
they determine functions
Using OO-style constructs is daily life for the F# programmer as the .NET library is 100
percent OO, and the OO-features of F# give a quite streamlined access to this library. An
object member is used as argument of a higher-order function by packaging it into a fun-
expression as shown above.
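The packaging of an object member into a fun-expression can be sketched as follows. The class below is a hypothetical, minimal version of ObjVector with only the members x, y, coord and norm; the full class declaration is not part of this excerpt.

```fsharp
// Hypothetical minimal version of the class.
type ObjVector(X: float, Y: float) =
    member v.x = X
    member v.y = Y
    member v.coord() = (X, Y)
    member v.norm() = sqrt (X * X + Y * Y)

let a = ObjVector(1.0, -2.0)
let b = ObjVector(3.0, 4.0)

// A member cannot be passed directly as a function value here,
// so it is wrapped in a fun-expression with a type annotation.
let norms = List.map (fun (v: ObjVector) -> v.norm()) [a; b]
let xs = List.map (fun (v: ObjVector) -> v.x) [a; b]
```

The type annotation on v is needed because the member lookup v.norm() cannot be resolved before the type of v is known.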
The implementation uses an interesting data representation due to L.C. Paulson (cf. [10],
Chapter 7) where a queue is represented by two lists, a front list containing the first queue
elements in the order of insertion and a rear list containing the remaining queue elements
in the reverse order of insertion. The representation of a queue containing values 1, 2, 3 may
hence look as follows:
front [1]
rear [3; 2]
Using put to insert a value, say 4, will simply cons the value onto the rear list:
front [1]
rear [4; 3; 2]
while get removes the heading element 1 from the front list:
front []
rear [4; 3; 2]
A call of get in this situation with empty front list will reverse the rear list to get the
list [2; 3; 4] with the queue elements in the order of insertion. This list is then used as
front list while the rear list becomes empty.
front [3; 4] (the front element 2 has been removed by get)
rear []
The implementation module in Table 7.5 uses this idea and represents a Queue value as a
record {front:'a list; rear:'a list} containing the two lists. Note that the representation
of a queue is not unique because different pairs of front and rear lists may represent
the same queue.
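The behaviour of the two-list representation can be traced in a small script. The sketch below uses the declarations of Table 7.5 inside a nested module rather than a signature and an implementation file, so the record fields stay visible and the representation can be inspected; with the signature file they would be hidden. The names q3 to q6 are chosen for this illustration only.

```fsharp
module Queue =
    exception EmptyQueue
    type Queue<'a> = {front: 'a list; rear: 'a list}
    let empty = {front = []; rear = []}
    let put y {front = xs; rear = ys} = {front = xs; rear = y :: ys}
    let rec get = function
        | {front = x :: xs; rear = ys} -> (x, {front = xs; rear = ys})
        | {front = []; rear = []} -> raise EmptyQueue
        | {front = []; rear = ys} -> get {front = List.rev ys; rear = []}

// Three insertions only touch the rear list: front [], rear [3; 2; 1].
let q3 = Queue.empty |> Queue.put 1 |> Queue.put 2 |> Queue.put 3

// The first get reverses the rear list and removes the heading element 1.
let (x, q4) = Queue.get q3      // q4: front [2; 3], rear []

// A later put conses onto the rear list again.
let q5 = Queue.put 4 q4         // q5: front [2; 3], rear [4]
let (y, q6) = Queue.get q5      // q6: front [3], rear [4]
```

Each element is moved from the rear list to the front list at most once, so the occasional List.rev is paid for by the cheap put operations that preceded it.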
module Queue
type Queue<'a>
val empty : Queue<'a>
val put : 'a -> Queue<'a> -> Queue<'a>
val get : Queue<'a> -> 'a * Queue<'a>
exception EmptyQueue
module Queue
exception EmptyQueue
type Queue<'a> = {front: 'a list; rear: 'a list}
let empty = {front = []; rear = []}
let put y {front = xs; rear = ys} = {front = xs; rear = y::ys}
let rec get = function
    | {front = x::xs; rear = ys} ->
        (x,{front = xs; rear = ys})
    | {front = []; rear = []} -> raise EmptyQueue
    | {front = []; rear = ys} ->
        get {front = List.rev ys; rear = []}
module Queue
exception EmptyQueue
[<CustomEquality;NoComparison>]
type Queue<'a when 'a : equality> =
    {front: 'a list; rear: 'a list}
    member q.list() = q.front @ (List.rev q.rear)
    override q1.Equals qobj =
        match qobj with
        | :? Queue<'a> as q2 -> q1.list() = q2.list()
        | _ -> false
    override q.GetHashCode() = hash (q.list())
    override q.ToString() = string (q.list())
Declarations of empty, put and get are as in Table 7.5.
In signature: type Queue<'a when 'a : equality>
Table 7.6 Type definition with augmentation for equality, hashing and string
It is possible to override the default equality operator using a type augmentation as shown
in Table 7.6. The signature in Table 7.4 needs an equality constraint on the type variable 'a
of queue elements as the Equals function uses equality for 'a list values:
:? Queue<'a> as q2 -> . . .
It expresses a match on type. The value of qobj matches the pattern if the type of qobj
matches the type Queue<'a> in the pattern, that is, if the type of qobj is an instance of
this type. The identifier q2 is then bound to the value of qobj.
Note the following:
• The customized equality compares single lists containing all queue elements. This list
  q.list() is obtained from the used representation {front=xs; rear=ys} of a
  queue as the front list q.front with the reverse of the rear list q.rear appended.
• The overriding cannot be given in a separate type extension. There is hence no possibility
  of declaring a local function to be used in the member declarations. The frequently
  used expression q.front @ (List.rev q.rear) is therefore defined as a member
  function q.list().
• The compiler gives a warning if the hash function is not customized because values
  considered equal should have the same hash code. This condition becomes critical if the
  imperative collections HashSet or Dictionary (cf. Section 8.11) are used with elements of
  type Queue.
• Overriding ToString gives a reasonable conversion of a queue to a string by using
  string on q.list().
Applying the new Queue module with customized comparison and string function to
the example in Section 7.6 with declarations of q0,q1,. . . ,q6 and s we now get:
qnew = q3;;
val it : bool = true
string q2;;
val it : string = "[1; 2]"
The indexing is expressed by the get part of an Item member function. The implemen-
tation uses list indexing in the list of queue elements in insertion order. The signature must
contain the corresponding specification:
member Item : int -> 'a with get
[<Sealed>]
type Queue<'a when 'a : comparison> =
    interface System.IComparable
    member Item : int -> 'a with get
Signature of Queue with ordering and indexing: type part
[<CustomEquality;CustomComparison>]
type Queue<'a when 'a : comparison> =
    {front: 'a list; rear: 'a list}
    member q.list() = q.front @ (List.rev q.rear)
    interface System.IComparable with
        member q1.CompareTo qobj =
            match qobj with
            | :? Queue<'a> as q2 -> compare (q1.list()) (q2.list())
            | _ ->
                invalidArg "qobj"
                           "cannot compare values of different types"
    member q.Item
        with get n = (q.list()).[n]
Implementation of Queue with ordering and indexing
Table 7.7 Type augmentation for ordering and indexing in queue module
let q0 = Queue.empty;;
let q1 = Queue.put 1 q0;;
let q2 = Queue.put 2 q1;;
q2 > q1 ;;
val it : bool = true
q2.[1] ;;
val it : int = 2
Syntax           Function
point(x, y)      The curve consisting of the single point with coordinates (x, y)
c1 + c2          The curve consisting of the curve c1, the segment from the end point of c1
                 to the start point of c2, and the curve c2
a * c            The curve obtained from c by multiplication with factor a from the start
                 point of c
c |^ a           The curve obtained by rotating c the angle a (in degrees) around its start
                 point
c --> (x, y)     The curve obtained from c by the parallel translation in the plane moving
                 the start point of c to the point with coordinates (x, y)
c >< a           The curve obtained from c by horizontal reflection in the vertical line with
                 equation x = a
verticRefl c b   The curve obtained from c by vertical reflection in the horizontal line with
                 equation y = b
boundingBox c    The pair ((x_min, y_min), (x_max, y_max)) of coordinates of the lower left and
                 upper right corners of the bounding box of the curve c
width c          The width of the bounding box of c
height c         The height of the bounding box of c
toList c         The list [(x_0, y_0); (x_1, y_1); ...; (x_{n-1}, y_{n-1})] of coordinates of the
                 curve points P_0, P_1, ..., P_{n-1}
Table 7.8 Operations on curves
7.9 Example: Piecewise linear plane curves
The infix operator |^ for the rotate function is overloaded to also allow integer
angle values. The infix operators allow Curve expressions to be written using a minimum
of parentheses.
module Curve
[<Sealed>]
type Curve =
    static member ( + ) : Curve * Curve -> Curve
    static member ( * ) : float * Curve -> Curve
    static member ( |^) : Curve * float -> Curve
    static member ( |^) : Curve * int -> Curve
    static member (-->) : Curve * (float * float) -> Curve
    static member (><) : Curve * float -> Curve
val point : float * float -> Curve
val verticRefl : Curve -> float -> Curve
val boundingBox : Curve -> (float * float) * (float * float)
val width : Curve -> float
val height : Curve -> float
val toList : Curve -> (float * float) list
such that
h_{n+1} = hilbert h_n for n = 0, 1, 2, ...
[Figure 7.1: the Hilbert curves h0, h1, h2 and h3, with the constituent curves c1, c2, c3, c4 and the points P1, P2, P3, P4 marked]
Studying Figure 7.1 we note that c2 and c3 can be obtained from hn by parallel translations,
while c1 and c4 must be obtained from a mirror image of hn. The following figure
shows the curve c0 obtained by horizontal reflection of hn in the vertical line through the
start point (0.0, 0.0) and the curves obtained from c0 by rotations through -90° and 90°:
[Figure with four panels: hn;  c0 = hn >< 0.0;  c0 |^ -90;  c0 |^ 90]
w = Curve.width hn
h = Curve.height hn
P1 : (0.0, 0.0)
P2 : (0.0, w + 1.0)
P3 : (h + 1.0, w + 1.0)
P4 : (h + h + 1.0, w)
Note that the height and width of c1 and c4 are the width and height, respectively, of hn . The
height of hn is actually equal to its width.
let hilbert hn =
    let w = Curve.width hn
    let h = Curve.height hn
    let c0 = hn >< 0.0
    let c1 = c0 |^ -90
    let c2 = hn --> (0.0, w + 1.0)
    let c3 = hn --> (h + 1.0, w + 1.0)
    let c4 = (c0 |^ 90) --> (h + h + 1.0, w)
    c1 + c2 + c3 + c4;;
val hilbert : Curve.Curve -> Curve.Curve
Note that the programming of the hilbert function has been done using geometric con-
cepts only. We do not need any knowledge about the implementation of the Curve library.
Displaying curves
We want to make a function to display a curve in a window using the .NET library. Before
getting to the programming we have to make some geometric considerations.
The display is made in a panel belonging to a window. The panel uses Cartesian coordinates
where the y-axis points downwards and the upper left corner of the panel has panel
coordinates (0, 0). The situation is depicted in Figure 7.2. The thick box is the panel with
width pw and height ph. The picture shows that a curve point with coordinates (x, y) has
panel coordinates:
    x_panel = x
    y_panel = ph - y        (*)
[Figure 7.2: curve coordinates (x, y) and panel coordinates (x_panel, y_panel) in a panel of width pw and height ph]
The program uses two libraries:
open System.Drawing
open System.Windows.Forms
Application.EnableVisualStyles();;
The function display declared in Table 7.10 has two parameters: the title to be
written on top of the window and a triple comprising the Curve to be displayed plus width
pw and height ph of the panel. The function consists of five parts:
1. The function f converts a set of coordinates (x, y) to a Point object containing the
corresponding panel coordinates. The panel coordinates are integers and the conversion
from float to int consists of a round followed by an int conversion. The formula
(*) above is used in converting to panel coordinates.
The list clst of coordinates of points on the curve is extracted and the function f is
applied to each element to get the corresponding list Ptlst of Point objects. Finally
the corresponding array pArr of Point objects is made. It is ready to be used by the
Graphics member function DrawLines.
2. A Pen object pen is created and a function draw drawing the curve on a Graphics
object is declared. It calls DrawLines using pen and the array pArr of curve point
coordinates.
3. The Panel object is created and configured to fill all of the window (DockStyle). The
draw function is added to the panel's collection of Paint objects.
4. The window (Form) is created using the specified title. The size is set and scrolling is
enabled. The value of AutoScrollMinSize is set to allow the window to scroll to any
part of the panel, and scrolling is activated. Finally, the panel is added to the collection of
Controls of the window.
5. The window is shown (win.Show()).
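The coordinate conversion in part 1 can be tried out without any window. The following is a minimal sketch that uses integer pairs instead of Point objects so that it runs without the .NET graphics libraries; the names toPanel and panelPoints are only chosen for the illustration and are not part of the display function:

```fsharp
// Convert curve coordinates (x, y) to integer panel coordinates
// for a panel of height ph, following x_panel = x and y_panel = ph - y.
let toPanel (ph: int) (x: float, y: float) : int * int =
    (int (round x), ph - int (round y))

// Apply the conversion to every point of a curve and build an array,
// the form in which a drawing routine would receive the points.
let panelPoints (ph: int) (clst: (float * float) list) : (int * int)[] =
    clst |> List.map (toPanel ph) |> Array.ofList

let pts = panelPoints 100 [(0.0, 0.0); (10.4, 20.6)]
// pts is [|(0, 100); (10, 79)|]
```

The origin of the curve ends up in the lower left corner of the panel because the panel's y-axis points downwards.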
A window is a live object that handles a number of events. The actual window has only
events corresponding to manipulation of the window, such as resizing the window, using the
scrollbars, or the window coming to the foreground. The part of the panel inside the window
is then redrawn using the function in the Paint collection of the panel. The parameter e is
actually an Event object.
Some of the above calls of constructors use optional property setting like the argument
Dock=DockStyle.Fill in the argument list of constructor Panel. This constructor
has actually no Dock argument. The specified value DockStyle.Fill is instead used as
initial value of the Dock property of the created Panel object.
A curve requires some adjustment before the display function can be used: the curve
must be suitably scaled to get a proper size of the details of the curve, and the curve must
be moved away from the boundary of the panel as boundary points are invisible. This job is
done by the function adjust. It multiplies the curve c by the factor a and makes a parallel
translation of the curve to leave a blank band of 10 pixels in the panel around the curve:
let adjust(c:Curve.Curve, a: float) =
    let c1 = a * c --> (10.0, 10.0)
    let (_,(maxX,maxY)) = Curve.boundingBox c1
    let pw = int(round maxX) + 20
    let ph = int(round maxY) + 20
    (c1,pw,ph);;
The value of adjust can be used directly as second parameter in the display function.
The displayed curve has been scaled by a factor 10.0 to get a reasonable drawing.
containing the coordinates of the start point of the curve plus the (possibly empty) list of
coordinates of the remaining points. Any value of this type represents a curve (there is no
invariant) and the functions can hence be implemented without any error case where an
exception should be raised.
module Curve
type Curve = C of (float*float) * ((float*float) list)
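To indicate how functions can be written on this representation, the following is a minimal sketch of a few of the operations. It is a free-standing script and not the implementation file of the library, and the name translate is a hypothetical stand-in for the operator -->:

```fsharp
// Representation: the start point plus the list of the remaining points.
type Curve = C of (float * float) * ((float * float) list)

let point (x, y) = C((x, y), [])

let toList (C(p0, ps)) = p0 :: ps

// Parallel translation moving the start point of the curve to (x, y).
let translate (C((x0, y0), ps)) (x, y) =
    let dx, dy = x - x0, y - y0
    C((x, y), List.map (fun (px, py) -> (px + dx, py + dy)) ps)

// Lower left and upper right corners of the bounding box.
// The point list is never empty, so List.min and List.max cannot fail.
let boundingBox c =
    let pts = toList c
    let xs = List.map fst pts
    let ys = List.map snd pts
    ((List.min xs, List.min ys), (List.max xs, List.max ys))
```

None of the functions needs an error case because every value of the type is a legal curve.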
Summary
We have introduced the notions of module, signature and implementation, concepts that
are needed when programmers make their own libraries. Moreover, we have introduced the
notion of type augmentation and shown how it can be used to declare overloaded operators
and to customize the equality and comparison operations and the string conversion.
Exercises
7.1 Make an implementation file of the vector example in this section using a record type:
type Vector = {x: float; y: float}
while using the same signature file.
7.2 Make signature and implementation files for a library of complex numbers with overloaded
arithmetic operators (cf. Exercise 3.3).
7.3 Make signature and implementation files for a library of multi-sets of integers represented by
weakly ascending lists (cf. Exercise 4.11).
7.4 Make signature and implementation files for a library of polynomials with integer coefficients
(cf. Exercise 4.22).
7.5 Customize the string function in the library of polynomials in Exercise 7.4.
7.6 Make an indexing in the library of multi-sets of integers in Exercise 7.3 such that the value of
s.[n] is the number of occurrences of n in the multi-set s.
7.7 Make an indexing in the library of polynomials in Exercise 7.4 such that p.[n] is the coefficient
of x^n in the polynomial p.
7.8 The Sierpinski curves s0, s1, s2, ... are a system of curves, where the curve s_{n+1} is obtained
by joining four curves which are obtained from the curve s_n by transformations composed of
reflections, rotations and translations.
[Figure: the Sierpinski curves s0, s1 and s2]
The figure shows the Sierpinski curves s0, s1 and s2 and how each of the curves s1 and s2 is
obtained by joining four curves. All vertical and horizontal segments in a Sierpinski curve
have length 1 and all curves s0, s1, ... start in the origin. Use the Curve library to declare the
function sierpinski that computes the curve s_{n+1} from the curve s_n for any n = 0, 1, ....
Use this function to display the curve s4 in a window.
7.9 The Peano curves p0, p1, p2, ... are a system of curves, where the curve p_{n+1} is obtained by
joining 9 curves which are obtained from the curve p_n by transformations composed of reflections,
rotations and translations.
[Figure: the Peano curves p0, p1 and p2]
The figure shows the Peano curves p0, p1 and p2 and how each of the curves p1 and p2 is obtained
by joining 9 curves. All Peano curves start in the origin and the joining segments (thin lines in
the figure) are of length 1. Use the Curve library to declare the function peano that computes
the curve p_{n+1} from the curve p_n for any n = 0, 1, ... (in getting from p_n to p_{n+1} it may be
convenient to group the 9 curves into 3 groups each consisting of 3 curves and first build the
curve for each of these 3 groups). Use the peano function to make a program that displays the
curve p4 in a window.
7.10 Add a minus operator of type Curve -> Curve to the Curve library. It should compute the
reversed curve, that is, - c should contain the same points as c but taken in the opposite order.
7.11 Make a library for manipulation of pictures (following ideas due to Henderson, cf. [6]). A
picture is a set of segments together with a rectangular, upright bounding box in the plane. The
bounding box is not shown when drawing a picture but it is used when defining operations on
pictures. We use usual rectangular, Cartesian coordinates in the plane, so points in the plane are
represented by coordinates which are pairs (x, y) of float numbers. The point with coordinates
(0.0,0.0) is called the origin of the plane. A picture is normally placed in the coordinate
system such that the bounding box is situated in the lower left corner of the first quadrant.
If c is a float number with c > 0.0 then a picture can be scaled by factor c by mapping each
point (x, y) to the point (c*x, c*y). The scaled picture will have width c*a and height c*b
where a and b are width and height of the original picture. Scaling is used in some of the below
operations in order to adjust the width or the height of a picture.
The library should contain the following functions on pictures:
Grid: Computes a picture directly from width and height of the bounding box and the coordi-
nates of the pairs of end-points of the segments in the picture. The function should be declared
such that all the numbers in the input are integers (the function must convert to float numbers
as used in the value representing a picture).
Rotate: Computes the picture p' obtained from the picture p by first rotating 90° in the positive
(counter-clockwise) direction around the origin and then translating the resulting picture to the
right to get its lower left-hand corner into the origin. The height of p' will be the width of p and
the width of p' will be the height of p.
Flip: Computes the picture obtained from a picture by horizontal reflection around the vertical
line through the middle of the bounding box.
Beside: Computes the picture obtained from two pictures p1 and p2 by uniting p1 with a version
of p2 that has been placed to the right of p1 and scaled to the same height.
Above: Computes the picture obtained from two pictures p1 and p2 by uniting p2 with a version
of p1 that has been placed on top of p2 and scaled to the same width.
Row: Computes the picture obtained by placing n copies of a picture p beside each other.
Column: Computes the picture obtained by placing n copies of a picture p on top of each other.
Coordinates: Computes the pair ((width, height), segmentList) where width and height are
width and height of a picture while segmentList is a list of coordinate pairs ((x, y), (x', y')) of
end-points of the segments in the picture.
You should choose your own names for the functions and use operators whenever appropriate.
Furthermore, you should implement a function to display a picture in a window. (Hint:
DrawLine(Pen,Point1,Point2) draws a segment.)
The library should be used to construct pictures of persons and Escher's fishes as described
in the following.
Persons
[Figure 7.3: the pictures man, couple and crowd]
The starting point is the picture man shown in Figure 7.3. It has width 14 and height 20. Using
the functions on pictures you should now make programs to construct the pictures couple and
crowd shown in Figure 7.3.
Escher's fishes
The starting point of Escher's fishes is the four (16 × 16) pictures p, q, r, and s shown in
Figure 7.4. By combining these four pictures we get the picture t in Figure 7.5, while the picture
a is obtained by combining suitably rotated copies of q. Finally the picture b1 is obtained by
combining two suitably rotated copies of t.
The Escher fish pictures e0, e1 and e2 are now obtained by combining the pictures in Figure 7.5
as shown in Figure 7.6. The pictures b2, b3 and b4 are obtained from b1 by successive
rotations. The transition from an Escher picture to the next adds a border around the picture
consisting of a picture a in each corner, a row of b1's at the top, a column of b2's at the left,
a row of b3's at the bottom, and a column of b4's at the right. In this border there will be one
b1 on top of an a and two b1's on top of a b1, one b2 to the left of an a and two b2's to the
left of a b2, etc.
You should make a program to generate the Escher fish pictures e0, e1 and e2.
[Figures 7.4, 7.5 and 7.6: the pictures p, q, r and s; the pictures t, a and b1; the Escher fish pictures e0, e1 and e2 with their borders of a, b1, b2, b3 and b4]
Imperative features
Imperative features are language constructs to express commands using and modifying the
state of mutable objects. The working of these features is explained using a new component,
the store, beside the environment. A store consists of a set of locations containing
values, together with operators to access and change the store. We explain why the restric-
tion on polymorphically typed expressions is needed because of the imperative features. The
F# concept of a mutable record field gives the means of assigning values to object members.
The while loop is introduced and we study its relationship with iterative functions. The F#
and .NET libraries offer imperative collections: arrays, imperative sets and imperative maps.
8.1 Locations
A (mutable) location is a part of the computer memory where the F# program may store dif-
ferent values at different points in time. A location is obtained by executing a let mutable
declaration, for example:
let mutable x = 1;;
val mutable x : int = 1
The keyword mutable requests F# to create a location, and the answer tells that x is
bound to a location of type int, currently containing the value 1. The situation obtained is
illustrated as follows:
x ↦ loc_1        loc_1 : 1
f <- cos;;
val it : unit = ()
but the value assigned to a location must have the same type as the location, so the following
attempt fails:
x <- (2,3);;
x <- (2,3);;
------
...: error FS0001: This expression was expected to
have type
int
but here has type
'a * 'b
The contentsOf operator has no visible operator symbol and the operator is automatically
inserted by the F# compiler according to the following coercion rule:
Assume that we have the binding of x and corresponding location loc_1 as above:
x ↦ loc_1        loc_1 : 7
let t = x;;
val t : int = 8
where the right-hand side x would otherwise evaluate to a location. The identifier t is hence
bound to the value 8:
x ↦ loc_1        loc_1 : 8
t ↦ 8
x <- 17;;
val it : unit = ()
t ;;
val it : int = 8
The coercion rule also applies if we enter the identifier x as an expression to be evaluated by
F# because this is interpreted as a declaration let it = x of the identifier it:
x;;
val it : int = 17
Hence, a location is not a value but an expression may evaluate to a location when, for
example, used as the left-hand side of an assignment.
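The effect of the coercion can be summarized in a small script. This is only a sketch collecting the steps of the dialogue above; the identifiers tAfter and xAfter are introduced for the illustration:

```fsharp
let mutable x = 8   // x is bound to a location containing 8
let t = x           // contentsOf is inserted: t is bound to the value 8
x <- 17             // changes the contents of the location, not t
let tAfter = t      // still 8
let xAfter = x      // 17, the current contents of the location
```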
Locations cannot occur as components in tuples or tagged values, as elements in lists or
as values of functions. The situation for records is different as described in Section 8.5.
might hence have been more appropriate as the declaration creates a mutable location to
be bound to the identifier x. The actual F# syntax has the advantage that the coercion rule
automatically applies when the right-hand side evaluates to a location like in the above
example let t = x where x is bound to a location but t becomes bound to a value.
8.4 Sequential composition
Unchecked.defaultof<type>
on the right-hand side of the declaration. Such a value is available for any type. It may serve
as a placeholder in the location until replaced by a proper value. Default values should be
used only for this purpose.
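A minimal sketch of this use of a default value, assuming a location of type string (the identifiers name and wasPlaceholder are only chosen for the illustration):

```fsharp
// The default value of type string is null; it only serves as a placeholder.
let mutable name = Unchecked.defaultof<string>
let wasPlaceholder = isNull name   // true

// The placeholder is replaced by a proper value before the location is used.
name <- "Hilbert"
```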
Hence, if exp 2 has type τ then exp 1 ; exp 2 has type τ as well.
The F# compiler issues a warning if exp 1 is of type different from unit as the result of
the evaluation might be of some use. This warning is avoided by using the ignore function:
Note that the second assignment uses the new value stored in the location denoted by x.
The operator ; may be omitted if the expressions are written on separate lines, that is,
exp 1
exp 2
means (exp 1 ) ; exp 2 .
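The rules for sequential composition can be tried out in a small script. The following is only a sketch; the function double is a hypothetical helper introduced to have an expression of a type different from unit:

```fsharp
let mutable x = 1
let double n = 2 * n

// exp1 ; exp2 on one line: the second assignment uses the new value of x.
x <- x + 1 ; x <- double x        // x is now 4

// A non-unit first expression gives a warning unless ignore discards its value.
ignore (double x) ; x <- x + 1    // x is now 5

// Expressions on separate lines mean the same as using the operator ;
x <- x + 1
x <- double x                     // x is now 12
```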
incr r1;;
val it : int = 6
incr r1;;
val it : int = 7
We may even declare a function returning a closure with an internal counter:
let makeCounter() =
    let counter = { count = 0 }
    fun () -> incr counter;;
val makeCounter : unit -> (unit -> int)
8.5 Mutable record fields
clock();;
val it : int = 1
clock();;
val it : int = 2
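The counter example can be collected in one self-contained script. This is a sketch in which the record type intRec and the function incr are written out as they are used above; the second counter other is added to show that each call of makeCounter creates its own location:

```fsharp
type intRec = { mutable count : int }

// Increment the count field and return the new value.
let incr (x: intRec) =
    x.count <- x.count + 1
    x.count

// Each call creates a fresh record that is only reachable from the closure.
let makeCounter() =
    let counter = { count = 0 }
    fun () -> incr counter

let clock = makeCounter()
let first = clock()        // 1
let second = clock()       // 2

let other = makeCounter()
let otherFirst = other()   // 1: other does not share a location with clock
```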
Equality of records with mutable fields is defined as for records without such fields. Consider
the declarations:
let x = { count = 0 };;
val x : intRec = {count = 0;}
let y = { count = 0 };;
val y : intRec = {count = 0;}
let z = y;;
val z : intRec = {count = 0;}
The values bound to x, y and z are considered equal:
x = y;;
val it : bool = true
y = z;;
val it : bool = true
An assignment to the count field in the record bound to y has interesting consequences:
y.count <- 1;;
val it : unit = ()
x = y;;
val it : bool = false
y = z;;
val it : bool = true
z;;
val it : intRec = {count = 1;}
The assignment to the count field of y has hence not only changed y but also z. Environment
and store give the explanation: the declarations create the following environment and
store (prior to the assignment of y.count) where x=y and y=z:
Environment                 Store
x ↦ {count ↦ loc_3}         loc_3 : 0
y ↦ {count ↦ loc_4}         loc_4 : 0
z ↦ {count ↦ loc_4}
The assignment changes the store but leaves the environment unchanged:
Environment                 Store
x ↦ {count ↦ loc_3}         loc_3 : 0
y ↦ {count ↦ loc_4}         loc_4 : 1
z ↦ {count ↦ loc_4}
The crucial point is that the records bound to y and z share the location loc_4. One says
that z is an alias of y. Sharing and aliasing can have unexpected and unpleasant effects in
imperative programming. These phenomena do not exist in pure functional programming
where a value is immutable and cannot be changed.
It should be remembered that a record is a value in F#: assignment to a record is not
possible. If required, one may declare a location containing a record:
let mutable t = x;;
val mutable t : intRec = {count = 0;}
This assignment changes the contents of loc_5 to the value {count ↦ loc_4}.
The above examples illustrate some of the (pleasant and unpleasant) features of records
with mutable fields. The real importance is, however, their key role in handling objects from
F#. An assignable member of an object appears in F# as a mutable record field that can be
assigned using the <- operator, for example:
open System.Globalization;;
open System.Threading;;
Thread.CurrentThread.CurrentCulture <- CultureInfo "en-US";;
8.6 References
The F# compiler does not accept use of locally declared mutables in locally declared
functions [1]. The above clock example could hence not be made without using records.
The ref type provides a shorthand for a record type containing a single mutable field,
and the ref function provides a shorthand for a value of this type. They appear [2] as defined
as follows:
type 'a ref = { mutable contents: 'a }
let ref v = { contents = v }
[1] The restriction is related to the memory management where problems might arise if a function was allowed to
return a closure using a locally defined mutable.
[2] The symbol ! cannot be used as a user-defined prefix operator.
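A reference can be used in a closure where a locally declared mutable is not accepted. The following minimal sketch uses the Value property of a reference to read and assign the contents; the name makeRefCounter is only chosen for the illustration:

```fsharp
// A reference is a value, so it can be captured by the returned closure.
let makeRefCounter() =
    let counter = ref 0
    fun () ->
        counter.Value <- counter.Value + 1
        counter.Value

let tick = makeRefCounter()
let a1 = tick()   // 1
let a2 = tick()   // 2

// A reference used directly at top level.
let r = ref 10
r.Value <- r.Value + 5
let contents = r.Value   // 15
```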
8.7 While loops
The evaluation of wh() will evaluate the expression e repeatedly until b becomes false
and that is exactly what is done by the evaluation of the while loop (we assume that the
identifier wh does not occur in b or e). Thus, any while loop can be expressed by a recursive
function declaration.
It should be noted that the F# compiler generates essentially the same binary code for the
while-loop and the function wh (the recursive call wh() is compiled to a branch instruc-
tion). There is hence no performance advantage in using the loop instead of the recursive
declaration. See also Section 9.5, especially the examples on Page 211.
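The correspondence can be illustrated on a concrete loop. The following is a sketch with hypothetical functions sumWhile and sumRec that both compute 0 + 1 + ... + (n-1); in sumRec the condition i < n plays the role of b and the two assignments the role of e:

```fsharp
// Summation using a while loop and local mutables.
let sumWhile n =
    let mutable i = 0
    let mutable s = 0
    while i < n do
        s <- s + i
        i <- i + 1
    s

// The same loop expressed by a local recursive function wh.
// References are used because a closure cannot capture a local mutable.
let sumRec n =
    let i = ref 0
    let s = ref 0
    let rec wh() =
        if i.Value < n then
            s.Value <- s.Value + i.Value
            i.Value <- i.Value + 1
            wh()
        else ()
    wh()
    s.Value
```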
will then successively apply the function f to the elements v_0, v_1, ..., v_{n-1} of the list. The
result (of type unit) of the evaluation is of no interest and the interesting part of the
evaluation is the side-effect. The following is a (not very interesting) application of List.iter:
let mutable sum = 0;;
let f x = sum <- sum + x;;
List.iter f [1; 2; -3; 5];;
val it : unit = ()
sum;;
val it : int = 5
8.9 Imperative tree traversal
The function iteri includes the index k of the element vk in the computations. Let f
be a function of type
f: int -> 'a -> unit
f 0 v_0, f 1 v_1, ..., f (n-1) v_{n-1}
The interesting part of the evaluation is again the side-effect. The following is another (not
very interesting) application of List.iteri:
in the variable t.
The functions iter and iteri on other collections like Seq, Set and Map work in a
similar way.
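A small sketch of iteri on a list and iter on a set; the references weighted and total and the function g are only introduced for the illustration:

```fsharp
// List.iteri passes the index k together with the element v_k.
let weighted = ref 0
let g k x = weighted.Value <- weighted.Value + k * x
List.iteri g [1; 2; -3; 5]
// weighted.Value is 0*1 + 1*2 + 2*(-3) + 3*5 = 11

// Set.iter visits the elements of a set in the same imperative style.
let total = ref 0
Set.iter (fun x -> total.Value <- total.Value + x) (Set.ofList [3; 1; 5])
// total.Value is 9
```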
We refer to Exercise 9.14 for an analysis of the run time of the function List.iter and
Set.iter.
and similar for postIter. Applying, for example, preIter to the tree t4 in Section 6.3
gives:
preIter (fun x -> printf " %d" x) t4;;
5 0 -3 2 7
We may in a similar way define a function for imperative depth-first traversal of list trees as
described in Section 6.6:
let rec depthFirstIter f (Node(x,ts)) =
    f x ; List.iter (depthFirstIter f) ts;;
val depthFirstIter : ('a -> unit) -> ListTree<'a> -> unit
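The traversal order can be observed by collecting the visited elements. The following sketch assumes that list trees are declared as shown in its first line; the tree t and the collection visited are only chosen for the illustration:

```fsharp
// Assumed declaration of list trees: a node has a value and a list of sub-trees.
type ListTree<'a> = Node of 'a * (ListTree<'a> list)

let rec depthFirstIter f (Node(x, ts)) =
    f x ; List.iter (depthFirstIter f) ts

// The tree with root 1, sub-trees 2 and 3, where 2 has the sub-tree 4.
let t = Node(1, [Node(2, [Node(4, [])]); Node(3, [])])

// Record the order in which the elements are visited.
let visited = ResizeArray<int>()
depthFirstIter (fun x -> visited.Add x) t
let order = List.ofSeq visited   // [1; 2; 4; 3]
```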
8.10 Arrays
The addresses in the physical memory of the computer are integers. Consider a sequence of
n equally sized contiguous memory locations loc_0, loc_1, ..., loc_{n-1} as shown in Figure 8.1.
The physical address physAdr_k of the k'th location loc_k can in this situation be computed
by the formula:
    physAdr_k = physAdr_0 + k · s
where s denotes the size of one location. The machine code computation of physAdr_k
hence requires only two arithmetic operations.
This addressing scheme is used to implement arrays. An array of length n consists of n
locations loc_0, loc_1, ..., loc_{n-1} of the same type. The numbers 0, 1, ..., n-1 are called
the indices of the elements. The type of the array is written
    τ []
where τ is the type of the elements.
Arrays have the advantage over lists that any array location can be accessed and modified
in a constant (short) time, that is, in a small number of computations which is independent of
the size of the array. On the other hand, an array is a mutable object: the old value is lost
when a location is modified. Furthermore, an array cannot be extended by more elements in
a simple way as the adjacent physical memory (after the last element in the array) might be
occupied for other use. A selection of operations on arrays is shown in Table 8.1.
An array can be entered using the [|. . . |] notation, for example:
let a = [|4;5;6;7|];;
val a : int [] = [|4; 5; 6; 7|]
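The following sketch shows access, destructive update and "extension" of the array a; the identifiers second, n and b are only chosen for the illustration:

```fsharp
let a = [|4; 5; 6; 7|]

let second = a.[1]      // 5: access by index
a.[1] <- 50             // destructive update: the old value 5 is lost
let n = Array.length a  // 4

// An array cannot grow in place: Array.append copies into a new array.
let b = Array.append a [|8|]
```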
Example: Histogram
Arrays are very convenient when counting frequencies and making a histogram. The following
small program reads a text file given by its directory path, counts the frequency
of each character A to Z (not distinguishing small and capital letters), and prints the
resulting histogram. The reader may consult Section 10.3 about the text I/O functions used
and Section 10.7 about printf formats.
open System;;
open System.IO;;
Calling the function histogram on the path of the source file histogram.fsx will, for
example, give the output:
A: 20
B: 0
C: 22
...
X: 1
Y: 3
Z: 1
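A histogram program along these lines could be sketched as follows. This is not necessarily the declaration used for the output above: the helper functions countLetters and printHistogram are hypothetical, and the file is read in one step with System.IO.File.ReadAllText:

```fsharp
open System

// Count the occurrences of the letters A to Z in a text, ignoring case.
let countLetters (text: string) : int[] =
    let hist = Array.create 26 0
    for ch in text do
        let c = Char.ToUpper ch
        if 'A' <= c && c <= 'Z' then
            let i = int c - int 'A'
            hist.[i] <- hist.[i] + 1
    hist

// Print one line per letter in the format "A: 20".
let printHistogram (hist: int[]) =
    Array.iteri (fun i n -> printfn "%c: %d" (char (i + int 'A')) n) hist

// Read the whole file given by its path and print its histogram.
let histogram (path: string) =
    printHistogram (countLetters (System.IO.File.ReadAllText path))
```

The array index of a letter is computed from its character code, so each update is a single array access.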
in a destructive update without retaining the old value. They should only be used in al-
gorithms using and maintaining a single current collection without ever referring to old
values.
The HashSet<'a> and Dictionary<'a,'b> collections are implemented using a hash-key
technique: the basic data structure is an array (say of length N) where an element a (entry (a, b))
is stored in the array location with index:
where hash is the hash function of the equality type 'a. This is a rather efficient scheme,
but it runs into problems when multiple elements (entries) have the same index and hence
should be stored in the same array location. This collision problem is solved by storing the
colliding elements (entries) in a linked structure that can be accessed via the index value.
The HashSet<'a> and Dictionary<'a,'b> collections have the following
characteristics:
A selection of operations on the imperative set and map classes is shown in Tables 8.2 and
8.3.
Indexing in a Dictionary by a key can be used to update a value by assignment:
and this construction may also be used to add a new entry to the dictionary.
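A small sketch of these operations on a Dictionary and a HashSet; the keys and values are only chosen for the illustration:

```fsharp
open System.Collections.Generic

let d = Dictionary<string, int>()
d.Add("a", 1)
d.["a"] <- 2                    // indexing updates the value of an existing key
d.["b"] <- 7                    // indexing also adds a new entry
let hasB = d.ContainsKey "b"    // true

let s = HashSet<int>()
let added1 = s.Add 3            // true: 3 was not in the set
let added2 = s.Add 3            // false: the set is unchanged
```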
Enumerator functions
The System.Collections.Generic library contains imperative features for element-
wise traversal of any of the collections including the F# collections list, set, map, etc.
The enumerator function of the book (to be declared on Page 193) makes these features
available in a functional setting. Applying enumerator to a collection yields a function:
enumerator(collection): unit -> elementType option
where elementType is determined as follows:
collection                               elementType
NonMapOrDictionaryCollection<'a>         'a
MapOrDictionaryCollection<'a,'b>         KeyValuePair<'a,'b>
An element entry of type KeyValuePair<'a,'b> corresponds to an entry in the map or
dictionary, and it has components:
entry.Key of type 'a
entry.Value of type 'b
Applying enumerator to a set creates an imperative enumerator function where succes-
sive calls yield the elements in the set:
let f = enumerator (Set.ofList [3 ; 1; 5]);;
val f : (unit -> int option)
8.12 Functions on collections. Enumerator functions
f();;
val it : int option = Some 1
f();;
val it : int option = Some 3
f();;
val it : int option = Some 5
f();;
val it : int option = None
We refer to Exercise 9.13 for an analysis of the run time of this function and the version
declared on Page 109 .
type IEnumerator<'c> =
    abstract Current : 'c
    abstract MoveNext : unit -> bool;;
for some type 'c. An enumerator object enum points to the element enum.Current and
it is forwarded to the next element by evaluating enum.MoveNext(). An initial call of
MoveNext is required to get a fresh enumerator to point to the first element, and the value
of MoveNext becomes false when the enumerator gets beyond the last element in the
collection.
Each collection has its own GetEnumerator member to create an enumerator object.
This object gets the following type:
The implementation is made such that the GetEnumerator member for any specific col-
lection can be considered an instance of a polymorphic GetEnumerator member working
on any collection. This polymorphism has been obtained by letting each collection imple-
ment the interface:
type IEnumerable<'c> =
    abstract GetEnumerator : unit -> IEnumerator<'c>
The enumerator function refers to the collection using the IEnumerable type and
may hence be applied to any collection. It creates a reference e to an enumerator object and
this reference is then used inside a local function f that is returned as the result. A reference
is required because of the restriction on the use of mutable in closures:
open System.Collections.Generic;;
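A declaration following this description could be sketched as below. It is a sketch under the stated description and may differ in detail from the declaration of the book; the reference e is read through its Value property:

```fsharp
open System.Collections.Generic

// Turn any collection into a function yielding its elements one by one.
let enumerator (m: IEnumerable<'c>) =
    // A reference is used because f is a closure.
    let e = ref (m.GetEnumerator())
    let f () =
        // MoveNext must be called before Current can be inspected.
        if e.Value.MoveNext() then Some e.Value.Current else None
    f

// Elements of an F# set are enumerated in increasing order.
let next = enumerator (Set.ofList [3; 1; 5])
let r1 = next()   // Some 1
let r2 = next()   // Some 3
let r3 = next()   // Some 5
let r4 = next()   // None
```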
The queue is implemented using an array where the front queue element is stored in an
array location with index frontIndex while the rest of the queue is stored in the succeeding
locations with a wraparound to the beginning of the array if the queue extends beyond the
end of the array.
The Dequeue operation returns the array element with index frontIndex and advances frontIndex
to the next array position (with a possible wraparound) while the Enqueue operation
stores the enqueued value in the first free array location. An Enqueue operation with a
filled array causes an array replacement where a new, larger array is reserved and all queue
elements are moved to the new array upon which the old array is abandoned.
A queue can be used to make the following elegant and interesting implementation of the
breadth-first traversal of list trees in Section 6.6.
The idea is to let the queue remains contain those list trees where the nodes remain to be
visited, initially the tree ltr. A list tree Node (x,tl) is dequeued, the elements of the
list tl of sub-trees are enqueued one-by-one using List.iter and the root node x is
visited. This procedure is repeated until the queue becomes empty.
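The procedure can be sketched as follows, assuming that list trees are declared as in the first type declaration below. The queue is named remains as in the description; the tree t and the collection visited are only introduced for the illustration:

```fsharp
open System.Collections.Generic

// Assumed declaration of list trees: a node has a value and a list of sub-trees.
type ListTree<'a> = Node of 'a * (ListTree<'a> list)

let breadthFirstIter f ltr =
    // The queue contains the list trees whose nodes remain to be visited.
    let remains = Queue<ListTree<_>>()
    remains.Enqueue ltr
    while remains.Count <> 0 do
        let (Node(x, tl)) = remains.Dequeue()
        List.iter (fun sub -> remains.Enqueue sub) tl
        f x

let t = Node(1, [Node(2, [Node(4, [])]); Node(3, [])])
let visited = ResizeArray<int>()
breadthFirstIter (fun x -> visited.Add x) t
let order = List.ofSeq visited   // [1; 2; 3; 4]
```

The sub-trees of a node are enqueued behind all trees already waiting, so the nodes are visited level by level.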
f(1);;
it : unit = ()
f("ab");;
it : unit = ()
a;;
it : ? list = ["ab"; 1] *** Oops! type error !
The point is that F# would be forced to infer a type of f prior to any use of the function.
This would result in the type 'a -> unit because apparently values of any type can be
consed onto the empty list. Hence each of the applications f(1) and f("ab") would
type check because int as well as string are instances of the polymorphic type 'a. The
type check would hence fail to discover the illegal expression "ab"::[1] emerging during
the evaluation of f("ab").
The declaration
Summary
The chapter provides a semantical framework, the store, for understanding the imperative
features of F# that operate on and modify the state of mutable objects. A store consists of
a set of locations containing values, together with operators to access and change the store.
The main imperative constructs of F# are introduced together with extracts of .NET libraries
for imperative collections, including arrays, sets and maps. We explain why the restriction
on polymorphically typed expressions is needed because of the imperative features.
196 Imperative features
Exercises
8.1 Make a drawing of the environment and store obtained by the following declarations and as-
signments:
let mutable x = 1;;
let mutable y = (x,2);;
let z = y;;
x <- 7;;
8.2 The sequence of declarations:
let mutable a = []
let f x = a <- (x :: a)
f(1);;
are accepted by F#. Explain why.
8.3 Make a drawing of the environment and store obtained by the following declarations and as-
signments:
type t1 = { mutable a : int };;
type t2 = { mutable b : int ; c : t1 };;
let x = { a = 1 };;
let y = { b = x.a ; c = x };;
x.a <- 3;;
8.4 Declare null to denote the default value of the record type:
type t = { mutable link : t ; data : int };;
Declare some other values of type t and use assignment to build chains and circles of values of
type t. Declare a function to insert an element in the front of a chain of values of type t.
8.5 Give a declaration of the gcd function using a while loop instead of recursion (cf. Sec-
tion 1.8).
8.6 Declare a function for computing Fibonacci numbers F_n (see Exercise 1.5) using a while
loop. Hint: introduce variables to contain the two previously computed Fibonacci numbers.
8.7 Use a HashSet traversal for loop to declare a function
HashSetFold: ('b -> 'a -> 'b) -> 'b -> HashSet<'a> -> 'b
such that
HashSetFold f b set = f (... (f (f b a_0) a_1) ...) a_{n-1}
where a_0, ..., a_{n-1} are the elements of the HashSet set.
8.8 Declare a DictionaryFold function. The type should correspond to the type of Map.fold.
8.9 Make declarations of breadthFirst and breadthFirstFold for list trees using an im-
perative queue.
Hint: unfold the while-loop in the declaration of breadthFirstIter to a local recursive
function and use argument and value of this function to build the result.
9
Efficiency
The efficiency of a program is measured in terms of its memory requirements and its running
time. In this chapter we shall introduce the concepts stack and heap because a basic under-
standing of these concepts is necessary in order to understand the memory management of
the system, including the garbage collection.
Furthermore, we shall study techniques that in many cases can be used to improve the
efficiency of a given function, where the idea is to search for a more general function, whose
declaration has a certain form called iterative or tail recursive. Two techniques for deriving
tail-recursive functions will be presented: One is based on using accumulating parameters
and the other is based on the concept of a continuation, that represents the rest of the com-
putation. The continuation-based technique is generally applicable. The technique using ac-
cumulating parameters applies in certain cases only, but when applicable it usually gives the
best results. We give examples showing the usefulness of these programming techniques.
We relate the notion of iterative function to while loops and provide examples showing
that tail-recursive programs are in fact running faster than the corresponding programs using
while loops.
The techniques for deriving tail-recursive functions are useful programming techniques
that often can be used to obtain performance gains. The techniques do not replace a con-
scious choice of good algorithms and data structures. For a systematic study of efficient
algorithms, we refer to textbooks on Algorithms and Data Structures.
Use of computer memory: The maximum size of computer memory needed to represent
expressions and bindings during the evaluation.
Computation time: The number of individual computation steps.
The important issue is to estimate how these figures depend on the size of the argument
for large arguments, for example, number of digits of integer argument, length of list
argument, depth (i.e. number of levels) of tree argument, etc. These performance figures
are essentially language independent, so implementations of the same algorithm in another
programming language will show a similar behaviour.
let ys = 3::4::xs;;
val ys : int list = [3; 4; 5; 6; 7]
let zs = xs @ ys;;
val zs : int list = [5; 6; 7; 3; 4; 5; 6; 7]
let n = 27;;
val n : int = 27
The stack and the heap corresponding to these declarations are shown in Figure 9.1.
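[Figure 9.1: The stack and the heap. The stack frame holds entries for xs, ys, zs and n. The
entry for n contains 27; the other entries link to cons cells in the heap: the cells 5, 6, 7
for xs, the cells 3, 4 (linked to the cells of xs) for ys, and a copy of the cells 5, 6, 7
(linked to the cells of ys) for zs.]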
The stack contains an entry for each binding. The entry for the integer n contains the
integer value 27, while the entries for the lists xs, ys and zs contain links (i.e. memory
pointers) pointing at the implementations of these lists. A list [x_0; ...; x_{n-1}] is
implemented by a linked data structure, where each list element x_i is implemented by a cons
cell containing the value x_i and a link to the cons cell implementing the next element in
the list:
9.2 Memory management 199
The entry for xs in the stack contains a link to the cons cell for its first element 5 in the
heap.
The entry for ys in the stack contains a link to the cons cell for its first element 3. This
cons cell contains a link to the cons cell for the next element 4 and that cons cell contains
in turn a link to the first cons cell of xs.
The entry for zs in the stack contains a link to the first cons cell of a copy of the linked
list for xs (the first argument of @ in xs @ ys). The last cons cell of that copied linked
list contains a link to the start of the linked list for ys.
1. The linked list for ys is not copied when building a linked list for y::ys.
2. Fresh cons cells are made for the elements of xs when building a linked list for xs @ ys,
as the last cons cell in the new linked list for xs must refer to the first cons cell of the
linked list for ys. The running time of @ is, therefore, linear in the length of its first
argument. This running time is in agreement with the declaration of append in Section 4.4
and with the linked-list based implementation used by the built-in append function.
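The sharing and copying described in these two observations can be probed with reference equality. The sketch below assumes that LanguagePrimitives.PhysicalEquality reflects whether two list values are the same cons cell in the heap and that the built-in @ links the copied cells directly to its second argument; the helper name same is introduced for the illustration only.

```fsharp
let xs = [5; 6; 7]
let ys = 3 :: 4 :: xs
let zs = xs @ ys

// Reference equality: do the two lists start at the same cons cell?
let same (a : int list) (b : int list) = LanguagePrimitives.PhysicalEquality a b

// Cons does not copy: after skipping 3 and 4 we are at the cells of xs.
let ysSharesXs = same (List.tail (List.tail ys)) xs

// Append makes fresh cons cells for the elements of its first argument ...
let zsCopiesXs = not (same zs xs)

// ... and the last fresh cell links to the first cons cell of ys.
let zsSharesYs = same (zs |> List.tail |> List.tail |> List.tail) ys
```

Under these assumptions all three values are expected to be true.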
The evaluation of the outermost declaration will start with an empty heap and a stack
frame sf_0 containing a (so far undefined) entry for zs:
[Figure: the stack holds the frame sf_0 with an undefined entry (?) for zs; the heap is empty.]
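[Figure: a stack frame sf_1 with entries for xs, ys and result has been pushed on top of sf_0,
whose entry for zs is still undefined (?). In the heap, xs links to the cons cells 1, 2, ys
links to the cons cells 3, 4, and result links to a copy of the cells 1, 2.]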
Notice that a copy of the list xs is made in the heap during the evaluation of xs @ ys.
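[Figure: only the frame sf_0 remains on the stack. Its entry for zs links to the copied cons
cells 1, 2, which are followed by the cells 3, 4. The original cons cells 1, 2 are still in
the heap.]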
The resulting heap after the evaluation of the declaration for zs contains two cons cells
that are marked in the figure. These cells are obsolete because they cannot be reached from
any binding, and they are therefore later removed from the heap by the garbage collector that
manages the heap behind the scenes.
The management of the stack follows the evaluation of declarations and function calls in a
simple manner, and the used part of the stack is always a consecutive sequence of the relevant
stack frames. We illustrate this by a simple example. Consider the following declarations:
let rec f n =
    match n with
    | 0 -> 0
    | n -> f(n-1) + n;;
let x = f 3;;
The first part of the evaluation of f 3 makes repeated bindings of n corresponding to the
recursive function calls:
f 3
⇝ (f n, [n ↦ 3])
⇝ (f(n-1) + n, [n ↦ 3])
⇝ f 2 + (n, [n ↦ 3])
⇝ (f n, [n ↦ 2]) + (n, [n ↦ 3])
⇝ ...
⇝ (f n, [n ↦ 0]) + (n, [n ↦ 1]) + (n, [n ↦ 2]) + (n, [n ↦ 3])
These bindings are implemented by four stack frames sf_1, ..., sf_4 pushed on top of the
initial stack frame sf_0 corresponding to f and x. Each of the stack frames sf_1, ..., sf_4
corresponds to an uncompleted evaluation of a function call:
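[Figure: the stack with the frames sf_4 (n = 0), sf_3 (n = 1), sf_2 (n = 2) and sf_1 (n = 3)
on top of sf_0. Each of sf_1, ..., sf_4 has an undefined result entry (?). The frame sf_0
holds an undefined entry (?) for x and an entry for f linking to the closure for f in the
heap.]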
The next evaluation step marks the completion of the innermost function call f 0
and the binding n ↦ 0 is hence no longer needed. The implementation releases the memory
used to implement this binding by popping the frame sf_4 off the stack:
[Figure: the stack after popping sf_4. The frames sf_3 (n = 1), sf_2 (n = 2) and sf_1 (n = 3),
each with an undefined result entry (?), remain on top of sf_0, which holds the entries for x
and f.]
When the evaluation terminates the stack frames sf_3, sf_2 and sf_1 are all popped and the
initial stack frame sf_0 contains a binding of x to 6.
The stack management using the push and pop operations is very simple because the stack
is maintained as a contiguous sequence of the relevant stack frames. The stack memory will
hence never be fragmented.
let xs = [1;2];;
g 2;;
val it : int list = [1; 1; 2; 2]
Application of this function will produce garbage due to the local declaration of a list and
due to the use of List.rev. The stack and the heap upon the termination of g 2 are shown
in Figure 9.2. The stack contains just one stack frame corresponding to the top-level
declarations.
The heap contains five marked cons cells that are obsolete because they cannot
be reached from any binding, and they are removed from the heap by the garbage collector. It
is left for Exercise 9.1 to produce this stack and heap for an evaluation of g 2. The amount
of garbage produced using g grows with the size of the argument, and it is easy to measure
how much garbage the system has to collect.
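[Figure 9.2: The stack and the heap upon termination of g 2. The single stack frame sf_0
holds entries for xs (linked to the cons cells 1, 2), g (linked to the closure for g) and it
(linked to the cons cells 1, 1, 2, 2). The heap also contains the obsolete cons cells.]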
#time;;
g 10000;;
Real: 00:00:01.315, CPU: 00:00:01.326,
GC gen0: 356, gen1: 24, gen2: 0
val it : int list = [9999; 9997; 9995; 9993; 9991; 9989; 9987;...]
The measurement includes two times: The Real time is the clock time elapsed during the
execution of the operation, in this case 1.315 seconds. The CPU time is the total time spent
by the operation on all CPUs (or cores) on your computer. If you are not exploiting the
parallelism of multiple cores, then these two times should be approximately the same.
The garbage collector manages the heap as partitioned into three groups or generations:
gen0, gen1 and gen2, according to their age. The objects in gen0 are the youngest while
the objects in gen2 are the oldest. The typical situation is that objects die young, that is,
garbage typically occurs among young objects, and the garbage collector is designed for
that situation. During the above evaluation of g 10000, the garbage collector performed
356 collections of the youngest generation gen0 and 24 collections of gen1, as reported in
the GC part of the output.
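The GC figures can also be sampled from a program. The following sketch assumes that the figures printed by #time correspond to the per-generation collection counters exposed by System.GC.CollectionCount; the function name gcCounts is introduced here for illustration only.

```fsharp
open System

// Runs f () and returns its result together with the number of garbage
// collections of each generation (gen0, gen1, ...) that happened meanwhile.
let gcCounts (f : unit -> 'a) =
    let sample () = [ for gen in 0 .. GC.MaxGeneration -> GC.CollectionCount(gen) ]
    let before = sample ()
    let result = f ()
    let after = sample ()
    (result, List.map2 (-) after before)

// Usage: an allocation-heavy computation; the counts vary from run to run.
let (reversed, counts) = gcCounts (fun () -> List.rev [1 .. 1000000])
```

The counts depend on the runtime, the machine and the current state of the heap, so they should be read as indications rather than exact measures.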
bigList 120000;;
val it : int list = [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;...]
bigList 130000;;
Process is terminated due to StackOverflowException.
A call bigList n will generate n consecutive stack frames each with a binding of n and
the examples show that 120000 such stack frames are manageable while 130000 are not.
Another declaration of a function that can generate the same lists as the above one is given
below. This function can generate lists that are about 100 times longer than those generated
above, and when memory problems arise it is because the heap is exhausted:
In the next sections we study techniques that can be used to minimize the memory usage.
We have seen that the evaluation of the expression fact(N) proceeds through a number
of evaluation steps building an expression with a size proportional to the argument N upon
which the expression is evaluated:
fact(N)
⇝ (n * fact(n-1), [n ↦ N])
⇝ N * fact(N-1)
⇝ N * (n * fact(n-1), [n ↦ N-1])
⇝ N * ((N-1) * fact(N-2))
...
⇝ N * ((N-1) * ((N-2) * (... (4 * (3 * (2 * 1))) ...)))
⇝ N * ((N-1) * ((N-2) * (... (4 * (3 * 2)) ...)))
...
⇝ N!
The maximal size of the memory needed during this evaluation is proportional to N, because
the F# system must remember (in the heap) all N factors of the expression
N * ((N-1) * ((N-2) * (... (4 * (3 * (2 * 1))) ...))) during the evaluation. Furthermore,
during the evaluation the stack will grow until it has N + 1 stack frames corresponding to
the nested calls of fact.
naiveRev [x_1, x_2, ..., x_n]
⇝ naiveRev [x_2, ..., x_n] @ [x_1]
⇝ (naiveRev [x_3, ..., x_n] @ [x_2]) @ [x_1]
...
⇝ ((...(([] @ [x_n]) @ [x_{n-1}]) @ ...) @ [x_2]) @ [x_1]
There are n + 1 evaluation steps above and heap space of size proportional to n is re-
quired by the F# system to represent the last expression. These figures are to be expected for
reversing a list of size n.
However, the further evaluation
Hence, the evaluation of ((...(([] @ [x_n]) @ [x_{n-1}]) @ ...) @ [x_2]) @ [x_1] requires
1 + 2 + ... + n = n(n + 1)/2
steps, which is proportional to n^2.
Note that n! = factA(n, 1) and rev [x_1, ..., x_n] = revA([x_1, ..., x_n], []). So
good implementations for the above functions will provide good implementations for the
factorial and the reverse functions also.
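For reference, declarations with accumulating parameters that satisfy these two equations can be sketched as follows. The declarations are reconstructions that use the names factA and revA from the text; the exact form used in the book may differ.

```fsharp
// factA(n, m) computes n! * m; the second component accumulates the product.
let rec factA (n, m) =
    if n = 0 then m else factA (n - 1, n * m)

// revA(xs, ys) computes (rev xs) @ ys; the second component accumulates
// the elements seen so far in reverse order.
let rec revA (xs, ys) =
    match xs with
    | []        -> ys
    | x :: rest -> revA (rest, x :: ys)
```

In both declarations the recursive call is the last thing that happens in the recursive case, so no multiplication or append is left pending when the call returns.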
so the evaluation of the arguments can be viewed as repeated (or iterated) applications of
this function.
The use of factA gives a clear improvement over the use of fact. Consider the following
example measuring the time of 1000000 computations of 16! using these two functions:
let xs16 = List.init 1000000 (fun i -> 16);;
val xs16 : int list = [16; 16; 16; 16; 16; 16; 16; 16; ...]
#time;;
The performance gain of using factA is actually much better than the factor 2 indicated by
the above examples because the run time of the for construct alone is about 12 ms:
for i in xs16 do let _ = () in ();;
Real: 00:00:00.012, CPU: 00:00:00.015,
GC gen0: 0, gen1: 0, gen2: 0
val it : unit = ()
The use of revA gives a dramatic improvement over the use of naiveRev. Consider
the following example measuring the time used for reversing the list of elements from 1 to
20000:
let xs20000 = [1 .. 20000];;
naiveRev xs20000;;
Real: 00:00:07.624, CPU: 00:00:07.597,
GC gen0: 825, gen1: 253, gen2: 0
val it : int list = [20000; 19999; 19998; 19997; 19996;...]
9.5 Iterative function declarations 209
revA(xs20000,[]);;
Real: 00:00:00.001, CPU: 00:00:00.000,
GC gen0: 0, gen1: 0, gen2: 0
val it : int list = [20000; 19999; 19998; 19997; 19996; ...]
The naive version takes 7.624 seconds while the iterative version takes just 1 ms. One way
to consider the transition from the naive version to the iterative version is that the use of
append (@) has been reduced to a use of cons (::) and this has a dramatic effect on the
garbage collection. No garbage collection is triggered when revA is used, whereas 825
collections of gen0 and 253 collections of gen1 were performed using the naive version, and
this extra memory management takes time.
Returning to the list-generating functions on Page 204, the function bigListA is a more
general function than bigList, where the argument xs is the accumulating parameter.
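The relation between the two functions can be sketched as follows. The declaration of bigList is the one quoted in Section 9.6; the declaration of bigListA is a reconstruction under the assumption that it takes its arguments in curried form, with xs as the accumulating parameter.

```fsharp
// Not tail-recursive: the cons 1 :: ... is pending while the recursive call runs.
let rec bigList n = if n = 0 then [] else 1 :: bigList (n - 1)

// Accumulating version: bigListA n xs puts n ones in front of xs.
let rec bigListA n xs = if n = 0 then xs else bigListA (n - 1) (1 :: xs)
```

With this declaration bigList n and bigListA n [] denote the same list, which is the sense in which bigListA is the more general function.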
let h(xs,ys) = ys
When a declaration of a function in an obvious way can be transformed into the above
form, we will call it an iterative function without further argument.
which is an instance of the above schema. The above function revA is actually an applica-
tion of this iterative function:
let revA(xs,ys) = fold (fun e x -> x::e) ys xs;;
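A self-contained sketch of this connection is given below. It assumes that fold has the usual left-fold declaration, which agrees with List.fold from the library; the declaration shown is a reconstruction and not necessarily the one given in the book.

```fsharp
// Left fold: fold f e [x_1; ...; x_n] = f (... (f (f e x_1) x_2) ...) x_n.
let rec fold f e xs =
    match xs with
    | []        -> e
    | x :: rest -> fold f (f e x) rest

// Reverse with an accumulating parameter, expressed by means of fold.
let revA (xs, ys) = fold (fun e x -> x :: e) ys xs
```

The recursive call of fold is a tail call, and the accumulated value e plays the role of the accumulating parameter ys of revA.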
Thus,
f^0 x = x, f^1 x = f x, ..., f^n x = f(f(... f x ...)) with n applications of f.
Suppose that
p(f^i x) ⇝ true for all i with 0 ≤ i < n, and
p(f^n x) ⇝ false
Then, the evaluation of the expression g x proceeds as follows:
g x
⇝ (if p z then g(f z) else h z, [z ↦ x])
⇝ (g(f z), [z ↦ x])
⇝ g(f^1 x)
⇝ (if p z then g(f z) else h z, [z ↦ f^1 x])
⇝ (g(f z), [z ↦ f^1 x])
⇝ g(f^2 x)
⇝ ...
⇝ (if p z then g(f z) else h z, [z ↦ f^n x])
⇝ (h z, [z ↦ f^n x])
⇝ h(f^n x)
This evaluation has three desirable properties:
1. It does not build large expressions, as the argument f z of g(f z) is evaluated at each step
due to the eager evaluation strategy of F#,
2. there are n recursive calls of g , and
3. there is only one environment used at each stage of this evaluation.
The first property implies that heap allocation of long expressions with pending operations
can be avoided, the second property implies a linear unfolding of the recursive function g ,
and the last property implies that just one stack frame is needed during an evaluation of g x
(ignoring stack frames needed due to calls of other functions).
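The schema can be made concrete by passing p, f and h as arguments. The higher-order function iterate below is introduced for illustration only (the name is not from the book); factA and gcd are then obtained as instances where the state z is a pair.

```fsharp
// The iterative schema with p, f and h as parameters:
// continue with f z as long as p z holds, and finish with h z.
let rec iterate p f h z = if p z then iterate p f h (f z) else h z

// factA as an instance: z = (n, m), iterate while n > 0, return m at the end.
let factA z = iterate (fun (n, _) -> n > 0) (fun (n, m) -> (n - 1, n * m)) snd z

// Euclid's algorithm as an instance: z = (m, n), iterate while m <> 0.
let gcd z = iterate (fun (m, _) -> m <> 0) (fun (m, n) -> (n % m, m)) snd z
```

Each step replaces the state by f z before the next call, so no expression with pending operations is built up.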
Since bigListA is a tail-recursive function, the stack will not grow during the evaluation
of bigListA n xs and the heap is hence the limiting memory resource when using this
function as we learned in connection with the examples on Page 204.
Iterations as loops
We observed in Section 8.7 that every while loop can be expressed as an iteration. It is also
the case that every iterative function g :
let rec g z = if p z then g(f z) else h z;;
can be expressed as a while loop:
let g z =
    let zi = ref z
    while p !zi do zi := f !zi
    h(!zi);;
Using this translation scheme for the iterative version factA of the factorial function,
we arrive at the declaration:
let factW n =
    let ni = ref n
    let r = ref 1
    while !ni>0 do
        r := !r * !ni ; ni := !ni-1
    !r;;
where it is taken into account that the argument z in the translation scheme in this case is a
pair (n,r).
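The same loop can also be written with local mutable variables instead of ref cells. This is an alternative sketch, not the declaration used in the measurements of the book; the name factW2 is chosen here to avoid confusion with factW.

```fsharp
// While-loop version of the factorial function using let mutable.
let factW2 n =
    let mutable ni = n      // remaining factors: ni, ni-1, ..., 1
    let mutable r = 1       // product of the factors handled so far
    while ni > 0 do
        r <- r * ni
        ni <- ni - 1
    r
```

As in factW, the pair (ni, r) plays the role of the argument z of the translation scheme.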
There is no efficiency gain in transforming an iteration to a while-loop. Consider for example
1000000 computations of 16! using factA(16,1) and factW 16:
#time;;
have computed f(v_k). It is therefore called a continuation. The evaluation of fC starts with
c_0 = id where id is the pre-defined identity function satisfying id a = a for any a. The
effects of the recursive calls of f are gradually accumulated in the continuations c_k during
the evaluation of fC v id, and the evaluation ends by applying the continuation c_n to the
value f(v_n) in a base case.
The notion of a continuation has a much wider scope than achieving tail-recursive func-
tions (the focus in this chapter) and we refer to [12] for an in-depth study of this concept.
Consider, for example, the simple declaration of bigList from Section 9.2:
let rec bigList n = if n=0 then [] else 1::bigList(n-1);;
val bigList : int -> int list
that was used to illustrate the stack limit problems due to the fact that it is not a
tail-recursive function. The continuation-based version bigListC n c has an extra argument
c: int list -> int list
The base case of bigListC is obtained from the base case of bigList by feeding
that result into the continuation c. For the recursive case, let res denote the value of the
recursive call of bigList(n-1). The rest of the computation of bigList n is then
1::res. Hence, the continuation of bigListC(n-1) is
fun res -> c(1::res)
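Putting the base case and the recursive case together gives a declaration along the following lines. This is a sketch reconstructed from the description above, using the name bigListC from the text.

```fsharp
// Continuation-based version: c represents the rest of the computation.
let rec bigListC n c =
    if n = 0 then c []                               // feed the base-case result into c
    else bigListC (n - 1) (fun res -> c (1 :: res))  // extend the continuation with the cons

// The original function is obtained by starting with the identity continuation.
let bigList n = bigListC n id
```

The recursive call of bigListC is a tail call; the pending cons operations are instead stored in the closures that make up the continuation, and these closures are allocated in the heap.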
1. The version using an accumulating parameter is much faster (about five times) than that
using continuations.
2. The version using continuations can handle about 30% longer lists.
A counting function countA: int -> BinTree<'a> -> int using an accumulating
parameter will not be tail-recursive due to the expression containing recursive calls on
the left as well as the right sub-trees of a node (try, for example, Exercise 9.8). A
tail-recursive version can, however, be developed for a continuation-based version:
The base case countC Leaf c returns c 0. The continuation of countC tl in the case:
countC (Node(n,tl,tr)) c is the function that takes the result vl for the left subtree
and calls countC tr. The continuation of countC tr must take the result vr for the right
subtree and feed vl+vr+1 into the continuation c:
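The description can be turned into the following sketch. The declaration of BinTree is an assumption made to match the pattern Node(n,tl,tr) used in the text; the order of the components in the book's own type declaration may differ.

```fsharp
// Assumed declaration of binary trees.
type BinTree<'a> =
    | Leaf
    | Node of 'a * BinTree<'a> * BinTree<'a>

// Counts the nodes of t and passes the count on to the continuation c.
let rec countC t c =
    match t with
    | Leaf -> c 0
    | Node (_, tl, tr) ->
        // vl is the result for the left subtree, vr the result for the right one.
        countC tl (fun vl -> countC tr (fun vr -> c (vl + vr + 1)))

let count t = countC t id
```

Both calls of countC are tail calls: the call for the right subtree happens inside the continuation of the call for the left subtree.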
9.6 Tail recursion obtained using continuations 215
count t20000000;;
Real: 00:00:00.453, CPU: 00:00:00.889,
GC gen0: 0, gen1: 0, gen2: 0
val it : int = 20000000
Summary
We have introduced the concepts stack and heap that are needed in order to get a basic
understanding of the memory management in the system.
Furthermore, we have introduced the concept of tail-recursive functions and two tech-
niques for deriving a tail-recursive version of a given function, where one is based on ac-
cumulating parameters and the other on the notion of a continuation. The stack will not
grow during the evaluation of tail-recursive functions (ignoring the calls of other recursive
functions), and using these techniques will in many typical cases give good performance
gains.
A transformation from tail-recursive functions to loops was shown, together with experi-
ments showing that the tail-recursive functions run faster than the corresponding imperative
while-loop based versions.
Exercises
9.1 Consider the function g declared on Page 202 and the stack and heap after the evaluation of g 2
shown in Figure 9.2. Reproduce this resulting stack and heap by a systematic application of push
and pop operations on the stack, and heap allocations that follow the step by step evaluation of
g 2.
9.2 Show that the gcd function on Page 16 is iterative.
9.3 Declare an iterative solution to exercise 1.6.
9.4 Give iterative declarations of the list function List.length.
9.5 Express the function List.fold in terms of an iterative function itfold iterating a function
of type 'a list * 'b -> 'a list * 'b.
9.6 Declare a continuation-based version of the factorial function and compare the run time with
the results in Section 9.4.
9.7 Develop the following three versions of functions computing Fibonacci numbers F_n (see
Exercise 1.5):
1. A version fibA: int -> int -> int -> int with two accumulating parameters n1 and
n2, where fibA n n1 n2 = F_n, when n1 = F_{n-1} and n2 = F_{n-2}. Hint: consider suitable
definitions of F_{-1} and F_{-2}.
2. A continuation-based version fibC: int -> (int -> int) -> int that is based on the
definition of Fn given in Exercise 1.5.
Compare these two functions using the directive #time, and compare this with the while-loop
based solution of Exercise 8.6.
9.8 Develop a version of the counting function for binary trees
that makes use of an accumulating parameter. Observe that this function is not tail recursive.
9.9 Declare a tail-recursive function with the type
such that count t = countAC t 0 id. The intuition with countAC t a c is that a is the
number of nodes being counted so far and c is the continuation.
Processing text files containing structured data is a common problem in programming: you
may just think of analysing any kind of textual data generated by electronic equipment or
data retrieved from the web.
In this chapter we show how such programs can be made in a systematic and elegant way
using F# and the .NET library. Data are extracted from text files using functions from the
RegularExpressions library. The data processing of the extracted data is done with
a systematic use of the F# collection types list<'a>, Map<'a,'b> and Set<'a>. Easy
access from F# programs to the extensive text processing features of the .NET library is
given in a special TextProcessing library that can be copied from the home page of the
book. The chapter centers on a real-world example illustrating the techniques.
Time performance of programs is always a problem, even with today's very fast computers.
Poor performance of text processing programs is often caused by operations on very long
strings. The method in this chapter uses three strategies to avoid using very long strings:
1. Text input is in most cases read and processed in small pieces (one or a few lines).
2. Text is generated and written in small pieces.
3. Large amounts of internal program data are stored in many small pieces in F# collections
like list, set or map.
The main focus is on methods for handling textual data both as input and output, but
we also illustrate other topics: how to save binary data on the disk to be restored later by
another program, and how to read and analyse source files of web-pages. The techniques are
illustrated using an example: the generation of a web-page containing a keyword index of
the F# and .NET library documentation.
220 Text processing programs
...
"Control.Observable Module (F#)" observer event~observer
"Control.WebExtensions Module (F#)" asyncweboperation
"Microsoft.FSharp.Core Namespace (F#)"
"Core.ExtraTopLevelOperators Module (F#)" topleveloperators
"Core.LanguagePrimitives Module (F#)" languageprimitives
"Core.NumericLiterals Module (F#)" numericliteral
...
The source data to generate the keyword index are found in the keyword.txt file that
is edited manually by the programmer generating the index (cf. Table 10.1). Each line in this
file contains the title of a library documentation web-page together with the keywords that
should refer to this particular web-page. Space characters inside keywords are written using
a tilde character such that spaces can be used to separate keywords. The line:
(the second containing a space character) with links to the library documentation web-page
with title:
The programs generating the keyword index from these (and other) data are described in
Section 10.8.
10.2 Capturing data using regular expressions 221
of type string * (string list) containing the title and the list of keywords.
This section presents a systematic technique of constructing functions performing such
captures. Using the technique involves three steps
The difficult part is describing the syntactical structure in terms of regular expressions.
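[Figure: the input line "Control.Observable Module (F#)" observer event~observer annotated
with its parts: (1) leading blanks, (2) the opening quote, (3) the title, (4) the closing
quote, (5) a sequence of keyword parts, each consisting of (i) blanks and (ii) a keyword,
and (6) trailing blanks.]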
We want to capture the parts labelled (3) and (ii) in the figure.
This syntactical structure of an input line can be described by stating that the line should
consist of the following parts:
Construct Legend
char Matched by the character char. Character char must be
different from . $ { [ ( | ) ] } * + ?
\specialChar Matched by specialChar in above list (e.g. $ matches \$)
\ddd Matched by character with octal value ddd
\S Matched by any non-blank character
\s Matched by any blank character
\w Matched by any letter or digit
\d Matched by any decimal digit
[charSet] Matched by any character in charSet
[^charSet] Matched by any character not in charSet
regExpr1 regExpr2 Matched by the concatenation of a string matching
regExpr1 and a string matching regExpr2
regExpr * Matched by the concatenation of zero or more strings
each matching regExpr
regExpr + Matched by the concatenation of one or more strings
each matching regExpr
regExpr ? Matched by the empty string or a string matching regExpr
regExpr1 | regExpr2 Matched by a string matching regExpr1 or regExpr2
(?: regExpr) Weird notation for usual bracketing of an expression
( regExpr ) Capturing group
\G The matching must start at the beginning of the string or
the specified sub-string (\G is not matched to any character)
$ The matching must terminate at end of string
($ is not matched to any character)
charSet = Sequence of chars, char matches and char ranges: char1 -char2
The documentation of the System.Text.RegularExpressions library
contains a link to a regular expression manual.
The F# Power Pack uses another syntax for regular expressions.
Regular expressions
Regular expressions formalize the above informal ideas. A regular expression works as a
pattern for strings. Some strings will match a regular expression, others will not. We will
pay special attention to two kinds of elements in the above informal description:
1. Classes of characters like a quote character, a non-blank character.
2. Constructs like sequence of, one or more, zero or more.
They are formalized in the regular expression notation as:
1. Single character expressions matched by single characters.
2. Operators for building composite expressions.
Selected parts of the regular expression notation in the .NET library are described in
Table 10.2. The upper part of this table contains single character expressions:
The regular expression \S is matched, for example, by the character P,
the regular expression \d is matched, for example by the character 5, and
the regular expression \042 is just matched by the character ".
The single character expressions [...] and [^...] are matched by any single character in
a set of characters:
The expression [ab ] is matched by any single character among a, b or space, and
the expression [^cd] is matched by any single character except c and d.
Brackets are used in any algebraic notation whenever an operator is applied to a composite
expression like in the expression (a + b)^2. In regular expressions we need a further kind of
brackets to mark the parts corresponding to data to be captured. There are hence two kinds
of brackets in regular expressions:
The designers of the notation have for some mysterious reason decided to use the normal
parentheses (. . . ) as capturing brackets while the strange notation (?:. . . ) is used to de-
note usual brackets. You just have to accept that the weird symbol (?: is the way of writing
a usual left bracket in this notation.
Using the notation in Table 10.2 we get the wanted formalization of our description of the
syntactical structure of lines in the keyword file in form of the regular expression:
\G\s*\042([^\042]+)\042(?:\s+([^\s]+))*\s*$
The details in this regular expression can be explained using a picture similar to the previous
picture explaining the structure of the string:
[Figure: the regular expression \G\s*\042([^\042]+)\042(?:\s+([^\s]+))*\s*$ annotated with
the labels (a) for the anchors and 1 to 6, i and ii for the remaining parts.]
The first and last symbols \G and $ labelled (a) are anchors used to restrain the matching to
all of a string. They are not matched to any characters. The other parts work as follows:
     Expression      Matched by
1    \s*             Zero or more blank characters
2    \042            A quote character
3    ([^\042]+)      Capturing group of one or more non-quote characters
4    \042            A quote character
5    (?:...)*        Zero or more occurrences of:
i      \s+           One or more blank characters
ii     ([^\s]+)      Capturing group of one or more non-blank chars
6    \s*             Zero or more blank characters
[Figure: the parts of the regular expression matched against the parts of the input line; the
second capturing group captures twice.]
The capturing groups in the regular expression are numbered 1, 2, ... according to the
order of the opening brackets. In our case we have two capturing groups ([^\042]+) and
([^\s]+). The first is not in the scope of any operator and will hence capture exactly once
while the second is in the scope of a * operator and may hence capture zero or more times.
The picture shows that the second capturing group will capture twice in this case.
The functions captureSingle and captureList in the TextProcessing library
of the book (cf. Table 10.4) give a convenient way of extracting the captured data from
the Match object. The matching and data capture will then proceed as follows:
Table 10.4 Functions from the TextProcessing library of the book. See also Appendix B
open TextProcessing;;
let m = reg.Match
           "\"Control.Observable Module (F#)\" observer event~observer";;
m.Success;;
val it : bool = true
captureSingle m 1;;
val it : string = "Control.Observable Module (F#)"
captureList m 2;;
val it : string list = ["observer"; "event~observer"]
tildeReplace "event~observer";;
val it : string = "event observer"
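The session can be reproduced without the TextProcessing library by using the .NET classes directly. In the sketch below the functions captureSingle, captureList and tildeReplace are simple stand-ins written for this example; the definitions in the library of the book may differ. The function parseLine is a hypothetical helper combining the steps.

```fsharp
open System.Text.RegularExpressions

// The regular expression for a line of the keyword file.
let reg = Regex @"\G\s*\042([^\042]+)\042(?:\s+([^\s]+))*\s*$"

// Stand-in: the text captured by group number n (for a group that captures once).
let captureSingle (m : Match) (n : int) = m.Groups.[n].Value

// Stand-in: all texts captured by group number n, in order of capture.
let captureList (m : Match) (n : int) =
    m.Groups.[n].Captures
    |> Seq.cast<Capture>
    |> Seq.map (fun c -> c.Value)
    |> List.ofSeq

// Stand-in: tilde characters inside keywords denote spaces.
let tildeReplace (s : string) = s.Replace('~', ' ')

// Hypothetical helper: title and keywords of a line, or None if the line is malformed.
let parseLine (line : string) =
    let m = reg.Match(line)
    if m.Success
    then Some (captureSingle m 1, List.map tildeReplace (captureList m 2))
    else None
```

Returning an option value makes it explicit that a line which does not have the required structure yields no data.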
Nested data
The Match object and function are less elegant in case of nested data like:
John 35 2 Sophie 27 Richard 17 89 3
where we want to capture the data in a form using nested lists:
[("John", [35; 2]); ("Sophie", [27]);
("Richard", [17; 89; 3])]
The nested syntactic structure is faithfully described in the regular expression
let regNest =
    Regex @"\G(\s*([a-zA-Z]+)(?:\s+(\d+))*)*\s*$";;
with anchors \G and $ enclosing
(...)*          Zero or more occurrences of capturing group of
  \s*           Zero or more spaces
  ([a-zA-Z]+)   Capturing group of one or more letters
  (?:...)*      Zero or more occurrences of:
    \s+         One or more spaces
    (\d+)       Capturing group of one or more digits
\s*             Zero or more spaces
The data groups captured by Match:
group 1: " John 35 2" , " Sophie 27" , " Richard 17 89 3"
group 2: "John" , "Sophie" , "Richard"
group 3: "35" , "2" , "27" , "17" , "89" , "3"
can, however, not be used directly to get the above nested list structure because the data
captured by group 3 do not reflect the nesting.
A systematic method to capture such data using grammars and parsers is presented in
Section 12.10. At present we show two ad hoc ideas to capture the nested data:
Capture in two steps.
Using successive calls of Match.
let m = regOuter.Match
           " John 35 2 Sophie 27 Richard 17 89 3 ";;
captureList m 1;;
val it : string list =
[" John 35 2"; " Sophie 27"; " Richard 17 89 3"]
The inner data capture uses the regular expression:
let regPerson1 =
    Regex @"\G\s*([a-zA-Z]+)(?:\s+(\d+))*\s*$";;
It captures the person name as a letter string and each integer value as a digit string. The
digit strings need further conversions to the corresponding int values. This is done using
the List.map function to apply the conversion function int to each digit string:
let extractPersonData subStr =
    let m = regPerson1.Match subStr
    (captureSingle m 1, List.map int (captureList m 2));;
val extractPersonData : string -> string * int list
Combining these ideas we get the following function:
let getData1 str =
    let m = regOuter.Match str
    match (m.Success) with
    | false -> None
    | _ ->
      Some (List.map extractPersonData (captureList m 1));;
val getData1 : string -> (string * int list) list option
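The two-step capture can be tried out in isolation with the following sketch. The declaration of regOuter and the helper functions captureSingle and captureList are not shown in this section, so the versions below are assumptions made for illustration: regOuter is taken to be the outer pattern with one capturing group per person, and the helpers read the Captures collection of a group.

```fsharp
open System.Text.RegularExpressions

// Assumed helper: the value captured by group n.
let captureSingle (m: Match) (n: int) = m.Groups.[n].Value

// Assumed helper: all values captured by group n, in order of capture.
let captureList (m: Match) (n: int) =
    m.Groups.[n].Captures
    |> Seq.cast<Capture>
    |> Seq.map (fun c -> c.Value)
    |> List.ofSeq

// Assumed outer pattern: group 1 captures one sub-string per person.
let regOuter = Regex @"\G(\s*[a-zA-Z]+(?:\s+\d+)*)*\s*$"

// Inner pattern as given in the text.
let regPerson1 = Regex @"\G\s*([a-zA-Z]+)(?:\s+(\d+))*\s*$"

let extractPersonData (subStr: string) =
    let m = regPerson1.Match subStr
    (captureSingle m 1, List.map int (captureList m 2))

let getData1 (str: string) =
    let m = regOuter.Match str
    if m.Success
    then Some (List.map extractPersonData (captureList m 1))
    else None

let result = getData1 " John 35 2 Sophie 27 Richard 17 89 3 "
```

With these assumptions, result is the nested list shown at the start of this subsection wrapped in Some, while an input such as "John x7" yields None.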
Each of the person data sub-strings will then match this regular expression when matching
from the start position of the sub-string, for instance when matching from the position (= 11)
of the character S in Sophie:
let m =
    regPerson2.Match
      (" John 35 2 Sophie 27 Richard 17 89 3 ", 11);;
captureSingle m 1;;
val it : string = "Sophie"
captureList m 2;;
val it : string list = ["27"]
m.Length ;;
val it : int = 10
The length of the captured sub-string is given by m.Length and the new position:
newPosition = startPosition + m.Length
is the position of the first character R in the next person data sub-string.
These are combined in the function personDataList that tries to extract a list of
person data from the string str starting at position pos and terminating at position top:
let rec personDataList str pos top =
    if pos >= top then Some []
    else let m = regPerson2.Match(str,pos)
         match m.Success with
         | false -> None
         | true  -> let data = (captureSingle m 1,
                                List.map int (captureList m 2))
                    let newPos = pos + m.Length
                    match (personDataList str newPos top) with
                    | None     -> None
                    | Some lst -> Some (data :: lst);;
val personDataList : string -> int -> int
                       -> (string * int list) list option
The function returns an empty list Some [] if pos >= top. Otherwise, a match with the
regular expression regPerson2 is tried. A negative result None is returned if the match
is unsuccessful. Otherwise the data are captured and the new position calculated. The result
now depends on the outcome of a recursive call using the new position: a negative result is
propagated; otherwise a positive result is obtained by consing the captured person data
onto the list found in the recursive call.
When applying personDataList to a string we start at position 0 with top position
equal to the length of the string:
let getData2 (s: string) = personDataList s 0 s.Length;;
getData2 " John 35 2 Sophie 27 Richard 17 89 3 ";;
val it : (string * int list) list option =
Some [("John", [35; 2]); ("Sophie", [27]);
("Richard", [17; 89; 3])]
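The position arithmetic can be checked with a self-contained sketch. The declaration of regPerson2 is not shown above; we assume here that it is regPerson1 without the trailing anchor $, which is consistent with the match length 10 reported for Sophie. The helpers captureSingle and captureList are again assumed versions.

```fsharp
open System.Text.RegularExpressions

// Assumed helpers (see the lead-in).
let captureSingle (m: Match) (n: int) = m.Groups.[n].Value
let captureList (m: Match) (n: int) =
    m.Groups.[n].Captures
    |> Seq.cast<Capture>
    |> Seq.map (fun c -> c.Value)
    |> List.ofSeq

// Assumed pattern: like regPerson1, but without the trailing anchor $.
let regPerson2 = Regex @"\G\s*([a-zA-Z]+)(?:\s+(\d+))*\s*"

let input = " John 35 2 Sophie 27 Richard 17 89 3 "

// A single match starting at the position of S in Sophie.
let m = regPerson2.Match(input, 11)
let nextPos = 11 + m.Length        // position of R in Richard

let rec personDataList (str: string) pos top =
    if pos >= top then Some []
    else
        let m = regPerson2.Match(str, pos)
        if not m.Success then None
        else
            let data = (captureSingle m 1, List.map int (captureList m 2))
            match personDataList str (pos + m.Length) top with
            | None     -> None
            | Some lst -> Some (data :: lst)

let getData2 (s: string) = personDataList s 0 s.Length
```

The anchor \G makes each match start exactly at the given position, so a failed match cannot silently skip over malformed text.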
A StreamWriter has an internal data buffer. Part of a string sent to the writer may
be temporarily stored in the buffer for later writing to the output medium. A call of Flush
ensures that the buffer contents are written to the medium.
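The effect of the buffer can be observed in a small sketch. As an assumption made for illustration, a MemoryStream plays the role of the output medium, so that the number of bytes that have reached the medium can be inspected directly.

```fsharp
open System.IO

// A MemoryStream stands in for the output medium (an assumption of this sketch).
let medium = new MemoryStream()
let writer = new StreamWriter(medium)

writer.Write("abc")
// The three characters are still held in the buffer of the writer.
let bytesBeforeFlush = medium.Length

writer.Flush()
// After Flush the buffer contents have been written to the medium.
let bytesAfterFlush = medium.Length
```

For a file the same applies: text written shortly before a program stops may be lost unless the writer is flushed or closed.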
It is often possible to process an input text on a line-by-line basis. The program will then
input one or more lines, do some computations, input the next lines, etc. This pattern of
computation is captured in the functions of the TextProcessing library of the book
described in Table 10.6. Signature and implementation files are given in Appendix B.
A typical application of fileFold is to build a collection using a function f that captures
data from a single input line (as described in Section 10.2) and adds the data to the collection.
A typical application of fileIter is to generate a side effect for each input line such
as output of some data or updating of an imperative data structure. The applications of
fileXfold and fileXiter are similar, but involve several lines of the file.
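To make the pattern concrete, the following sketch gives possible declarations of two of these functions. They are stand-ins written for illustration and need not agree with the implementation in Appendix B; the file name and its contents are hypothetical.

```fsharp
open System.IO

// Sketch of fileFold: fold f over the lines of the file, starting from e.
let fileFold (f: 'a -> string -> 'a) (e: 'a) (path: string) : 'a =
    use reader = File.OpenText path
    let rec loop acc =
        if reader.EndOfStream then acc
        else loop (f acc (reader.ReadLine()))
    loop e

// Sketch of fileIter: apply g to each line for its side effect.
let fileIter (g: string -> unit) (path: string) : unit =
    use reader = File.OpenText path
    while not reader.EndOfStream do
        g (reader.ReadLine())

// Hypothetical input file with three lines.
let path = Path.Combine(Path.GetTempPath(), "filefold-demo.txt")
File.WriteAllLines(path, [| "alpha"; "be"; "gamma" |])

// Building a collection: the list of line lengths.
let lengths =
    fileFold (fun acc (line: string) -> line.Length :: acc) [] path
    |> List.rev

// Generating a side effect: updating an imperative collection.
let seen = ResizeArray<string>()
fileIter (fun line -> seen.Add line) path
```

The accumulator of fileFold plays the same role as in List.fold, with the lines of the file taking the place of the list elements.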
Table 10.6 File functions of the TextProcessing library of the book. See also Appendix B
Table 10.7 Some file handling functions from the System.IO library
Table 10.8 Save/restore functions from the book's TextProcessing library. See also Appendix B
The following examples show how to save two values on the disk:
open TextProcessing;;
let v1 = Map.ofList [("a", [1..3]); ("b", [4..10])];;
val v1 : Map<string,int list> =
map [("a", [1; 2; 3]); ("b", [4; 5; 6; 7; 8; 9; 10])]
saveValue v1 "v1.bin";;
val it : unit = ()
saveValue v2 "v2.bin";;
val it : unit = ()
These values are restored as follows:
let value1:Map<string,int list> = restoreValue "v1.bin";;
val value1 : Map<string,int list> =
map [("a", [1; 2; 3]); ("b", [4; 5; 6; 7; 8; 9; 10])]
f 7;;
val it : int = 10
g 2;;
val it : int = 8
Note that arbitrary values, including functions, can be saved on the disk and retrieved again
and that the type annotations are necessary when restoring the values, because the F# system
otherwise would have no information about the types of retrieved values. Furthermore, we
have omitted the warning concerning the incomplete pattern [f;g] in the last example.
instead of
let reader = File.OpenText path
The keyword use indicates to the system that the binding of reader comprises resources
that should be released once the program is no longer using the object bound to reader. The
system will in this case release these resources when the binding of the identifier reader
cannot be accessed any longer from the program. One usually places the use declaration
inside a function such that the object is automatically released on return from the function.
This mechanism is implemented in the library functions by letting all objects that own
resources implement the IDisposable interface. This interface contains a Dispose operation
that is called when the object is released. Declaring use-bindings can only be done
for such objects.
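The release on return from a function can be observed with a small sketch. The class Resource and the event log are hypothetical and serve only to record when Dispose is called.

```fsharp
open System

// Event log used to observe the order of events (hypothetical).
let events = ResizeArray<string>()

// Hypothetical resource that records its own disposal.
type Resource() =
    interface IDisposable with
        member this.Dispose() = events.Add "disposed"

let work () =
    use r = new Resource()
    events.Add "working"
    // Dispose is called on r when the function returns.

work ()
let observed = List.ofSeq events
```

The value of observed is ["working"; "disposed"]: the resource is released after the body of the function has been evaluated, without any explicit call of Dispose.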
The complete collection of the (more than 350) supported cultures is found in the sequence
CultureInfo.GetCultures(CultureTypes.AllCultures)
and you can get a complete (and long) list of Name and DisplayName by calling the
printing function:
let printCultures () =
    Seq.iter
        (fun (a:CultureInfo) ->
            printf "%-12s %s\n" a.Name a.DisplayName)
        (CultureInfo.GetCultures(CultureTypes.AllCultures));;
System.Threading.Thread.CurrentThread.CurrentCulture
is used by default in culture-dependent formatting (cf. Section 10.7). The Name field:
System.Threading.Thread.CurrentThread.CurrentCulture.Name
to a culture name:
yields a function:
for example:
open System.Globalization;;
open TextProcessing;;
The comparison operators compare, <, <=, > and >= are customized on orderString
values to the string ordering determined by the culture. We may, for example, observe that
the alphabetic order of the national letters ø and å is different in Sweden and Denmark:
svString "ø" < svString "å";;
val it : bool = false
The string function gives the string embedded in an orderString value, while the
function orderCulture gives the culture:
let str = svString "abc";;
string str;;
val it : string = "abc"
orderCulture str;;
val it : string = "sv-SE"
This function uses List.map to apply enString to each string in a list of strings. The
resulting list of orderString values is then sorted using List.sort. Finally, the strings
are recovered by applying string to each element in the sorted list using List.map.
The "en-US" ordering has interesting properties: Alphabetic order of characters overrules
upper/lower case. For example:
enListSort ["Ab" ; "ab" ; "AC" ; "ad" ] ;;
val it : string list = ["ab"; "Ab"; "AC"; "ad"]
The string ordering corresponds to the order of the entries in a dictionary. This is almost the
lexicographical order (ignoring case) but not quite, for example:
enListSort ["multicore";"multi-core";"multic";"multi-"];;
val it : string list
= ["multi-"; "multic"; "multicore"; "multi-core"]
The string multicore precedes multi-core because the minus character in this
context is considered a hyphen in a hyphenated word, while the string multi- precedes
multic because the minus character in this context is considered a minus sign, and this
character precedes the letter c.
Note the convenient use of a sorting function. One may also obtain textual output sorted
according to culture by using values of orderString type as keys in set or map collections:
the fold and iter functions will then traverse the elements of such a collection
using the described ordering. The same applies to the enumerator functions of SortedSet
and SortedDictionary collections (cf. Section 8.12).
The ordering of orderString values is defined using the String.Compare function:
String.Compare(string1, string2, cultureInfo)
The user may consult the documentation of this function in [9] for further information about
the culture-dependent orderings.
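A culture-dependent sort can also be sketched directly on top of String.Compare, without the orderString type. The function cultureSort below is a hypothetical helper, not part of the TextProcessing library, and it uses the overload of String.Compare that takes a CultureInfo and a CompareOptions argument.

```fsharp
open System
open System.Globalization

// Hypothetical helper: sort strings by the ordering of the named culture.
let cultureSort (cultureName: string) (strs: string list) =
    let culture = CultureInfo cultureName
    List.sortWith
        (fun (a: string) (b: string) ->
            String.Compare(a, b, culture, CompareOptions.None))
        strs

let sorted = cultureSort "en-US" ["pear"; "Apple"; "banana"]
```

The finer points of an ordering, such as the treatment of hyphens shown above, may depend on the globalization data of the platform, so the sketch makes no claim about them.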
The function sprintf delivers the formatted string as the result, while the other functions
write this string on some output medium:
printf writes on Console.Out.
fprintf writer writes on StreamWriter writer
eprintf writes on Console.Error.
A format placeholder has the general form:
%{flags}{width}{.precision}formatType
where {...} means that this part is optional. Frequently used format types and flags are
shown in Table 10.10 and Table 10.11.
The integers width and precision are used in formatting numbers, where width specifies
the total number of printing positions while precision specifies the number of decimals:
sprintf "%bhood" (1=2);;
val it : string = "falsehood"
sprintf "%-6d" 67;;
val it : string = "67    "
sprintf "%+8e" 653.27;;
val it : string = "+6.532700e+002"
sprintf "a%+7.2fb" 35.62849;;
val it : string = "a +35.63b"
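A few further combinations of flags, width and precision are collected in the following sketch; the bar character in the first format string only serves to make the trailing spaces visible.

```fsharp
// Flag -: left-justified within 6 printing positions.
let leftJustified = sprintf "%-6d|" 67

// Flag 0: padded with zeros to 5 printing positions.
let zeroPadded = sprintf "%05d" 42

// Width 8 and precision 3: right-justified with three decimals.
let rounded = sprintf "%8.3f" 3.14159

// Flag +: the sign is shown also for positive numbers.
let signed = sprintf "%+d" 7
```

These evaluate to "67    |", "00042", "   3.142" and "+7", respectively.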
open System;;
let localNow = DateTime.Now;; // local time
let UtcNow = DateTime.UtcNow;; // Utc time
A TimeZoneInfo object can be used to convert between local standard time and universal
time, but the conversion does not cater for daylight saving time. Further information
can be found in
d Short date
D Long date
t Short time
T Long time
F Long date and long time
g Short date and short time
M or m Month and day
Y or y Year and month
Table 10.13 Selected Date-time Format Codes
Precision is an integer between 0 and 99 specifying the number of decimals. There are two
kinds of formats: numeric formats as shown in Table 10.12 and date-time formats as shown
in Table 10.13. (Further information can be found in the .NET documentation web-pages for
Numeric Format Strings and Date Time Format Strings.)
Some examples:
open System ;;
String.Format("{0,7:F2}",35.2) ;;
val it : string = "  35,20"
let dk = CultureInfo "da-DK" ;;
let en = CultureInfo "en-US" ;;
let ru = CultureInfo "ru-RU" ;;
let now = DateTime.Now ;;
String.Format(dk, "{1:d}...{0:c}", 45, now) ;;
val it : string = "17-10-2011...kr 45,00"
String.Format(ru,"{0:d}",now) ;;
val it : string = "17.10.2011"
String.Format(en,"{0:d}",now) ;;
val it : string = "10/17/2011"
String.Format(en,"{0:F}",now) ;;
val it : string = "Monday, October 17, 2011 2:57:50 PM"
let ar = CultureInfo "es-AR" ;;
String.Format(ar,"{0:F}",now) ;;
val it : string = "lunes, 17 de octubre de 2011 02:57:50 p.m."
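The results above depend on the cultures installed on the machine. A reproducible variant can be sketched by passing CultureInfo.InvariantCulture explicitly and by using a fixed date instead of DateTime.Now; both choices are assumptions of this sketch and not part of the examples above.

```fsharp
open System
open System.Globalization

let inv = CultureInfo.InvariantCulture

// Fixed point in time, so that the results do not depend on the clock.
let date = DateTime(2011, 10, 17, 14, 57, 50)

// Width 7 and numeric format F2 (two decimals).
let number = String.Format(inv, "{0,7:F2}", 35.2)

// Date-time format d (short date).
let shortDate = String.Format(inv, "{0:d}", date)

// The argument indices select the arguments independently of their order.
let both = String.Format(inv, "{1:d}...{0:F1}", 2.5, date)
```

Here number is "  35.20", shortDate is "10/17/2011" and both is "10/17/2011...2.5".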
The box webCat in the system diagram (see Figure 10.2) is a map from titles to uris:
webCat: Map<string,string>
Such a map is called a webCat map in the following, and could contain the entry with:
The file webCat.bin is a binary file for a webCat map and we shall see in Section 10.9
how this file to a large extent can be generated automatically.
The set keyWdSet in the system diagram (see Figure 10.2) has the type:
keyWdSet: Set<orderString*string>
An element in this set is called a webEntry and consists of a pair of keyword and associated
uri, where the keyword is encoded in an orderString value. The set could include the
following two elements:
("observer",
 "http://msdn.microsoft.com/en-us/library/ee370313")
("event observer",
 "http://msdn.microsoft.com/en-us/library/ee370313")
...
<a href="http://msdn.microsoft.com/en-us/library/ee353608">
null literal</a><br />
<a href="http://msdn.microsoft.com/en-us/library/ee353820">
numeric literal</a><br />
<br />
<a href="http://msdn.microsoft.com/en-us/library/ee370313">
observer</a><br />
<a href="http://msdn.microsoft.com/en-us/library/ee353721">
open</a><br />
<a href="http://msdn.microsoft.com/en-us/library/ee340450">
...
Data Capture
We now extend the capture of text from our keyword index file (see Section 10.1) to allow
comment lines in the keyword file. We will allow two types of comment lines: a blank line
and a line starting with two slash characters.
These two syntactic patterns are described by the regular expression comReg containing
the or operator |. Note that the sub-expression \G// for comment lines is without the trailing
anchor $. A string will hence match this pattern exactly when the first two characters in the
string are slash characters.
let comReg = Regex @"(?:\G\s*$)|(?:\G//)";;
A normal line with keyword data matches the regular expression:
let reg =
    Regex @"\G\s*\042([^\042]+)\042(?:\s+([^\s]+))*\s*$";;
The getData function should return a result of type:
type resType = | KeywData of string * string list
               | Comment
               | SyntError of string;;
and the replacement of tilde characters by spaces is made by the function tildeReplace:
let tildeReg = Regex @"~";;
let tildeReplace str = tildeReg.Replace(str," ");;
These ideas and the techniques from Section 10.2 are combined in the function getData:
let getData str =
    let m = reg.Match str
    if m.Success
    then KeywData(captureSingle m 1,
                  List.map tildeReplace (captureList m 2))
    else let m = comReg.Match str
         if m.Success then Comment
         else SyntError str;;
val getData : string -> resType
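The three possible outcomes of getData can be exercised with the following self-contained sketch. The helpers captureSingle and captureList are assumed versions of the functions from Section 10.2, and the sample lines are made up for illustration.

```fsharp
open System.Text.RegularExpressions

// Assumed helpers in the style of Section 10.2.
let captureSingle (m: Match) (n: int) = m.Groups.[n].Value
let captureList (m: Match) (n: int) =
    m.Groups.[n].Captures
    |> Seq.cast<Capture>
    |> Seq.map (fun c -> c.Value)
    |> List.ofSeq

type resType =
    | KeywData of string * string list
    | Comment
    | SyntError of string

// \042 is the octal character code of the double quote character.
let reg = Regex @"\G\s*\042([^\042]+)\042(?:\s+([^\s]+))*\s*$"
let comReg = Regex @"(?:\G\s*$)|(?:\G//)"
let tildeReg = Regex @"~"
let tildeReplace (str: string) = tildeReg.Replace(str, " ")

let getData (str: string) =
    let m = reg.Match str
    if m.Success
    then KeywData(captureSingle m 1,
                  List.map tildeReplace (captureList m 2))
    else
        let m = comReg.Match str
        if m.Success then Comment
        else SyntError str

let keywordLine = getData "\"Control.Observable Module (F#)\" observer event~observer"
let blankLine   = getData "   "
let commentLine = getData "// keywords for the Control namespace"
let badLine     = getData "observer without a title"
```

A line is tried as a keyword line first, so a title in quotes is never mistaken for a comment, and any line that fits neither pattern is reported together with its text.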
let keyWdIn() =
    let webCat = restoreValue "webCat.bin"
    let handleLine (keywSet: Set<orderString*string>) str =
        match getData str with
        | Comment       -> keywSet
        | SyntError str -> failwith ("SyntaxError: " + str)
        | KeywData (title,keywL) ->
            let uri = Map.find title webCat
            let addKeywd kws kw = Set.add (enString kw, uri) kws
            List.fold addKeywd keywSet keywL
    let keyWdSet = Set.empty<orderString*string>
    fileFold handleLine keyWdSet "keywords.txt";;
val keyWdIn : unit -> Set<orderString * string>
The idea is to build a webEntry set by folding a function handleLine over all the lines
of the keywords.txt file. The function handleLine translates the title in a line to the
corresponding uri using the webCat map (that has earlier been input from the webCat.bin
file). This uri is then paired with each keyword in the line and these pairs are inserted in the
webEntry set.
let webOut(keyWdSet) =
    use webPage = File.CreateText "index.html"
    let outAct oldChar (orderKwd: orderString,uri: string) =
        let keyword = string orderKwd
        let newChar = keyword.[0]
        if Char.ToLower newChar <> Char.ToLower oldChar
           && Char.IsLetter newChar
        then webPage.WriteLine "<br />"
        else ()
        webPage.Write "<a href=\""
        webPage.Write uri
        webPage.WriteLine "\">"
        webPage.Write (HttpUtility.HtmlEncode keyword)
        webPage.WriteLine "</a><br />"
        newChar
    webPage.WriteLine preamble
    Set.fold outAct 'a' keyWdSet |> ignore
    webPage.WriteLine postamble
    webPage.Close()
first character. The argument of the function is, therefore, the first character of the previous
keyword and the value of the function is the first character of the just treated keyword.
The keyword is extracted from the orderString value. It becomes a displayed text
in the web-page and must hence be encoded in HTML encoding. This is done using the
HttpUtility function HtmlEncode from the System.Web library. The uri becomes
an attribute and should hence not be encoded.
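The effect of the encoding can be seen in a small sketch. It uses WebUtility.HtmlEncode from the System.Net namespace as a stand-in for the HttpUtility function, and the function anchorLine with its sample arguments is hypothetical.

```fsharp
open System.Net

// Characters with a special meaning in HTML are replaced by entities.
let encoded = WebUtility.HtmlEncode "a < b && c > d"

// Hypothetical helper: one link with an encoded keyword and an unencoded uri.
let anchorLine (uri: string) (keyword: string) =
    sprintf "<a href=\"%s\">%s</a><br />" uri (WebUtility.HtmlEncode keyword)

let line = anchorLine "http://example.org/doc" "operator <"
```

Here encoded is "a &lt; b &amp;&amp; c &gt; d", so a keyword containing such characters is displayed as text by the browser instead of being interpreted as markup.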
http://msdn.microsoft.com/en-us/library/ee353567.aspx
http://msdn.microsoft.com/en-us/library/gg145045.aspx
Scanning the HTML source of the web-page of the F# Core Library Reference we may for
instance find the following link to another documentation web-page:
<a data-tochassubtree="true"
href="/en-us/library/ee370255"
id="ee340636_VS.100_en-us"
title="System.Collections Namespace (F#)">
System.Collections Namespace (F#)</a>
This can be used to find the new title System.Collections Namespace (F#) with
associated path /en-us/library/ee370255. The path is relative to the current web-page
but it can be converted to an absolute uri.
The ignore setting of DtdProcessing is required for some security reasons. The XML
reader is a mutable data structure pointing at any time to an XML node in the HTML-source.
Successive calls of reader.Read() will step the reader through the nodes. Properties of the
current node are found as the current values of members of the reader object, where we will
use the following properties of the current node:
reader.NodeType: XmlNodeType
reader.Name: string
reader.Value: string
reader.GetAttribute: string -> string
reader.Depth: int
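How these properties are used while stepping through the nodes can be sketched on a small, made-up source string. The sketch is not the program of Appendix A.3; it only collects title, path and depth of every a element.

```fsharp
open System.IO
open System.Xml

// Made-up source with a single link inside a div element.
let source =
    "<div><a href=\"/en-us/library/ee370255\" title=\"T\">T</a></div>"

let settings = XmlReaderSettings()
settings.DtdProcessing <- DtdProcessing.Ignore

let links =
    use reader = XmlReader.Create(new StringReader(source), settings)
    [ while reader.Read() do
        // Only start elements named a are of interest here.
        if reader.NodeType = XmlNodeType.Element && reader.Name = "a" then
            yield (reader.GetAttribute("title"),
                   reader.GetAttribute("href"),
                   reader.Depth) ]
```

The list links contains the single triple ("T", "/en-us/library/ee370255", 1), where the depth 1 reflects that the link is nested inside the div element at depth 0.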
while the end of the sub-menu is indicated by a matching div end element </div> at the
same level (that is, with the same Depth).
A button is given by
that takes a uri as argument and reads through the corresponding web-page source and extracts
the list of pairs (title,uri) corresponding to buttons in the above described sub-menu
of the navigation menu. The complete program is found in Appendix A.3. The reader may
pay special attention to the following:
type infoType =
    | StartInfo of int | EndDiv of int
    | RefInfo of string * string | EndOfFile;;
[Figure: system diagram with the files webCat0.txt and keywords.txt and the programs NextLevelRefs, MakeWebCat and IndexGen]
The text file webCat0.txt is shown in Table 10.17. It contains two pairs of lines with the
title and uri of the two root documentation pages. The text file keywords.txt contains
titles of documentation web-pages with associated keywords and an extract of the file is
shown in Table 10.1.
The keyword index is generated in three steps:
Using the file NextLevelRefs.fsx in an interactive environment one makes two calls
of the main function:
where the webCat files are placed in a directory defined by the interactive environment.
The keyword index is designed to contain references to documentation web-pages two levels
down in the trees, so the files webCat0.txt, webCat1.txt and webCat2.txt
contain all the information needed to build the webCat, but in textual form.
An extract of the file webCat1.txt is shown in Table 10.18. The files webCat1.txt
and webCat2.txt have the same structure as webCat0.txt, containing pairs of lines
with title and associated uri of documentation web-pages:
...
Microsoft.FSharp.Collections Namespace (F#)
http://msdn.microsoft.com/en-us/library/ee353413
Microsoft.FSharp.Control Namespace (F#)
http://msdn.microsoft.com/en-us/library/ee340440
Microsoft.FSharp.Core Namespace (F#)
http://msdn.microsoft.com/en-us/library/ee353649
...
open System.IO;;
open TextProcessing;;
[<EntryPoint>]
let main (args: string[]) =
    let webCat = Array.fold
                     addRefsInFile
                     Map.empty
                     args
    saveValue webCat "webCat.bin"
    0;;
Generating webCat.bin
This is done using the MakeWebCat program shown in Table 10.19. A free-standing version
of this program is called as follows in a command window:
The program inputs the title-uri pairs in each of the specified webCat input files and collects
the corresponding webCat entries in the webCat map. This map is then saved in the file
webCat.bin using the saveValue function in the book's TextProcessing library.
Using the file MakeWebCat.fsx in an interactive environment one should make the
following call of the main function:
Summary
In this chapter we have studied a part of the text processing facilities available in F# and the
.NET library, including
regular expressions,
textual input and output from and to files,
save and retrieval of F# values on and from files,
culture-dependent ordering of strings,
retrieval of web-information, and
processing XML files.
These text-processing facilities are generally applicable, and we have illustrated their
expressive power in the construction of a program that can generate a web-page containing a
keyword index of the F# and .NET library documentation.
Exercises
10.1 The term word is used in this exercise to denote a string not containing blank characters. The
blank characters of a string do hence divide the string into words. Make a program WordCount
that is called with two parameters:
WordCount inputFile outputFile
The program should read the input file and produce an output file where each line contains a
word found in the input file together with the number of occurrences, for example, peter 3.
The words should appear in alphabetic order and the program should not distinguish between
small and capital letters.
10.2 The HTML element <pre> ... </pre> encloses a pre-formatted part of the web-page. This
part is displayed exactly as written, including spaces and line breaks, but each line should, of
course, be encoded in HTML encoding. Parts of this text can be copied using copy-paste when
the page is displayed using a web-browser. Make a program with program call
examplePage fileName.txt
that inputs the contents of the text file fileName.txt and produces a web-page fileName.html
containing the contents of this file as preformatted text.
10.3 This exercise is a continuation of Exercise 10.1.
1. We do not consider the hyphen character - a proper character in a word. Make a function
to capture the list of words in a string while removing any hyphen character.
2. Make a function of type string -> (string list)*(string option) removing
hyphen characters like the previous one, but treating the last word in the line in a special
way: we get the result
([word_0; ...; word_{n-1}], None)
if the last word in the string does not end with a hyphen character, and
([word_0; ...; word_{n-2}], Some word_{n-1})
if the last word terminates with a hyphen character.
Make a new version of the WordCount program in Exercise 10.1 that in general ignores hyphen
characters but handles words that are divided from a text line to the next by means of a
hyphen character.
10.4 A position on Earth is given by geographic longitude and latitude written in the form:
14° 27′35.03″ E 55° 13′47″ N
containing degrees, minutes (1/60 degrees) and seconds (1/60 minutes) where the letters E or
W denote positive or negative sign on a longitude while N or S denote positive or negative sign
on a latitude. The seconds (here: 35.03) may have a decimal point followed by decimals or
it may just be an integer. The unit symbols (° and ′ and ″) are assumed to consist of one or
several non-digit and non-letter characters. Make a program to capture a position from a string
as a value of type float*float.
10.5 Make alternative versions of the programs IndexGen and WebCatGen in the keyword index
problem, where WebCat and KeyWdSet are represented using the imperative collections
Dictionary and SortedSet (see Section 8.11). The programs should use iter
functions to build these imperative collections. The files keywords.txt, webCat0.txt,
webCat1.txt and webCat2.txt can be found on the web-page of the book.
11
Sequences
A sequence is a possibly infinite, ordered collection of elements seq [e_0; e_1; ...]. The elements
of a sequence are computed on demand only, as it would make no sense to actually
compute an infinite sequence. Thus, at any stage in a computation, just a finite portion of the
sequence has been computed.
The notion of a sequence provides a useful abstraction in a variety of applications where
you are dealing with elements that should be processed one after the other. Sequences are
supported by the collection library of F# and many of the library functions on lists presented
in Chapter 5 have similar sequence variants. Furthermore, sequences can be defined in F#
using sequence expressions, defining a process for generating the elements.
The type seq<'a> is a synonym for the .NET type IEnumerable<'a> and any .NET
framework type that implements this interface can be used as a sequence. One consequence
of this is, for the F# language, that lists and arrays (that are specializations of sequences)
can be used as sequence arguments for the functions in the Seq library. Another consequence
is that results from the Language-Integrated Query or LINQ component of the .NET
framework can be viewed as F# sequences. LINQ gives query support for different kinds of
sources like SQL databases and XML repositories. We shall exploit this in connection with
a database for a simple product-register application, where we shall introduce the concept
of a type provider, which makes it possible to work with values from external data sources
(like SQL databases) in a type-safe manner, and the concept of query expressions, which gives
support for expressing queries on SQL databases in F#.
Sequence expressions and query expressions are special kinds of computation expressions
(also called workflows), a concept of F# that will be addressed in Chapter 12.
and we obtain a value of type seq<int>. This simple example works just like a list.
To study the special features of sequences, we will consider infinite sequences. An infinite
sequence can be obtained by using the library function:
Seq.initInfinite: (int -> 'a) -> seq<'a>
and the ith element in the sequence nat, when numbering starts with 0, is obtained using
the function Seq.nth: int -> seq<'a> -> 'a. For example:
Seq.nth 5 nat;;
val it : int = 5
So far just the fifth element of nat is computed; this is the only element demanded.
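The declaration of nat is not shown above. A sketch, assuming that nat is obtained by applying Seq.initInfinite to the identity function, is given below. Recent versions of the F# core library provide the indexing function under the name Seq.item, which is used here in place of Seq.nth.

```fsharp
// Assumed declaration: the sequence 0, 1, 2, ...
let nat = Seq.initInfinite (fun i -> i)

// Indexing starts at 0.
let fifth = Seq.item 5 nat

// A finite portion of an infinite sequence can be converted to a list.
let firstFour = nat |> Seq.take 4 |> Seq.toList

// Library functions like Seq.map work on infinite sequences as well.
let squares = nat |> Seq.map (fun i -> i * i) |> Seq.take 5 |> Seq.toList
```

Here fifth is 5, firstFour is [0; 1; 2; 3] and squares is [0; 1; 4; 9; 16]. Only finitely many elements are ever demanded, so all three evaluations terminate.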
To study the consequences of such on demand computation we modify the example so that
the number is printed whenever it is demanded, using the printfn function (Section 10.7).
Evaluation of the expression printfn "%d" i has the side-effect that the integer i is
printed on the console, for example:
printfn "%d" 10;;
10
val it : unit = ()
The natural number sequence with print of demanded elements is declared as follows:
let idWithPrint i = printfn "%d" i
                    i;;
val idWithPrint : int -> int
The function idWithPrint is the identity function on integers that has the side-effect
of printing the returned value, for example:
idWithPrint 5;;
5
val it : int = 5
Extracting the third and fifth elements of natWithPrint will print just those elements:
Seq.nth 3 natWithPrint;;
3
val it : int = 3
Seq.nth 5 natWithPrint;;
5
val it : int = 5
In particular, the elements 0, 1, 2, and 4 are not computed at all.
Extracting the fifth element again will result in a reprint of that element:
Seq.nth 5 natWithPrint;;
5
val it : int = 5
Thus the nth element of a sequence is recomputed each time it is demanded. If elements
are needed once only, or if the re-computation is cheap or gives a desired side-effect, then
this is fine.
Cached sequences
A cached sequence can be used if a re-computation of elements (such as 5 above) is undesirable.
A cached sequence remembers the initial portion of the sequence that has already
been computed. This initial portion, also called a prefix, comprises the elements e_0, ..., e_n,
when e_n is the demanded element with the highest index n.
We will illustrate this notion of cached sequence using the previous example. A cached
sequence of natural numbers is obtained using the library function
Seq.cache: seq<'a> -> seq<'a>
and it is used as follows:
let natWithPrintCached = Seq.cache natWithPrint;;
val natWithPrintCached : seq<int>
Demanding the third element of this sequence will lead to a computation of the prefix
0, 1, 2, 3 as we can see from the output from the system:
Seq.nth 3 natWithPrintCached;;
0
1
2
3
val it : int = 3
Demanding the fifth element will extend the prefix to 0, 1, 2, 3, 4, 5 but just two elements
are computed and hence printed:
Seq.nth 5 natWithPrintCached;;
4
5
val it : int = 5
and there is no re-computation of cached elements:
Seq.nth 5 natWithPrintCached;;
val it : int = 5
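Instead of printing, the number of element computations can be counted. The following sketch uses a reference cell as counter; the names evaluations and counted are introduced for illustration only, and Seq.item is used in place of Seq.nth.

```fsharp
// Counter for the number of times an element is computed.
let evaluations = ref 0

// Natural numbers, where each computation of an element is counted.
let counted =
    Seq.initInfinite (fun i ->
        evaluations.Value <- evaluations.Value + 1
        i)

let cached = Seq.cache counted

// The first demand computes and caches a prefix of the sequence.
let first = Seq.item 5 cached
let afterFirst = evaluations.Value

// The second demand is served from the cache.
let second = Seq.item 5 cached
let afterSecond = evaluations.Value
```

The counter does not change between the two demands, that is, afterSecond equals afterFirst, which corresponds to the missing printout in the last evaluation above.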
Operation
    Meaning
empty: seq<'a>, where
    empty denotes the empty sequence
init: int -> (int -> 'a) -> seq<'a>, where
    init n f = seq [f(0); ...; f(n-1)]
initInfinite: (int -> 'a) -> seq<'a>, where
    initInfinite f = seq [f(0); f(1); ...]
nth: int -> seq<'a> -> 'a, where
    nth i s = e_i
cache: seq<'a> -> seq<'a>, where
    cache sq gives a cached sequence
append: seq<'a> -> seq<'a> -> seq<'a>, where
    append sq1 sq2 appends two sequences
skip: int -> seq<'a> -> seq<'a>, where
    skip i s = seq [e_i; e_{i+1}; ...]
ofList: 'a list -> seq<'a>, where
    ofList [a_0; ...; a_{n-1}] = seq [a_0; ...; a_{n-1}]
toList: seq<'a> -> 'a list, where
    toList (seq [a_0; ...; a_{n-1}]) = [a_0; ...; a_{n-1}], just defined for finite sequences
take: int -> seq<'a> -> seq<'a>, where
    take n s = seq [e_0; ...; e_{n-1}], undefined when s has fewer than n elements
map: ('a -> 'b) -> seq<'a> -> seq<'b>, where
    map f s = seq [f(e_0); f(e_1); ...]
filter: ('a -> bool) -> seq<'a> -> seq<'a>, where
    filter p s = s' where s' is obtained from s by deletion of the elements e_i with p(e_i) = false
collect: ('a -> seq<'c>) -> seq<'a> -> seq<'c>, where
    collect f s is obtained by concatenation of the sequences f e_i for i = 0, 1, 2, ...
Creating sequences
The empty sequence is denoted Seq.empty. A one element sequence, that is, a singleton
sequence, is created using Seq.singleton, and a finite sequence can be generated from
a list using Seq.ofList. For example:
Seq.empty;;
val it : seq<'a> = seq []
Seq.singleton "abc";;
val it : seq<string> = seq ["abc"]
The function Seq.init is used to generate a finite sequence. The value of Seq.init n f
is the sequence
seq [f(0); f(1); f(2); ...; f(n-1)]
having n elements. For example:
Seq.init 3 (fun i -> 2*i);;
val it : seq<int> = seq [0; 2; 4]
Appending sequences
The operation Seq.append appending sequences works for finite sequences like @ on lists,
except for the on demand property:
let s1 = Seq.append (seq [1;2;3;4]) (seq [5;6]);;
val s1 : seq<int>
Seq.toList s1;;
val it : int list = [1; 2; 3; 4; 5; 6]
where the computation of the elements of s1 is delayed until they are demanded by the
conversion of the sequence to a list using Seq.toList.
If s denotes an infinite sequence, then Seq.append s s' is equal to s for any sequence
s'. For example:
let s2 = Seq.append nat s1;;
val s2 : seq<int>
For example:
let s3 = Seq.append s1 nat;;
(Seq.nth 2 s3 = Seq.nth 2 s1)
&& (Seq.nth 10 s3 = Seq.nth (10-6) nat);;
val it : bool = true
since s1 has six elements.
There is no function in the Seq library that directly corresponds to the function for consing
an element x to a list xs, that is, x :: xs. But a cons function for sequences is easily defined
using Seq.singleton and Seq.append:
let cons x sq = Seq.append (Seq.singleton x) sq;;
val cons : 'a -> seq<'a> -> seq<'a>
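A short usage sketch shows that cons works for finite as well as infinite sequences. The sequence nat is declared here under the assumption that it denotes the natural numbers.

```fsharp
let cons x sq = Seq.append (Seq.singleton x) sq

// Consing onto a finite sequence.
let finite = cons 1 (seq [2; 3]) |> Seq.toList

// Consing onto an infinite sequence; only a prefix is demanded.
let nat = Seq.initInfinite (fun i -> i)
let prefix = cons (-1) nat |> Seq.take 3 |> Seq.toList
```

Here finite is [1; 2; 3] and prefix is [-1; 0; 1].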
gives a representation of the sequence where the element remains unevaluated. The element
is evaluated when the function is applied:
sf();;
1
val it : seq<int> = [1]
Seq.nth 0 s1;;
1
val it : int = 1
There are two ways of using Seq.delay to modify the above recursive declaration of
from in order to avoid the problem with non-terminating evaluations. Either the recursive
call of from can be delayed:
let rec from1 i = cons i (Seq.delay (fun () -> from1(i+1)));;
val from1 : int -> seq<int>
Seq.nth 5 nat10;;
val it : int = 15
Seq.nth 5 nat15;;
val it : int = 20
There is a difference between these results of turning from into a function that generates
lazy sequences. This difference becomes visible in the presence of the side-effect of printing
the computed numbers using the idWithPrint function:
let rec fromWithPrint1 i =
    cons (idWithPrint i)
         (Seq.delay (fun () -> fromWithPrint1(i+1)));;
val fromWithPrint1 : int -> seq<int>
Seq.nth 3 nat10b;;
10
11
12
13
val it : int = 13
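The difference between the two ways of delaying can be made observable without console output. The sketch below records evaluations in a log instead of printing them. The names idWithLog and from2 are assumptions made for illustration, where from2 delays the whole body rather than only the recursive call:

```fsharp
let cons x sq = Seq.append (Seq.singleton x) sq

// Hypothetical stand-in for idWithPrint: records the number instead of printing it.
let log = ResizeArray<int>()
let idWithLog i =
    log.Add i
    i

// Variant 1: only the recursive call is delayed; the head is evaluated at once.
let rec from1 i = cons (idWithLog i) (Seq.delay (fun () -> from1 (i + 1)))

// Variant 2 (assumed form): the whole body is delayed.
let rec from2 i = Seq.delay (fun () -> cons (idWithLog i) (from2 (i + 1)))

let s1 = from1 10
let loggedByFrom1 = List.ofSeq log   // [10]: the head was evaluated eagerly

log.Clear()
let s2 = from2 10
let loggedByFrom2 = List.ofSeq log   // []: nothing is evaluated before demand

let third1 = Seq.item 3 s1
let third2 = Seq.item 3 s2
```

Both variants denote the same sequence; they differ only in when the side effect for the first element happens.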
The function that removes all multiples of a number from a sequence is called sift, and
it is declared using the filter function for sequences:
sift 2 it;;
val it : seq<int> = seq [3; 5; 7; 9; ...]
sift 3 it;;
val it : seq<int> = seq [5; 7; 11; 13; ...]
The head and tail of a sequence are extracted in the above declaration by the functions
Seq.nth 0 and Seq.skip 1, respectively. See also Table 11.2.
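The declarations discussed here can be sketched as follows. This is a reconstruction from the surrounding description (sift via Seq.filter, head and tail via index 0 and Seq.skip 1, and an explicit Seq.delay around the recursion), not necessarily the exact original text. Seq.item is the current name of Seq.nth:

```fsharp
// Remove all multiples of a from the sequence sq.
let sift a sq = Seq.filter (fun n -> n % a <> 0) sq

// Sieve of Eratosthenes; Seq.delay postpones the recursion until it is demanded.
let rec sieve sq =
    Seq.delay (fun () ->
        let p = Seq.item 0 sq        // head of the sequence
        let rest = Seq.skip 1 sq     // tail of the sequence
        Seq.append (Seq.singleton p) (sieve (sift p rest)))

// The primes are obtained by sieving the numbers 2, 3, 4, ...
let primes = sieve (Seq.initInfinite (fun i -> i + 2))

// Numbering starts at 0.
let nthPrime n = Seq.item n primes
```

This formulation favours brevity over speed: each access re-traverses the nested filters, which is why larger indices become slow without caching.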
The function that finds the nth prime number is declared as follows:
The 5th and 100th prime numbers are fast to compute (remember that numbering starts
at 0); but it requires some seconds to compute the 700th prime number:
nthPrime 5;;
val it : int = 13
nthPrime 100;;
val it : int = 547
nthPrime 700;;
val it : int = 5281
A re-computation of the 700th prime number takes the same time as before, and a
computation of the 705th prime number would take approximately the same time. The use of a
cached prime number sequence improves on that:
let primesCached = Seq.cache primes;;
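The effect of Seq.cache can be observed by counting how often an element function is called. The sketch below uses a sequence of squares as a hypothetical stand-in for the prime number sequence, and a reference cell as counter:

```fsharp
// Counter for the number of element computations (illustration only).
let evaluations = ref 0

let squares =
    Seq.initInfinite id
    |> Seq.map (fun i ->
        evaluations.Value <- evaluations.Value + 1
        i * i)

let squaresCached = Seq.cache squares

let first = Seq.item 100 squaresCached
let afterFirst = evaluations.Value

// Second access is served from the cache: no further computations.
let second = Seq.item 100 squaresCached
let afterSecond = evaluations.Value

// Access to the uncached sequence recomputes the elements.
let third = Seq.item 100 squares
let afterUncached = evaluations.Value
```

The cache grows with the demanded prefix, so a later access to a slightly larger index only computes the additional elements.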
This function should be iterated over a sequence and to this end two auxiliary functions
are defined:
let rec iter f x = function
    | 0 -> x
    | n -> iter f (f x) (n-1);;
val iter : ('a -> 'a) -> 'a -> int -> 'a
The Newton-Raphson sequence for 2 that starts at 1.0 is:
The following function traverses a sequence sq using an enumerator (see
Section 8.12) and returns the first element where the distance to the next is within a given
tolerance eps:
The sequence argument sq is assumed to be infinite and the enumerator f will always
return some value, which is exploited by the function nextVal. The square root of a can
be computed according to Newton-Raphson's method within tolerance 10^(-6).
sRoot 2.0;;
val it : float = 1.414213562
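The overall construction can be sketched in a self-contained form. The declarations of iterate and next below are assumptions based on the description above, and the tolerance search uses Seq.pairwise and Seq.find instead of an explicit enumerator, so it is a substitute technique rather than the original declaration:

```fsharp
let rec iter f x = function
    | 0 -> x
    | n -> iter f (f x) (n - 1)

// Assumed form: the sequence x, f x, f (f x), ...
let iterate f x = Seq.initInfinite (fun i -> iter f x i)

// One Newton-Raphson step for the square root of a.
let next a x = (a / x + x) / 2.0

// First element whose distance to its successor is within eps.
let inTolerance (eps: float) sq =
    sq
    |> Seq.pairwise
    |> Seq.find (fun (x, y) -> abs (x - y) <= eps)
    |> fst

let sRoot a = inTolerance 1E-6 (iterate (next a) 1.0)
```

Seq.pairwise is lazy, so the search stops as soon as two consecutive approximations are close enough.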
This example illustrates the expressiveness of infinite sequences, but the above programs
can definitely be improved, in particular when considering efficiency issues. See, for exam-
ple, Exercise 11.5 and Exercise 11.6. But notice that none of these sequence-based solutions
is as efficient as a solution that just remembers the last computed approximation and termi-
nates when the next approximation is within the given tolerance.
Seq.nth 5 s10;;
val it : int = 15
11.6 Sequence expressions 263
Construct                        Legend
yield exp                        generate element
yield! exp                       generate sequence
seqexp1                          combination of two sequences
seqexp2                            by appending them
let pat = exp                    local declaration
seqexp
for pat in exp do seqexp         iteration
if exp then seqexp               filter
if exp then seqexp else seqexp   conditional
Table 11.2 Constructs for sequence expressions
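The constructs of Table 11.2 can be illustrated by a few small declarations. The function names below are chosen for illustration only:

```fsharp
// yield and yield!: an infinite sequence i, i+1, i+2, ...
let rec from i =
    seq { yield i
          yield! from (i + 1) }

// for and if-then (filter): the multiples of k below n.
let multiplesBelow k n =
    seq { for i in 0 .. n - 1 do
              if i % k = 0 then yield i }

// let inside a sequence expression: a local declaration.
let pairsWithSquares n =
    seq { for i in 1 .. n do
              let sq = i * i
              yield (i, sq) }
```

No explicit use of Seq.delay is needed in from, since a sequence expression is evaluated on demand.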
The use of sequence expressions gives, in this case, a more verbose alternative to the
succinct formulation using Seq.filter in Section 11.4.
The use of sequence expressions in the declaration of sieve sq is attractive since the
explicit delay of the recursive call can be avoided and the combination of sequence expressions
is a brief alternative to using Seq.append:
The function
Seq.collect: ('a -> seq<'c>) -> seq<'a> -> seq<'c>
combines the map and concatenate functionality. The value of Seq.collect f sq is
obtained (lazily) by applying f to each element e_i of the sequence sq. The result is obtained
by concatenation of all the sequences f(e_i), for i = 0, 1, 2, ....
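A minimal illustration of Seq.collect, with an arbitrary function chosen for the example:

```fsharp
// Each element i is mapped to the two-element sequence [i; 10 * i].
let f i = seq [i; 10 * i]

let collected = Seq.collect f (seq [1; 2; 3]) |> Seq.toList

// The same result, expressed as a map followed by a concatenation.
let mappedThenConcatenated = seq [1; 2; 3] |> Seq.map f |> Seq.concat |> Seq.toList
```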
Hence the expression
Seq.collect allFiles (Directory.GetDirectories dir)
recursively extracts all files in sub-directories of dir and concatenates the results.
The sequence of all files in the directory C:\mrh\Forskning\Cambridge\, for
example, can be extracted as follows:
// set current directory
Directory.SetCurrentDirectory @"C:\mrh\Forskning\Cambridge\";;
The path matches a sequence of characters ending with a backslash \, that is, it matches
the regular expression \S*\\.
The file name matches a non-empty sequence of characters not containing backslash \,
that is, it matches the regular expression [^\\]+.
Suppose that reExts is a regular expression matching certain file extensions. Then the
following regular expression (where the string is obtained by joining several strings) matches
composite file names having one of these file extensions:
Note that the extension is the suffix of the composite file name that starts just after the last
period (.). The regular expression has three capturing groups (enclosed in normal brackets
( and )) so that the path, the name and the extension can be extracted from a matched
composite file name. The function captureSingle from the TextProcessing
library (cf. Table 10.4) is used to extract captured strings in the following declaration of
searchFiles files exts that gives those files in the sequence files that have an extension
in the list exts:
open System.Text.RegularExpressions;;
let funFiles =
Seq.cache (searchFiles (allFiles ".") ["fs";"fsi"]);;
val funFiles : seq<string * string * string>
In this case a cached sequence is chosen so that a search can exploit the already computed
part of the sequence from previous searches:
Seq.nth 6 funFiles;;
val it : string * string * string = (".\BOOK\","Curve","fsi")
Seq.nth 11 funFiles;;
val it : string * string * string
= (".\BOOK\", "Satisfiability", "fs")
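The three capturing groups can be tried out directly with the .NET Regex class. The sketch below is an assumption-laden illustration: the value reExts and the helper splitFileName are hypothetical, and the groups are read through the standard Groups property rather than through captureSingle from the TextProcessing library:

```fsharp
open System.Text.RegularExpressions

// Hypothetical regular expression for the extensions "fs" and "fsi".
let reExts = "(fs|fsi)"

// Path, name and extension as three capturing groups.
let re = Regex (@"\G(\S*\\)([^\\]+)\." + reExts + "$")

// Returns Some (path, name, extension) for a matching composite file name.
let splitFileName (s: string) =
    let m = re.Match s
    if m.Success
    then Some (m.Groups.[1].Value, m.Groups.[2].Value, m.Groups.[3].Value)
    else None

let curve = splitFileName @".\BOOK\Curve.fsi"
let notes = splitFileName @".\BOOK\notes.txt"
```

Because the name part is matched greedily, the extension is the part after the last period, as required.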
Range expressions
In a construction of a sequence constant, like
seq [1; 2; 3; 4];;
val it : seq<int> = [1; 2; 3; 4]
the sequence builder seq is actually a built-in function on sequences:
seq;;
val it : (seq<'a> -> seq<'a>) = <fun:clo@16-2>
and when applying seq to the list in the above example, we are just exploiting that lists are
specializations of sequences.
Hence the range expressions [b .. e] and [b .. s .. e] for lists described in Section 4.2
can, in a natural manner, be used to generate finite sequences:
let evenUpTo n = seq [0..2.. n];;
val evenUpTo : int -> seq<int>
11.8 Type providers and databases 267
Part:                            PartsList:
PartId  PartName  IsBasic        PartsListId  PartId  Quantity
0       Part0     1              2            0       5
1       Part1     1              2            1       4
2       Part2     0              3            1       3
3       Part3     0              3            2       4
Figure 11.1 A Product Register Database with two tables: Part and PartsList
The table Part stores information about four parts named: Part0, . . . ,Part3,
where Part0 and Part1 are basic parts (with empty parts lists) since their IsBasic
attribute is 1 (the SQL representation of true), while Part2 and Part3 are composite
parts since their IsBasic attribute is 0 (representing false). The PartId attribute of the
Part table is a unique identifier, that is, a key, for the description of a part.
The table PartsList contains the parts lists for all the composite parts. The attribute
pair (PartsListId, PartId) is a composite key. A row (pid, id, q) in this table,
therefore, describes that exactly q pieces of the part identified by id are required in the parts list of
the composite part identified by pid. For example, the parts list for Part3 comprises 3
pieces of Part1 and 4 pieces of Part2.
Starting on Page 274 it is shown how this database can be created and updated. But before
that we address the issue of making queries to this database from F#.
let db = schema.GetDataContext();;
val db:
schema.ServiceTypes.SimpleDataContextTypes.ProductRegister
The SQL type provider SqlDataConnection has a connection string as argument.
This connection string has three parts: a definition of the data source, in this case an SQL
server; the initial catalog, in this case the database name; and the integrated security, which
in this case is true, meaning that the .NET credentials of the current user will be used for
authentication.
#if INTERACTIVE
#r "FSharp.Data.TypeProviders.dll"
#r "System.Data.dll"
#r "System.Data.Linq.dll"
#endif
open System
open Microsoft.FSharp.Data.TypeProviders
open System.Data.Linq.SqlClient
open System.Linq
type schema =
    SqlDataConnection<"Data Source=IMM-NBMRH\SQLEXPRESS;
                       Initial Catalog=ProductRegister;
                       Integrated Security=True">;;
let db = schema.GetDataContext();;
The type schema contains all the generated types that represent the database and db is
an object containing the database tables. The two database tables can be accessed as follows:
The answers from the F# system do not reveal the F# values of these two tables.
They are in fact lazy sequences. For example:
partTable;;
val it : Data.Linq.Table<schema.ServiceTypes.Part> =
seq [Part {IsBasic = true; PartId = 0; PartName = "Part0";};
Part {IsBasic = true; PartId = 1; PartName = "Part1";};
Part {IsBasic = false; PartId = 2;PartName = "Part2";};
Part {IsBasic = false; PartId = 3;PartName = "Part3";}]
where all the elements are shown in this case just because the interactive environment always
prints a short prefix of a lazy sequence. The elements of this sequence are objects belonging
to a class Part that has the attributes of the table as public fields:
r.PartId;;
val it : int = 2
r.PartName;;
val it : string = "Part2"
r.IsBasic;;
val it : bool = false
Note that the SQL types bit and varchar(50) are translated to the F# types bool and
string, respectively, by the type provider.
The list of F# elements of the PartsList table is obtained as follows:
Seq.toList partsListTable;;
val it : schema.ServiceTypes.PartsList list =
[PartsList {PartId = 0; PartsListId = 2; Quantity = 5;};
PartsList {PartId = 1; PartsListId = 2; Quantity = 4;};
PartsList {PartId = 1; PartsListId = 3; Quantity = 3;};
PartsList {PartId = 2; PartsListId = 3; Quantity = 4;}]
Database queries can be expressed using the functions from the sequence library since the
tables in the database can be accessed like sequences when using the above type provider.
The names of all composite parts can, for example, be extracted as follows:
Seq.fold
    (fun ns (r:schema.ServiceTypes.Part)
        -> if r.IsBasic then ns else r.PartName::ns)
    []
    partTable;;
val it : string list = ["Part3"; "Part2"]
Query expressions
We shall now introduce query expressions as means for extracting information from the
database. A query expression is a computation expression (just like sequence expressions)
and it occurs in expressions of the form:
query { queryexp }
The construct select v adds the element v to the answer to the query just like yield v
adds an element to a sequence:
The value of this query expression has type IQueryable<int * string>. The type
IQueryable<T> is a specialization of IEnumerable<T> and, therefore, values of type
IQueryable<T> can be treated as sequences.
There is a rich collection of query-expression constructs that translate to SQL queries.
We will now introduce a small part of these constructs by illustrating how the following
operations of relational algebra can be expressed: projection, selection and join.
Projection
A projection operation extracts certain columns of a table and such a projection can be
expressed using an iteration.
For example, a query for the projection of the Part table with respect to PartName and
IsBasic is declared as follows:
q1;;
val it : IQueryable<string * bool> =
seq [("Part0", true); ("Part1", true);
("Part2", false); ("Part3", false)]
Selection
A selection operation extracts certain rows of a table and such a selection can be expressed
using an iteration together with a where-clause and a selection.
For example, the query selecting the composite parts from the Part table is declared by:
let q2 =
    query {for part in db.Part do
           where (not part.IsBasic)
           select (part.PartId, part.PartName, part.IsBasic)};;
q2;;
val it : IQueryable<int * string * bool> =
seq [(2, "Part2", false); (3, "Part3", false)]
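Query expressions also work on ordinary in-memory sequences, which makes it possible to experiment with projection and selection without a database. In the following sketch the record type Part and the list partTable are hypothetical stand-ins for the types and tables generated by the type provider:

```fsharp
// Hypothetical in-memory stand-in for the Part table of Figure 11.1.
type Part = { PartId: int; PartName: string; IsBasic: bool }

let partTable : Part list =
    [ { PartId = 0; PartName = "Part0"; IsBasic = true }
      { PartId = 1; PartName = "Part1"; IsBasic = true }
      { PartId = 2; PartName = "Part2"; IsBasic = false }
      { PartId = 3; PartName = "Part3"; IsBasic = false } ]

// Projection on PartName and IsBasic.
let projection =
    query { for part in partTable do
            select (part.PartName, part.IsBasic) }
    |> Seq.toList

// Selection of the composite parts.
let selection =
    query { for part in partTable do
            where (not part.IsBasic)
            select (part.PartId, part.PartName, part.IsBasic) }
    |> Seq.toList
```

With an in-memory source the query is evaluated by the sequence functions of .NET rather than being translated to SQL.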
Join
A join operation combines the rows of two tables A and B. There are many different kinds
of such combinations that are supported by SQL and query expressions. We shall here just
consider what is called an equi-join, where a row a ∈ A is combined with a row b ∈ B only
if a.L_A = b.L_B, where L_A is a given attribute of A and L_B is a given attribute of B.
By an equi-join of PartsList and Part tables with PartsListId of PartsList
equal to PartId of Part we can extract tuples from PartsList where identifiers for
parts list are replaced by their names:
Hence Part2 is a composite part consisting of 5 pieces of the part with PartId equal to
0 and 4 pieces of the part with PartId equal to 1. By the use of nested joins we can make
a query where these identifiers also are replaced by their names. The result of q3 cannot
be used in a join since the elements have a tuple type and not a record type. We therefore
introduce a record type:
type partListElement =
{PartName:string; PartId:int; Quantity:int}
In the following nested join, the local query qa is the variant of q3 that gives elements of
type partListElement:
let q4 =
    query {let qa = query {for pl in db.PartsList do
                           join part in db.Part on
                                (pl.PartsListId = part.PartId)
                           select {PartName = part.PartName;
                                   PartId = pl.PartId;
                                   Quantity = pl.Quantity} }
           for pl in qa do
           join part in db.Part on
                (pl.PartId = part.PartId)
           select(pl.PartName, part.PartName, pl.Quantity) };;
q4;;
val it : IQueryable<string * string * int> =
seq
[("Part2", "Part0", 5); ("Part2", "Part1", 4);
("Part3", "Part1", 3); ("Part3", "Part2", 4)]
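The same answer can be reproduced on in-memory data. The sketch below uses hypothetical record types and lists in place of the database tables, and it expresses the query with two consecutive join clauses instead of a nested query, which is a different formulation from the one above:

```fsharp
// Hypothetical in-memory stand-ins for the tables of Figure 11.1.
type Part = { PartId: int; PartName: string; IsBasic: bool }
type PartsListRow = { PartsListId: int; ComponentId: int; Quantity: int }

let partTable : Part list =
    [ { PartId = 0; PartName = "Part0"; IsBasic = true }
      { PartId = 1; PartName = "Part1"; IsBasic = true }
      { PartId = 2; PartName = "Part2"; IsBasic = false }
      { PartId = 3; PartName = "Part3"; IsBasic = false } ]

// ComponentId plays the role of the PartId attribute of PartsList.
let partsListTable : PartsListRow list =
    [ { PartsListId = 2; ComponentId = 0; Quantity = 5 }
      { PartsListId = 2; ComponentId = 1; Quantity = 4 }
      { PartsListId = 3; ComponentId = 1; Quantity = 3 }
      { PartsListId = 3; ComponentId = 2; Quantity = 4 } ]

// Both identifiers of a parts-list row are replaced by part names.
let namedPartsLists =
    query { for pl in partsListTable do
            join whole in partTable on (pl.PartsListId = whole.PartId)
            join piece in partTable on (pl.ComponentId = piece.PartId)
            select (whole.PartName, piece.PartName, pl.Quantity) }
    |> Seq.toList
```

The field name ComponentId is used only to keep the two record types free of shared labels in this sketch.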
Aggregate operations
In SQL there are so-called aggregate operations that depend on a whole table or all the
values in a column of a table, such as counting the number of elements in a table or finding
the average of the elements in a column. There are also query-expression constructs for
these functions, for example, count that counts the number of elements selected so far,
exactlyOne that returns the single element selected, and raises an exception if no element
or more than one element has been selected, and contains v that checks whether v is
among the so far selected elements.
The following function counts the number of rows in Part. Since we shall use
consecutive numbers 0, 1, ..., n - 1 as identifiers for existing parts, the number of rows n is the
next identifier that can be used as a key. This function is therefore named nextId:
let nextId() = query {for part in db.Part do
                      count };;
val nextId : unit -> int
The function getDesc extracts the description of a given identifier:
let getDesc id =
    query {for part in db.Part do
           where (part.PartId=id)
           select (part.PartName,part.IsBasic)
           exactlyOne };;
val getDesc : int -> string * bool
where the description consists of the values of the PartName and IsBasic
attributes. For example:
nextId();;
val it : int = 4
getDesc 3;;
val it : string * bool = ("Part3", false)
getDesc 4;;
System.InvalidOperationException: Sequence contains no elements
The predicate containsPartId checks whether a given identifier is in the Part table:
let containsPartId id = query {for part in db.Part do
                               select part.PartId
                               contains id };;
val containsPartId : int -> bool
containsPartId 3;;
val it : bool = true
containsPartId 4;;
val it : bool = false
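The aggregate constructs count, exactlyOne and contains can also be tried on in-memory data. In this sketch the record type Part and the list partTable are hypothetical stand-ins for the database table:

```fsharp
// Hypothetical in-memory stand-in for the Part table.
type Part = { PartId: int; PartName: string; IsBasic: bool }

let partTable : Part list =
    [ { PartId = 0; PartName = "Part0"; IsBasic = true }
      { PartId = 1; PartName = "Part1"; IsBasic = true }
      { PartId = 2; PartName = "Part2"; IsBasic = false }
      { PartId = 3; PartName = "Part3"; IsBasic = false } ]

// Number of rows, which is also the next free identifier.
let nextId () =
    query { for part in partTable do
            count }

// Description (name, basic flag) of the part with the given identifier.
let getDesc id =
    query { for part in partTable do
            where (part.PartId = id)
            select (part.PartName, part.IsBasic)
            exactlyOne }

// Does the table contain a part with the given identifier?
let containsPartId id =
    query { for part in partTable do
            select part.PartId
            contains id }
```

A call such as getDesc 4 fails with an exception here as well, since exactlyOne requires that exactly one row is selected.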
getPartsList 3;;
val it : IQueryable<int * int> = seq [(1, 3); (2, 4)]
We shall need functions for adding a pair (id, k) to a given parts list, for merging two
parts lists and for multiplying all quantities in a parts list by a constant. These functions are
ordinary auxiliary list functions:
let rec add pl (id,q) =
    match pl with
    | [] -> [(id,q)]
    | (id1,q1)::pl1 when id=id1 -> (id,q+q1)::pl1
    | idq::pl1 -> idq :: add pl1 (id,q);;
val add : ('a * int) list -> 'a * int -> ('a * int) list
    when 'a : equality
let mergePartsList pl1 pl2 = List.fold add pl1 pl2;;
val mergePartsList :
    ('a * int) list -> ('a * int) list -> ('a * int) list
    when 'a : equality
The following function partBreakDown that computes a parts list containing all basic
parts needed for producing a given part is declared in mutual recursion with the function
partsListBreakDown that computes a parts list containing all basic parts needed for
producing a given parts list. These functions access the database to extract the description
and the parts list of a given part using getDesc and getPartsList.
partBreakDown 3;;
val it : (int * int) list = [(1, 19); (0, 20)]
partBreakDown 1;;
val it : (int * int) list = [(1, 1)]
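The mutual recursion described above can be sketched without a database. In the following reconstruction the maps descriptions and partsLists, and the function multPartsList, are hypothetical; getDesc and getPartsList are in-memory stand-ins for the database queries, so this is an illustration of the idea rather than the original declaration:

```fsharp
// Hypothetical in-memory stand-ins for the two database tables.
let descriptions =
    Map.ofList [ (0, ("Part0", true)); (1, ("Part1", true))
                 (2, ("Part2", false)); (3, ("Part3", false)) ]
let partsLists =
    Map.ofList [ (2, [(0, 5); (1, 4)]); (3, [(1, 3); (2, 4)]) ]

let getDesc id = Map.find id descriptions
let getPartsList id = Map.find id partsLists

let rec add pl (id, q) =
    match pl with
    | [] -> [(id, q)]
    | (id1, q1) :: pl1 when id = id1 -> (id, q + q1) :: pl1
    | idq :: pl1 -> idq :: add pl1 (id, q)

let mergePartsList pl1 pl2 = List.fold add pl1 pl2

// Multiply all quantities of a parts list by k.
let multPartsList k pl = List.map (fun (id, q) -> (id, k * q)) pl

// Basic parts needed for one piece of a part, and for a whole parts list.
let rec partBreakDown id =
    match getDesc id with
    | (_, true) -> [(id, 1)]
    | _ -> partsListBreakDown (getPartsList id)
and partsListBreakDown pl =
    List.fold
        (fun acc (id, q) -> mergePartsList acc (multPartsList q (partBreakDown id)))
        []
        pl
```

For Part3 the break-down is 3 pieces of Part1 plus 4 times the break-down of Part2 (5 pieces of Part0 and 4 pieces of Part1), that is, 19 pieces of Part1 and 20 pieces of Part0.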
Creating a database
Executing the F# program in Figure 11.3 will set up the ProductRegister database with
empty Part and PartsList tables.
Updating a database
The type schema contains service types and constructors for elements of the tables in the
database. For example
generates a new part object that can belong to the Part table.
Table objects like db.Part and db.PartsList have members InsertOnSubmit
and InsertAllOnSubmit that you can give a single row and a collection of rows, respec-
tively, to be inserted in the tables. These insertions are effectuated only when the function
SubmitChanges from the LINQ DataContext type has been applied.
Consider for example the following function that inserts a basic part into the Part table
given its part name:
open System.Configuration
open System.Data
open System.Data.SqlClient
conn.Open();;
let addBasic s =
    let id = nextId()
    let part = new schema.ServiceTypes.Part(PartId = id,
                                            PartName = s,
                                            IsBasic = true)
    db.Part.InsertOnSubmit(part)
    db.DataContext.SubmitChanges()
    Some id;;
val addBasic : string -> int option
The function generates a key for the part and this key is returned by the function.
The insertion of a composite part into the database is based on its name s and its parts
list: [(id_1, k_1), ..., (id_n, k_n)]. Such an insertion is only meaningful when the identifiers id_i,
for 1 ≤ i ≤ n, are already defined in the Part table and when all the quantities k_i,
for 1 ≤ i ≤ n, are positive integers. This well-formedness constraint is checked by the
following function:
let isWellFormed pl =
    List.forall (fun (id,k) -> containsPartId id && k>0) pl;;
val isWellFormed : (int * int) list -> bool
If this well-formedness constraint is satisfied, then the following function inserts a new com-
posite part into the Part table and its parts list into the PartsList table:
let addComposite s pl =
    if isWellFormed pl
    then
        let id = nextId()
        let part =
            new schema.ServiceTypes.Part(PartId=id,
                                         PartName=s,
                                         IsBasic = false)
        let partslist =
            List.map
                (fun (pid,k) ->
                    new schema.ServiceTypes.PartsList(PartsListId=id,
                                                      PartId=pid,
                                                      Quantity=k))
                pl
        db.Part.InsertOnSubmit(part)
        db.PartsList.InsertAllOnSubmit(partslist)
        db.DataContext.SubmitChanges()
        Some id
    else None;;
val addComposite : string -> (int * int) list -> int option
The tables in Figure 11.1 are generated from an initial ProductRegister database with
two empty tables by evaluation of the following declarations:
let id0 = Option.get (addBasic "Part0");;
val id0 : int = 0
let id1 = Option.get (addBasic "Part1");;
val id1 : int = 1
let id2 =
    Option.get (addComposite "Part2" [(id0,5);(id1,4)]);;
val id2 : int = 2
let id3 =
    Option.get (addComposite "Part3" [(id1,3);(id2,4)]);;
val id3 : int = 3
Summary
This chapter has introduced the notion of a sequence, which is an ordered, possibly infinite,
collection of elements where the computation of elements is on demand only. Sequences are
convenient to use in applications where you are dealing with elements that are processed
one after the other. Functions from the sequence part of the collection library of F# have
been introduced together with cached sequences that prevent a recomputation of already
computed sequence elements. Furthermore, sequences can be defined in F# using sequence
expressions defining a step-by-step process for generating the elements.
The type seq<'a> is a synonym for the .NET type IEnumerable<'a> and any .NET
framework type that implements this interface can be used as a sequence. This has been
studied in connection with the Language-Integrated Query or LINQ component of the .NET
framework. LINQ gives query support for different kinds of data sources like SQL databases
and XML repositories. We have used LINQ in connection with a database for a simple
product-register application, where an F# type provider made it possible to work with values
from the external data source (an SQL database in this case) in a type-safe manner. The
concept of query expressions was introduced since it gives powerful support for expressing
queries on SQL databases in F#.
Exercises
11.1 Make a declaration for the sequence of odd numbers.
11.2 Make a declaration for the sequence of numbers 1, 1, 2, 6, . . . , n!, . . ..
11.3 Make a declaration for the sequence seq [1; 1; 2; 6; ...; n!; ...], where the (i+1)st element is
generated from the ith element by multiplication with i + 1.
11.4 Declare a function that, for given i and n, selects the sublist [a_i; a_(i+1); ...; a_(i+n-1)] of a
sequence seq [a_0; a_1; ...].
11.5 The declaration of the function iterate f on Page 260 has the drawback that f^n x is
computed when the nth element is demanded. Give an alternative declaration of this function using
the property that the (n+1)st element of the sequence can be computed from the nth element
by an application of f.
11.6 Have a look at the unfold function from the Seq library. Make a declaration of the sRoot
function from Section 11.5 using Seq.unfold. That declaration should be based on the idea
that the sequence generation is stopped when the desired tolerance is reached. Measure the
possible performance gains.
11.7 The exponential function can be approximated using the Taylor series:
e^x = \frac{1}{0!} + \frac{x^1}{1!} + \cdots + \frac{x^k}{k!} + \cdots   (11.2)
1. Declare a function that for a given x can generate the sequence of summands in (11.2). Hint:
Notice that the next summand can be generated from the previous one.
2. Declare a function that accumulates the elements of a sequence of floats. I.e. given a sequence
seq [x_0; x_1; x_2; ...] it generates the sequence seq [x_0; x_0 + x_1; x_0 + x_1 + x_2; ...].
3. Declare a function to generate the sequence of approximations for the function e^x on the
basis of (11.2).
4. Declare a function to approximate e^x within a given tolerance.
11.8 The Madhava-Leibniz series (also called Gregory-Leibniz series) for π is:
\pi = 4 \sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1}
Use this series to approximate π. (Note that there are other series for π, which converge much
faster than the above one.)
11.9 Declare a sequence denoting the following enumeration of the integers:
0, -1, 1, -2, 2, -3, 3, ...
11.10 Use the functions in the Seq library to declare a function cartesian sqx sqy that gives a
sequence containing all pairs (x, y) where x is a member of sqx and y is a member of sqy .
Make an alternative declaration using sequence expressions.
11.11 Solve Exercise 11.3 using sequence expressions.
11.12 Solve Exercise 11.7 using sequence expressions.
11.13 Solve Exercise 11.8 using sequence expressions.
11.14 Solve Exercise 11.9 using sequence expressions.
11.15 Give a database-based solution to the cash-register example introduced in Section 4.6.
11.16 Give a database-based solution to Exercise 4.23.
12
Computation expressions
280 Computation expressions
The meaning of the actual for construct can hence be expressed using Seq.collect:
seq { for i in seq [1 .. 3] do ce(i) }
= Seq.collect f (seq [1 .. 3])
where:
f = fun i -> seq { ce(i) }
where For defines the meaning of the for construct and Yield defines the meaning of the
yield construct, in the sense of the translations shown in Table 12.1.
It was shown above that the for construct can be expressed using Seq.collect.
Furthermore, the function Yield, with the type 'a -> mySeq<'a>, lifts an element to a
sequence and, therefore, Yield a returns the singleton sequence just containing a. Hence,
we arrive at the definitions:
For(sq, f ) = Seq.collect f sq (12.1)
Yield(a) = Seq.singleton a (12.2)
The machinery is illustrated in Figure 12.1 on the pairRecipe example, where an
ellipse represents a sequence and a dashed box represents a sequence obtained by concatenating
the contained sequences. The figure illustrates the meaning of the for construct, where f
is applied to each element i of sq. Each application f(i) contributes a part of the
resulting sequence, and the result is obtained by concatenation of the sequences:
f(1), f(2), f(3)
12.3 The basic functions: For and Yield 283
[Figure 12.1: diagram of sq and f illustrating For(sq, f) = Seq.collect f sq]
type MySeqClass() =
    member bld.Yield a: mySeq<'a> = Seq.singleton a
    member bld.For(sq:mySeq<'a>, f:'a -> mySeq<'b>):mySeq<'b>
        = Seq.collect f sq;;
We can now make our own computation expressions (limited to the for and yield con-
structs), for example, to declare a function that makes the Cartesian product of two se-
quences sqx and sqy by constructing a sequence containing all pairs (x, y) where x is a
member of sqx and y is a member of sqy :
A declaration based on recursive functions or on the functions in the Seq library would not
have a comparable simplicity. Try, for example, Exercise 11.10.
Using the translation in Table 12.1, this declaration is, behind the scenes, translated to:
We shall not go into further details here concerning how mySeq can be extended to
capture more facilities of sequence expressions. Sequence expressions are handled in F# by
a direct translation to composition of functions from the Seq-library and not by the use of a
builder class. Further information can be found in the on-line documentation of F#.
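The builder class with just For and Yield can be exercised with a complete sketch. The type abbreviation mySeq<'a>, the builder object mySeq and the explicit translation cartesianTranslated are written out here as assumptions that follow the description in this section:

```fsharp
// Assumed abbreviation: mySeq values are ordinary sequences.
type mySeq<'a> = seq<'a>

type MySeqClass() =
    member bld.Yield a : mySeq<'a> = Seq.singleton a
    member bld.For(sq: mySeq<'a>, f: 'a -> mySeq<'b>) : mySeq<'b> =
        Seq.collect f sq

// The builder object used in expressions of the form mySeq { ... }.
let mySeq = MySeqClass()

// Cartesian product as a computation expression.
let cartesian (sqx: seq<'a>) (sqy: seq<'b>) =
    mySeq { for x in sqx do
                for y in sqy do
                    yield (x, y) }

// The same function written with explicit calls of For and Yield.
let cartesianTranslated (sqx: seq<'a>) (sqy: seq<'b>) =
    mySeq.For(sqx, fun x -> mySeq.For(sqy, fun y -> mySeq.Yield (x, y)))
```

Both declarations denote the same lazy sequence of pairs; the second one only spells out what the first one abbreviates.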
Hence, the member function Delay provides a possibility to impose a delay from the very
start of a computation expression. We shall have a closer look at this in Section 12.8.
where
    I e env = None      in case of errors, and
    I e env = Some v    otherwise, where v is the result of evaluating e in env.
Observe that the tags None and Some are absent from this program and that the declarations
focus just on the computation of the value of an expression.
For example:
let e1 = Add(Div(Num 1, Num 0), Num 2);;
let e2 = Add(Add(Var "x", Var "y"), Num 2);;
let v1 = I e1 env;;
val v1 : maybe<int> = None
let v2 = I e2 env;;
val v2 : maybe<int> = Some 5
The examples show that the maybe computation expressions eagerly evaluate e1 and e2
to values of type maybe<int>. Hence the computation expressions for this simple version
of the class MaybeClass are not real recipes, they actually correspond to cooked dishes.
This deficiency is repaired in the next section.
let! x = m in ce
matches the following operational reading:
1. Start the computation m.
2. Bind x to the value a of this computation if it terminates properly with a in the container.
3. Use this binding in the recipe ce .
Notice that the let! construct translates to Bind(m, f ) where f is fun x -> T(ce).
These considerations lead to the first revised version of the builder class:
type MaybeClass() =
    member bld.Bind(m:maybe<'a>, f:'a->maybe<'b>):maybe<'b> =
        match start m with
        | None -> delay None
        | Some a -> f a
    member bld.Return a:maybe<'a> = delay(Some a)
    member bld.ReturnFrom v:maybe<'a> = delay v
    member bld.Zero():maybe<'a> = delay None;;
let v2 = I e2 env;;
val v2 : maybe<int>
The recipes v1 and v2 must be started to get values computed:
start v1;;
val it : int option = None
start v2;;
val it : int option = Some 5
Since an expression like maybe { let! x = m ... } translates to Bind(m,...) the
computation m will actually be started (check the declaration of Bind) and the values v1
and v2 contain in this sense partly cooked ingredients. This can be observed if side effects
are introduced, for example, in the clause where addition is treated:
...
| Add(e1,e2) ->
    maybe {let! v1 = eval e1
           let! v2 = eval e2
           return (printfn "v1: %i v2: %i" v1 v2 ; v1+v2)}
...
The result of executing the following declarations with this version of maybe:
let v2 = I e2 env;;
v1: 1 v2: 2
v1: 3 v2: 2
val v2 : maybe<int>
start v2;;
val it : int option = Some 5
shows that the computation is started and active until the outermost return or return!
statement is reached.
The translation of an expression comp{ce } will then use this delay function in the transla-
tion of a computation expression ce :
This gives a possibility to enforce a delay from the very start of a computation expression:
We add the following declaration of Delay to the MaybeClass declaration in the previous
section:
type MaybeClass() =
    ... As above from Bind to Zero ...
    member bld.Delay f:maybe<'a> = fun () -> start (f());;
The effect of this can be observed using the above side-effect example, where the printing
of the two lines with values to be added moves from the declaration of v2 to its activation:
let v2 = I e2 env;;
val v2 : maybe<a>
start v2;;
v1: 1 v2: 2
v1: 3 v2: 2
val it : int option = Some 5
Hence, with the introduction of the Delay member in the class declaration, the expressions
of the form maybe{ce } will denote genuine recipes. These recipes are expressed in an
operational manner by describing how to cook the dish; but the actual cooking is delayed
until the recipe is started.
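The delayed maybe builder can be tried out in a self-contained sketch. The declarations of maybe<'a>, delay and start below are assumptions that are consistent with how these names are used in the builder class, and safeDiv and sumOfQuotients are hypothetical examples that record a trace instead of printing:

```fsharp
// Assumed representation: a recipe is a function that gives an option when started.
type maybe<'a> = unit -> 'a option
let delay (v: 'a option) : maybe<'a> = fun () -> v
let start (m: maybe<'a>) = m ()

type MaybeClass() =
    member bld.Bind(m: maybe<'a>, f: 'a -> maybe<'b>) : maybe<'b> =
        match start m with
        | None -> delay None
        | Some a -> f a
    member bld.Return a : maybe<'a> = delay (Some a)
    member bld.ReturnFrom (v: 'a option) : maybe<'a> = delay v
    member bld.Zero() : maybe<'a> = delay None
    member bld.Delay (f: unit -> maybe<'a>) : maybe<'a> = fun () -> start (f ())

let maybe = MaybeClass()

// Side effects are recorded here so that their timing can be inspected.
let trace = ResizeArray<string>()

let safeDiv x y =
    maybe { if y = 0 then return! None else return x / y }

let sumOfQuotients x y z =
    maybe { let! a = safeDiv x z
            let! b = safeDiv y z
            trace.Add (sprintf "a: %i b: %i" a b)
            return a + b }

let recipe = sumOfQuotients 10 20 5
let tracedBeforeStart = trace.Count   // 0: declaring the recipe cooks nothing
let result = start recipe
let tracedAfterStart = trace.Count    // 1: the side effect happens when started
```

Because of the Delay member, the side effect is tied to start and not to the declaration of recipe.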
[Figures: diagrams of Yield a combined with f, and of the law For(sq, Yield) = sq]
12.9 The fundamental properties of For and Yield, Bind and Return
When declaring builder classes like MySeqClass and MaybeClass the only restriction
imposed on For and Yield and Bind and Return is that they should have the correct
types. But there are some laws that meaningful implementations should obey. These laws
originate from the theory of monads for functional programming, a theory that provides the
mathematical foundation for computation expressions.
The intuition behind these laws will be presented using the builder class for mySeq<'a>
as example, where values of this type are considered as containers for values of type 'a
and computation expressions are recipes for filling containers. But the laws are not biased
towards the mySeq computation expression builder.
Figure 12.4 The law: For(For(sq, f), g) = For(sq, fun a -> For(f(a), g))
This law is explained in terms of the for construct. Observe first that the computation
expression for a in sq do f (a) translates as follows:
Using this technique we arrive at the following alternative formulation of the law:
The law expresses two ways of filling a container. The left-hand-side way is by filling it
using g(b) where b is in the container obtained from for a in sq do f(a). The
right-hand-side way is by filling it using g(b_ij) where b_ij is in the container obtained from f(a_i),
where a_i is in the container sq. This is illustrated in Figure 12.4.
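The laws can be checked on concrete values. The sketch below uses the definitions (12.1) and (12.2) of For and Yield, with sample functions f and g chosen for illustration. The first check is the standard left-identity law for monads, stated here as an assumption about the omitted formula; a check on sample data illustrates a law but does not prove it:

```fsharp
// Definitions (12.1) and (12.2).
let For (sq, f) = Seq.collect f sq
let Yield a = Seq.singleton a

// Sample container and functions (illustration only).
let sq = seq [1; 2; 3]
let f a = seq [a; a + 1]
let g b = seq [b * 10]

// For(Yield a, f) = f(a)
let law1Lhs = For (Yield 5, f) |> Seq.toList
let law1Rhs = f 5 |> Seq.toList

// For(sq, Yield) = sq
let law2Lhs = For (sq, Yield) |> Seq.toList
let law2Rhs = sq |> Seq.toList

// For(For(sq, f), g) = For(sq, fun a -> For(f(a), g))
let law3Lhs = For (For (sq, f), g) |> Seq.toList
let law3Rhs = For (sq, fun a -> For (f a, g)) |> Seq.toList
```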
12.10 Monadic parsers 293
type MyStrangeSeqClass() =
    member bld.Return a: myStrangeSeq<'a> =
        Seq.singleton a
    member bld.Bind(sqs, f):myStrangeSeq<'b> =
        Seq.collect f sqs;;
Seq.toList pairRecipe;;
val it : (int * char) list =
  [(1, 'a'); (1, 'b'); (1, 'c'); (1, 'd');
   (2, 'a'); (2, 'b'); (2, 'c'); (2, 'd');
   (3, 'a'); (3, 'b'); (3, 'c'); (3, 'd')]
Therefore, the fundamental laws for Bind and Return are the same as those for For and
Yield:
It is left as an exercise to justify that these properties hold for the maybe example.
of type
-a1 + 2 * (a2 - 3)
Add (Neg (Var "a1"), Mul (Num 2, Sub (Var "a2", Num 3)))
of type
The captured value corresponds to the expression tree in Figure 12.5. This example has
a number of interesting features beside the recursion: two levels of operator precedence
(multiplication and addition operators), a precedence level with two operators (+ and -),
and use of the same operator symbol (-) with two different meanings as prefix and infix
operator.
[Figure 12.5: expression tree for Add (Neg (Var "a1"), Mul (Num 2, Sub (Var "a2", Num 3)))]
Grammars
In the first example we have two tokens name and number with regular expressions:
open System.Text.RegularExpressions;;
let nameReg = Regex @"\G\s*([a-zA-Z][a-zA-Z0-9]*)";;
let numberReg = Regex @"\G\s*([0-9]+)";;
where the tokens name and number denote the set of strings matching the corresponding
regular expressions nameReg and numberReg.
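The role of the anchor \G in these regular expressions can be seen by matching tokens from a given position in a string. The helper matchToken below is hypothetical and is not the token parser of this section; it only uses the standard Match method of the .NET Regex class with a start position:

```fsharp
open System.Text.RegularExpressions

let nameReg = Regex @"\G\s*([a-zA-Z][a-zA-Z0-9]*)"
let numberReg = Regex @"\G\s*([0-9]+)"

// Try to match one token at position pos; on success return the captured
// string and the position just after the match.
let matchToken (reg: Regex) (str: string) pos =
    let m = reg.Match(str, pos)
    if m.Success then Some (m.Groups.[1].Value, pos + m.Length) else None

let input = "Peter 5 John"
let t1 = matchToken nameReg input 0      // the name token at the start
let t2 = matchToken numberReg input 5    // the number token after "Peter"
let t3 = matchToken nameReg input 7      // the name token after "Peter 5"
let t4 = matchToken numberReg input 0    // no number token at position 0
```

The anchor \G forces the match to begin exactly at the given position, and \s* skips blank characters in front of the token.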
We shall capture structured data through rules described by context-free grammars. For
the person data, the rules should capture strings with the wanted syntax:
Each non-terminal symbol denotes a syntax class that is the set of all strings that can be
generated from that non-terminal symbol. A string is generated from a non-terminal symbol
by a derivation that repeatedly replaces a non-terminal symbol by a choice in its definition
or a token by a matching string. The derivation terminates when there are no more non-
terminals or tokens to be substituted.
For example, the string Peter 5 John belongs to the syntax class personData due to
the derivation:
personData
=> personList
=> person personList
=> name numberList personList
=> "Peter" numberList personList
=> "Peter" number numberList personList
=> "Peter 5" numberList personList
=> "Peter 5" personList
=> "Peter 5" person personList
=> "Peter 5" name numberList personList
=> "Peter 5 John" numberList personList
=> "Peter 5 John" personList
=> "Peter 5 John"
While a derivation generates a string from a non-terminal symbol, we are interested in the
parsing of a string, that is, the creation of a derivation on the basis of a given string. The
technique for monadic parsing to be presented will make this derivation on the basis of
recursive definitions following the structure of the grammar, and for this approach to be
well-defined the grammar must satisfy that there is no derivation of the form:
N => ... => N w
where N is a non-terminal symbol and w is a sequence of tokens, non-terminal symbols and
strings. In particular, there should be no left-recursive rule of the form:
N ::= N w
in the grammar. See Exercises 12.3 and 12.4.
We shall use grammars written in EBNF notation (extended Backus-Naur form) allowing,
for example, use of the repetition operator * that was introduced in connection with regular
expressions (cf. Section 10.2). Using the EBNF notation the above grammar gets a more
compact form:
personData ::= person*
person ::= name number*
In the second example with expressions we have tokens num, var, addOp, mulOp,
sign, leftPar, rightPar, and eos where eos denotes end of string. The correspond-
ing regular expressions are as follows:
The cautious reader will observe that the addOp and sign tokens have the same regular
expressions. We will comment on this in the subsection on token parsers.
We choose syntax classes expr, term and factor to express the composite forms in expressions
and we arrive at the following grammar for expressions, where the rule for factor has a
number of different choices:
expr ::= term (addOp term)*
term ::= factor (mulOp factor)*
factor ::= num | var | sign factor | leftPar expr rightPar
Building a grammar of this kind requires careful consideration of the following syntactic
issues:
1. Multiplication operator *.
2. Addition operators + and -.
while all operators associate to the left. The precedence rules of operators are captured in
the grammar:
The left recursion appears in the rule where expr can be expanded to expr addOp term and
similarly for term. This grammar has the (theoretical) advantage that the steps in the derivation
of an expression correspond to the steps in building the expression tree. Such grammars
cannot, however, be used in our method because the corresponding parser will enter an infinite
loop. See Exercise 12.4.
Parsers
Each character in the input string is identified by its position, which is a non-negative integer.
A parser with result type 'a scans the string searching for matches starting at a specified start
position pos. A match is a pair (a, pos') of a captured value a of type 'a and the end
position pos' where a succeeding parsing may take over. The parser hence consumes
the characters from position pos up to (but not including) the end position pos' in producing
the result a. The collection of all possible matches starting at a specified position can hence
be represented by a list:
[(a_0, pos_0); (a_1, pos_1); ...; (a_{n-1}, pos_{n-1})]
This corresponds to the following type of a parser with result type 'a:
type parser<'a> = string -> int -> ('a * int) list;;
Note that we allow several possible matches. This is not a complication; it is actually a
key feature in monadic parsers. An empty list indicates that the parser has failed to find
any matches. Suppose, for example, that we have a parser expr for algebraic expressions.
Parsing the input string "-a1 + 2 * (a2 - 3)" from position 0, using the expression
expr "-a1 + 2 * (a2 - 3)" 0, should then give a list with three matches:
[(Neg (Var "a1"), 3);
(Add (Neg (Var "a1"),Num 2), 7);
(Add (Neg (Var "a1"),Mul (Num 2,Sub (Var "a2",Num 3))),18)]
Position 3 is just after "-a1", position 7 is just after "2" while position 18 is at the end of
the string.
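The idea of a parser returning a list of matches can be illustrated without any grammar at all. The following is a minimal sketch: the parser type is the one given above, while runsOf is a hypothetical hand-written parser (not taken from the text) that captures the length of every non-empty run of a given character starting at the start position.

```fsharp
type parser<'a> = string -> int -> ('a * int) list

// Hypothetical parser: every non-empty run of the character c starting at
// pos is a match. The captured value is the length of the run and the end
// position is the position just after the run.
let runsOf (c: char) : parser<int> =
    fun str pos ->
        let rec go i acc =
            if i < str.Length && str.[i] = c
            then go (i + 1) ((i + 1 - pos, i + 1) :: acc)
            else List.rev acc
        go pos []

let matches = runsOf 'a' "aaab" 0   // [(1, 1); (2, 2); (3, 3)]
let noMatch = runsOf 'a' "baaa" 0   // []: the parser fails at position 0
```

A parser that follows runsOf in a sequence can continue from any of the three end positions, which is why several matches are kept rather than only the longest one.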
Token parsers
Tokens are parsed using token parsers. We consider two kinds of token parsers:
1. A token parser with captured data (normally to be converted).
2. A token parser without relevant captured data.
A token parser of the first kind is made using the token function. The regular expression
reg must contain a capturing group. The function conv converts the captured data:
open TextProcessing;;
Token parsers without captured data are made using the emptyToken function. The regular
expression need not contain any capturing group and there is no conversion function.
The parser captures the dummy value () of type unit and its function is solely to recognize
and consume the data matching the regular expression:
let emptyToken (reg: Regex) : parser<unit> =
  fun str pos ->
    let ma = reg.Match(str,pos)
    match ma.Success with
    | false -> []
    | _ -> let pos2 = pos + ma.Length
           [( (), pos2)];;
val emptyToken : (Regex -> parser<unit>) = <fun:clo...>
Note that the function captureSingle from the TextProcessing library (cf. Table 10.4
and Appendix B) is used in the above declarations.
The conversion function is the pre-defined identity function id for the name token parser
because the captured string should be used literally as is. The number token parser uses
the conversion function int to convert the captured string of digits to an integer. The token
parsers num, var, sign, addOp, mulOp, leftPar and rightPar in the second exam-
ple should give values that can be used directly in building the expression tree (like the tree
shown in Figure 12.5). The token parser addOp should hence capture a value that can be
used to join two sub-trees, for example:
fun x y -> Add(x,y)
of type:
Expr -> Expr -> Expr
when parsing the character +. The addOp token parser will hence be of type:
parser<Expr->Expr->Expr>
300 Computation expressions
Figure 12.6 Illustrating Bind(p, f) = fun str pos -> collect g_{f,str} (p str pos)
for the part of the input string that starts at position pos and ends at pos_i - 1. Application
of f to a_i yields a parser that must be applied to the input string str and the start position
pos_i for that parser. This resembles the definition of the for construct for sequences,
see Figure 12.1. The complete process is illustrated in Figure 12.6. Notice that the Bind
function takes care of all the data management concerning the positions.
These ideas lead to the following computation expression class and builder object:
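type ParserClass() =
  // Bind: run p, then run the parser f a from each end position of p
  member bld.Bind(p: parser<'a>, f: 'a->parser<'b>):parser<'b> =
    fun str pos ->
      List.collect (fun (a,apos) -> f a str apos) (p str pos)
  // Zero: the parser that always fails
  member bld.Zero() = (fun _ _ -> []): parser<'a>
  // Return: capture a without consuming any characters
  member bld.Return a = (fun str pos -> [(a,pos)]): parser<'a>
  member bld.ReturnFrom (p: parser<'a>) = p;;
let parser = ParserClass();;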
parser { return a }
captures the value a without consuming any characters, that is, it gives the value a at the
end position of the previously used parser.
When a string cannot be parsed the final result is the empty list, but no informative error
report is produced by the builder class. In Exercise 12.8 you are asked to make a new builder
class for parsers that takes care of simple error handling.
p1 : parser<'a1>
p2 : parser<'a2>
...
pn : parser<'an>
and a function:
F : 'a1 * 'a2 * ... * 'an -> 'b
for some types 'a1, 'a2, ..., 'an, 'b.
Any match of the sequenced parser starting from position pos is then obtained by getting
a sequence of contiguous matches (starting from position pos):
of the parsers p1, p2, ..., pn and applying the function F to the captured values to get a match
of the sequenced parser (starting from position pos):
parser { let! a1 = p1
         let! a2 = p2
         ...
         let! an = pn
         return F(a1, a2, ..., an) }
The return expression is inserted at a place where activation of the parsers p1, p2, ..., pn
has already consumed all relevant characters and where it only remains to return the value
F(a1, a2, ..., an) without consuming any further characters.
Sequencing of parsers is used when building parsers for fixed forms (containing no part
that is repeated an unspecified number of times). The simplest examples are parsers built
using the pairOf combinator:
We may, for instance, combine the name and number token parsers using pairOf:
In building a parser in the first example we will use the pairOf parser combinator to
combine parsers of a name and of a list of numbers.
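A combinator of this kind can be sketched as follows. This is a minimal, self-contained sketch and not necessarily the declaration used in the book: the builder is reduced to Bind and Return, and the token function below uses the Groups property of a Match directly as a stand-in for captureSingle from the TextProcessing library.

```fsharp
open System.Text.RegularExpressions

type parser<'a> = string -> int -> ('a * int) list

type ParserClass() =
    member bld.Bind(p: parser<'a>, f: 'a -> parser<'b>) : parser<'b> =
        fun str pos -> List.collect (fun (a, apos) -> f a str apos) (p str pos)
    member bld.Return(a: 'a) : parser<'a> =
        fun _ pos -> [(a, pos)]

let parser = ParserClass()

// Stand-in token function (assumption): group 1 is read via Groups.[1].
let token (reg: Regex) (conv: string -> 'a) : parser<'a> =
    fun str pos ->
        let ma = reg.Match(str, pos)
        if ma.Success then [(conv ma.Groups.[1].Value, pos + ma.Length)] else []

let name   = token (Regex @"\G\s*([a-zA-Z][a-zA-Z0-9]*)") id
let number = token (Regex @"\G\s*([0-9]+)") int

// Sequencing of two parsers with F(x1, x2) = (x1, x2).
let pairOf (p1: parser<'a>) (p2: parser<'b>) : parser<'a * 'b> =
    parser { let! x1 = p1
             let! x2 = p2
             return (x1, x2) }

let nameThenNumber = pairOf name number "Peter 5 John" 0  // [(("Peter", 5), 7)]
let numberThenName = pairOf number name "Peter 5 John" 0  // []
```

The second application gives the empty list because no number is found at position 0, so Bind never activates the second parser.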
One may define a tripleOf combinator in a similar way, but it is not of much use as
most grammars require specially built parsers for their sequencing constructs. In the expres-
sion example we have, for instance, the form of an expression enclosed in parentheses. A
simplified version of this is a form with a variable enclosed in parentheses like:
( abc)
A parser for this form can be obtained by sequencing the token parsers leftPar, var and
rightPar:
let varInPars = parser {let! _ = leftPar
                        let! x = var
                        let! _ = rightPar
                        return x };;
val varInPars : parser<Expr>
let! x = p
and put this element in front of the remaining list and return the result:
let! xs = listOf p
return x::xs
The parser listOf number will for instance parse lists of numbers:
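The behaviour of such a repetition combinator can be tried out with the following minimal, self-contained sketch. It is written under assumptions: orElse is a hypothetical choice helper that collects the matches of two parsers, token is a stand-in that reads the capturing group directly, and the declaration of listOf shown here need not coincide with the one in the book.

```fsharp
open System.Text.RegularExpressions

type parser<'a> = string -> int -> ('a * int) list

type ParserClass() =
    member bld.Bind(p: parser<'a>, f: 'a -> parser<'b>) : parser<'b> =
        fun str pos -> List.collect (fun (a, apos) -> f a str apos) (p str pos)
    member bld.Return(a: 'a) : parser<'a> =
        fun _ pos -> [(a, pos)]

let parser = ParserClass()

// Stand-in token function (assumption): group 1 is read via Groups.[1].
let token (reg: Regex) (conv: string -> 'a) : parser<'a> =
    fun str pos ->
        let ma = reg.Match(str, pos)
        if ma.Success then [(conv ma.Groups.[1].Value, pos + ma.Length)] else []

let number = token (Regex @"\G\s*([0-9]+)") int

// Hypothetical choice helper: all matches of p1 followed by all matches of p2.
let orElse (p1: parser<'a>) (p2: parser<'a>) : parser<'a> =
    fun str pos -> p1 str pos @ p2 str pos

// Either one element followed by a list of elements, or the empty list.
let rec listOf (p: parser<'a>) : parser<'a list> =
    orElse
        (parser { let! x = p
                  let! xs = listOf p
                  return x :: xs })
        (parser { return [] })

let allMatches = listOf number "1 2 3" 0
// [([1; 2; 3], 5); ([1; 2], 3); ([1], 1); ([], 0)]
```

Every prefix of the list of numbers is a match, with the longest one first. A parser following listOf number in a sequence then selects the match from which the rest of the input can be parsed.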
The infixL combinator is used when building a parser for a syntactic form where an
arbitrary number of operands are intermixed with infix operators that are on the same
precedence level and associate to the left.
As an example we consider strings like
a - b + 5
where numbers or variables are intermixed with addition operators (+ or -). A parser for
this form can be obtained using the infixL combinator defined below:
Making parsers
The parsers in the examples are based directly on the token parsers and the EBNF grammar:
• A parser is defined for each syntax class.
• Each operator in a syntactic rule in the grammar is translated to a suitable parser
combinator.
One should, however, pay attention to the word suitable: The parser combinators should
not only correspond to the syntax but they must also give the right conversion of the textual
form to captured value. You will frequently have to write your own parsers for fixed sequence
constructs (like leftPar expr rightPar in the second example) but it is a good idea
to try to design the syntax and the structure (that is, type) of the captured value such that
repetitive constructs can be handled using the above parser combinators.
Making the parser in the first example is almost straightforward:
let person = pairOf name (listOf number);;
val person : parser<string * int list>
Reporting errors
A simple error reporting can be obtained by letting the token parsers update a global variable
maxPos. The declarations of token and emptyToken are then preceded by
let mutable maxPos = 0
let updateMaxPos pos = if pos > maxPos then maxPos <- pos;;
and an extra line is added to the token function
let token (reg: Regex) (conv: string -> 'a) : parser<'a> =
  fun str pos ->
    let ma = reg.Match(str,pos)
    match ma.Success with
    | false -> []
    | _ ->
      let pos2 = pos + ma.Length
      updateMaxPos pos2
      [( conv(captureSingle ma 1), pos2)];;
and similarly for emptyToken.
Using this set-up we introduce the type ParseResult<'a>
type ParseResult<'a> = ParseOk of 'a | ParseError of int;;
in order to report an error when an input string cannot be parsed. In the case of such an
error, the global variable maxPos identifies the position where the error was detected and
this position is reported:
let parseString (p: parser<'a>) (s: string) =
  maxPos <- 0
  match p s 0 with
  | (a,_)::_ -> ParseOk a
  | _ -> ParseError maxPos;;
val parseString : parser<'a> -> string -> ParseResult<'a>
where the error in the last case was found at position 14 in the string.
In Exercise 12.8 you are asked to hide the error handling in the builder class for parsers.
Summary
This chapter has introduced the notion of computation expressions of F#. Computation ex-
pressions offer a possibility for using special syntactic constructs like let!, return, etc.
with a user-defined meaning through the declaration of so-called builder classes. This con-
cept is based on the theory of monads for functional programming introduced in connection
with the Haskell programming language.
The chapter uses sequence expressions (introduced in Chapter 11) and error handling
in connection with expression evaluation as examples to show how you may define your
own computation expressions. The last section shows how parsers can be constructed in a
convenient manner using computation expressions.
Asynchronous computations, which will be introduced in Section 13.4, are an important
example of computation expressions.
Exercises
12.1 Consider the following alternative to the declaration for bld.Delay on Page 290:
type MaybeClass() =
  ...
  member bld.Delay f:maybe<'a> = delay(start (f()));;
This new declaration would not give the desired effect. Explain why.
12.2 Consider the expression evaluation on Page 287. Make a new class declaration for computation
expressions that takes care of the evaluation in the environment env and simplify the declaration
of the function I accordingly. Hint: Consider computations as functions having the type:
Map<string,'a> -> option<'a>.
12.3 The following grammar for non-empty lists of numbers uses left recursion:
has a problem. Analyze the parser and explain what the problem is.
12.4 Explain the problem with the grammar for expressions on Page 297 that uses left recursion.
12.5 Consider the formulas of propositional logic introduced in Exercise 6.7. In the string represen-
tation of such formulas conjunction can be written either as & or as and, disjunction either
as | or as or and negation either as ! or as not. For example, the formula
(P (Q R))
This chapter is about programs where the dynamic allocation of computer resources like
processor time and memory becomes an issue. We consider two different kinds of programs
together with programming constructs to obtain the wanted management of computer re-
sources:
1. Asynchronous, reactive programs spending most of the wall-clock time awaiting a request
or a response from an external agent. A crucial problem for such a program is to minimize
the resource demand while the program is waiting.
2. Parallel programs exploiting the multi-core processor of the computer by performing
different parts of the computation concurrently on different cores.
The construction of asynchronous and parallel programs is based on the hardware features
in the computer and software features in system software as described in Sections 13.1 and
13.2. Section 13.3 addresses common challenges and pitfalls in parallel programming. Sec-
tion 13.4 describes the async computation expression and illustrates its use by some simple
examples. Section 13.5 describes how asynchronous computations can be used to make reac-
tive, asynchronous programs with a very low resource demand. Section 13.6 describes some
of the library functions for parallel programming and their use in achieving computations
executing concurrently on several cores.
312 Asynchronous and parallel computations
[Figure: two cores sharing a level 2 cache that is connected to main memory]
so program and data should fit into the cache unless there is an enormous amount of data
or some other program is using the cache.
The strategies used in managing the cache memories are outside the scope of this book,
but one should observe that all program activities on the computer are competing for cache.
Processes
A process is the operating system entity to manage an instance of execution of a free-
standing program. The process contains the program and the data of the program execution.
A process may comprise multiple threads of execution that execute instructions concurrently.
A double-click on an icon on the screen will usually start a process to run the program be-
longing to the icon.
A free-standing F# program comprises the Common Language Runtime System, CLR
(cf. Interoperating with C and COM in [13]). The Runtime System manages the memory
resources of the process using a stack for each thread and a common heap as described
in Chapter 9, and it manages the program execution using threads as described below. A
simplified drawing of the memory lay-out of such a process is shown in Figure 13.2.
The System.Diagnostics.Process library allows a program to start and manage
new processes. This topic is, however, outside the scope of the present book. The reader may
consult the Microsoft .NET documentation [9] for further information.
Threads
A thread is the .NET vehicle for program execution on one of the cores in the computer.
Each thread has its own memory stack and separate execution of instructions. In this chapter
we consider only threads managed via a thread pool where tasks containing programs can
be enlisted as work items. Such a task will be executed when a thread and a core become
available.
13.2 Processes, threads and tasks 313
[Figure 13.2: Simplified memory layout of a process]
There is a simple example on Page 314 showing creation and start of threads. The reader
may consult the description of the System.Threading library in [9] for further informa-
tion.
Tasks
A task is a piece of program that can be executed by a thread. When started, a task is enlisted
as a work item in the thread pool and it is then executed when a thread becomes available.
There are two essentially different ways of executing operations like I/O where the task has
to await the completion of the operation:
Synchronous operations: The operation is started and the executing thread awaits the com-
pletion of the operation. The thread continues executing the task when the operation has
completed. The standard library I/O functions are synchronous operations.
Asynchronous operations: The operation is started and the task becomes a wait item await-
ing the completion of the operation. The executing thread is returned to the thread pool to
be used by other tasks. The task is again enlisted as a work item when the operation has
completed. Asynchronous operations are found in the F# Async library and in extensions
to the standard I/O libraries.
The continuation of program execution after a synchronous operation is done using the stack
of the thread where the information is at hand. The mechanism is different in asynchronous
operations because there is no stack available while the task is waiting. The continuation of
the program execution is therefore saved in a special data structure when the asynchronous
operation is initiated, and this data structure is then later used to continue the task upon
completion of the operation.
A task waiting for completion of an asynchronous operation uses a small amount of memory
and no threads, and a process may hence run thousands of asynchronous tasks concurrently.
The situation is quite different for synchronous tasks where the number is limited by
the number of threads available for the program.
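The claim that many waiting asynchronous tasks can coexist is easy to try out. The following is a minimal sketch, assuming only the standard Async functions Async.Sleep, Async.Parallel and Async.RunSynchronously; the function waitAndReturn and the figures 1000 and 100 ms are chosen for illustration only.

```fsharp
// Each task waits asynchronously for 100 ms and then returns its index.
// No thread is occupied by a task while it is waiting.
let waitAndReturn (i: int) : Async<int> =
    async { do! Async.Sleep 100
            return i }

// Run 1000 such tasks concurrently and collect the results in an array.
let results =
    [ for i in 0 .. 999 -> waitAndReturn i ]
    |> Async.Parallel
    |> Async.RunSynchronously
```

The results are delivered in the order of the tasks in the list. The same experiment with a synchronous wait in each task would keep a thread occupied per waiting task.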
These concepts can be illustrated using the cook-book metaphor of Section 12.1 where
a program is described as a recipe in a cook-book. A process is then described as a
restaurant while threads are cooks and tasks are the customers' orders. A synchronous operation
corresponds to a cook focusing on the progress of a single order only, while an asynchronous
operation corresponds to a cook switching between several customers' orders using kitchen
stop clocks.
Asynchronous operations are called using async computation expressions (cf. Sec-
tion 13.4).
open System.Threading;;
let g() =
  let thread1 = Thread (f 1)
  let thread2 = Thread (f 2)
  thread1.Start()
  thread2.Start();;
val g : unit -> unit
g();;
Thread 2 gets mutex
Thread 2 releases mutex
Thread 1 gets mutex
Thread 1 releases mutex
Thread 2 gets mutex
Thread 2 releases mutex
Thread 1 gets mutex
Thread 1 releases mutex
Deadlocks
A deadlock may occur if two threads thread_1 and thread_2 are trying to acquire two mutex
objects mutex_1 and mutex_2 simultaneously as follows:
thread_1:            thread_2:
acquire mutex_1      acquire mutex_2
acquire mutex_2      acquire mutex_1      (deadlock)
The deadlock occurs because thread_1 is waiting for mutex_2, which has been acquired and not
yet released by thread_2, while thread_2 is waiting for mutex_1, which has been acquired and
not yet released by thread_1; both threads are hence stuck and will never proceed.
A problem of this kind can be solved by stipulating a fixed order of nested acquisition
of mutex objects to be used throughout the program, for instance: always acquire mutex_1
before acquiring mutex_2 in the example. A thread acquiring mutex_1 would then always
find a free mutex object mutex_2 and can proceed acquiring also mutex_2. The thread should
eventually release both mutex objects whereupon another thread may proceed acquiring
these objects (in the same order).
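The fixed acquisition order can be sketched as follows. This is a minimal illustration, assuming the Mutex and Thread classes of System.Threading; the names mutex1, mutex2, counter, work and threads as well as the loop count are hypothetical and not taken from the text.

```fsharp
open System.Threading

let mutex1 = new Mutex()
let mutex2 = new Mutex()
let counter = ref 0      // shared data protected by the two mutex objects

// Every thread acquires mutex1 before mutex2, so the situation where each
// thread holds one mutex object and waits for the other cannot arise.
let work () =
    for i = 1 to 1000 do
        mutex1.WaitOne() |> ignore
        mutex2.WaitOne() |> ignore
        counter.Value <- counter.Value + 1
        mutex2.ReleaseMutex()
        mutex1.ReleaseMutex()

let threads = List.init 2 (fun _ -> Thread(ThreadStart(work)))
threads |> List.iter (fun t -> t.Start())
threads |> List.iter (fun t -> t.Join())   // both threads terminate
```

If one of the threads instead acquired mutex2 first, the program could still run to completion in most executions, which is exactly why such errors are hard to find by testing.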
A program containing a potential deadlock will work most of the time, and the deadlock
situation will only occur sporadically in special situations where two threads are doing the
reservations at exactly the same time. Extensive testing will most likely not reveal the problem
and you may end up with an unreliable program that occasionally stops working in
stressed situations, exactly when a correct function is most needed.
async { asyncExpr }
Async.RunSynchronously:
    Async<'a> * ?int * ?CancellationToken -> 'a
  Activates async. comp., possibly with time-out in ms and possibly with specified
  cancellation token. Awaits completion.
Async.Start: Async<unit> * ?CancellationToken -> unit
  Activates async. comp., possibly with specified cancellation token. Does not await
  completion.
Async.StartChild: Async<'T> * ?int -> Async<Async<'T>>
  Activates async. comp. and gets async. comp. to await result.
Async.StartWithContinuations:
    Async<'T> * ('T -> unit) * (exn -> unit)
    * (OperationCanceledException -> unit) * ?CancellationToken
    -> unit
  Activates async. comp. with specified continuations and possibly specified cancellation
  token. Does not await completion of computation.
Async.FromContinuations: (('T -> unit) * (exn -> unit)
    * (OperationCanceledException -> unit) -> unit) -> Async<'T>
  Makes current asynchronous task a wait item. Argument function is called with triple
  of trigger functions as the argument and should save one or more of these closures in
  variables. The task continues as a work item when a trigger function is called.
Table 13.3 Selected functions to activate or deactivate asynchronous computations
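Two of the functions in the table can be tried out without network access using the following minimal sketch. The computation lengthComp is a hypothetical stand-in that just waits and returns the length of a string, and the ManualResetEvent is used here only to let the script wait for the continuation to be called.

```fsharp
open System.Threading

// Stand-in computation (assumption): wait 50 ms, then return a length.
let lengthComp (s: string) : Async<int> =
    async { do! Async.Sleep 50
            return s.Length }

// RunSynchronously activates the computation and awaits its completion.
let n = Async.RunSynchronously (lengthComp "abc")

// StartWithContinuations does not await completion. The result is passed
// to the first continuation when the computation terminates.
let answer = ref (-1)
let finished = new ManualResetEvent(false)

let onResult (len: int) =
    answer.Value <- len
    finished.Set() |> ignore
let onFailure (_: exn) = finished.Set() |> ignore
let onCancel (_: System.OperationCanceledException) = finished.Set() |> ignore

Async.StartWithContinuations(lengthComp "hello", onResult, onFailure, onCancel)
finished.WaitOne() |> ignore      // wait until one continuation has run
```

In a reactive program the continuation would typically post a message to an event queue rather than set a wait handle.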
This is just a value like any other but if started, it will run a task to download the HTML-
source of the DTU home page. This task will do the following:
1. Create a WebClient object. This declaration need not actually be part of the async
expression and could hence be placed before async {...}.
2. Initiate the download using AsyncDownloadString. This function makes the task a
wait item and returns this item in the form of an Async value comp. The asynchronous
computation comp will eventually run and terminate when the download has completed.
3. The termination of comp re-starts the rest of the computation with the identifier html
bound to the result of comp (which in this case is the result of the download).
4. The expression return html returns the value bound to html, that is, the result of the
download.
Please observe the following:
• The computation uses very few resources while waiting for the download; it uses, for
instance, no thread during this time period.
• The let! construct is required to make a binding to a value that is later returned at the
termination of an asynchronous computation.
• The computation expression does in most cases contain a construct like return or
return! to give a result and will otherwise give the dummy value () as the result.
Using return! yields a new asynchronous computation.
Computations downloading the HTML sources of the DTU and Microsoft home pages may
then be obtained as function values:
let downloadDTUcomp = downloadComp "http://www.dtu.dk";;
val downloadDTUcomp : Async<string>
and we may hence download the HTML-sources of the DTU and the Microsoft home pages
concurrently and compute their lengths:
13.4 Asynchronous computations 319
let paralDTUandMScomp =
  downlArrayComp
    [|"http://www.dtu.dk"; "http://www.microsoft.com"|];;
val paralDTUandMScomp : Async<string []>
The parallel download of HTML-sources can instead be made using the StartChild
function. This gives separated activation and waiting for completion of two child tasks:
let parallelChildrenDTUandMS =
  async {let! compl1 = Async.StartChild downloadDTUcomp
         let! compl2 = Async.StartChild downloadMScomp
         let! html1 = compl1
         let! html2 = compl2
         return (html1,html2)};;
val parallelChildrenDTUandMS : Async<string * string>
start the downloads in two child tasks in parallel. The identifiers compl1 and compl2 are
bound to two asynchronous computations that when started will await the completion of the
child tasks. The main task is hence not blocked by the StartChild operations. It becomes
blocked when compl1 is started and awaits the completion of the corresponding child task:
The next let! construct: let! html2 = compl2 will in the same way afterwards await
the completion of the second child task.
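The separation of activation and waiting can be observed without network access using the following minimal sketch. The computation slow is a hypothetical stand-in for a download: it waits for a given number of milliseconds and returns a name.

```fsharp
// Stand-in for a download (assumption): wait ms milliseconds, return name.
let slow (name: string) (ms: int) : Async<string> =
    async { do! Async.Sleep ms
            return name }

let both : Async<string * string> =
    async { let! c1 = Async.StartChild (slow "first" 200)   // child 1 started
            let! c2 = Async.StartChild (slow "second" 200)  // child 2 started
            let! r1 = c1      // await completion of child 1
            let! r2 = c2      // await completion of child 2
            return (r1, r2) }

let watch = System.Diagnostics.Stopwatch.StartNew()
let result = Async.RunSynchronously both
let elapsedMs = watch.ElapsedMilliseconds
```

Since both children are started before either is awaited, the elapsed time is expected to be close to 200 ms rather than 400 ms, although the exact figure depends on the machine.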
The following example executes the above function downloadComp with continuations:
or it may be cancelled:
let tryListen(d) =
  if cont.IsSome then invalidOp "multicast not allowed"
  cont <- Some d
  tryTrigger()
Consider for example a primitive dialogue program that finds lengths of HTML-sources
of web-pages. The program shows a window as in Figure 13.3. The upper text box is used
to enter the URL while the lower text box shows the answer from the program. The buttons
have the following functions:
Start url : Starts the download of the web-page using the URL in the upper text box.
Clear: Clears the text boxes.
Cancel: Cancels a progressing download.
The program we shall construct makes use of the asynchronous event queue shown in
Table 13.4 and it has three parts:
13.5 Reactive programs 323
open System
open System.Net
open System.Threading
open System.Windows.Forms
open System.Drawing
// Initialization
window.Controls.Add urlBox
window.Controls.Add ansBox
window.Controls.Add startButton
window.Controls.Add clearButton
window.Controls.Add cancelButton
startButton.Click.Add (fun _ -> ev.Post (Start urlBox.Text))
clearButton.Click.Add (fun _ -> ev.Post Clear)
cancelButton.Click.Add (fun _ -> ev.Post Cancel)
// Start
Async.StartImmediate (ready())
window.Show()
Table 13.5 Dialogue program: The Window, Initialization and Start parts
The first part contains declarations corresponding to the window shown in Figure 13.3.
These declarations are shown in Table 13.5. In this part buttons and text boxes are de-
clared. Furthermore, a function disable is declared that controls enable/disable of the
buttons in the window. During the download of a web-page, for example, the user should
have the option to cancel the ongoing download; but the buttons for clearing the text fields
and for starting up a new download should be disabled in that situation.
The second part (see comments in Table 13.5) contains the dialogue program. We shall
focus on this part in the following.
The third part connects the buttons of the user interface to events, shows the window and
starts the dialogue program. This part is shown in the lower part of Table 13.5.
Notice that the program is an event-driven program with asynchronous operations all run-
ning on a single thread. The complete program is found at the homepage for the book.
Dialogue automaton
We shall design an event-driven program that reacts to user events and status events from
asynchronous operations. The user events are described above. An asynchronous download
of a web-page can result in three kinds of status events:
[Figure 13.4: Dialogue automaton with the states ready, loading, cancelling and finished]
Consider the automaton in Figure 13.4 with states: ready, loading, cancelling
and finished, where ready is the initial state (it is marked with an incoming arrow),
and six events: Start url, Clear, Cancel, Web html, Cancelled and Error.
The runs starting in the initial state ready describe the allowed sequences of events. For
example, the sequence
is allowed because it brings the automaton from the ready state to the loading state.
Other allowed sequences are:
because the automaton gets stuck in the cancelling state. The two first events:
Start(url_1) Cancel
lead to the cancelling state and there is no outgoing transition labelled Clear from
that state.
Notice that the automaton conveys an overview of the interaction in the system. Furthermore,
the corresponding dialogue program will systematically be constructed from the dialogue
automaton in terms of four mutually recursive functions corresponding to the four states.
This leads to the program skeleton in Table 13.6.
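The transitions of the automaton can also be written down as a pure function and used to check whether a sequence of events is allowed. The following is a minimal sketch: the type and function names (Message, State, step, run) are assumptions made for illustration, and the transitions are read off from the skeleton in Table 13.6.

```fsharp
type Message =
    | Start of string | Clear | Cancel
    | Web of string | Error | Cancelled

type State = Ready | Loading | Cancelling | Finished

// One transition of the automaton. None means that the event is not
// allowed in the given state.
let step (state: State) (msg: Message) : State option =
    match state, msg with
    | Ready, Start _                          -> Some Loading
    | Ready, Clear                            -> Some Ready
    | Loading, (Web _ | Error)                -> Some Finished
    | Loading, Cancel                         -> Some Cancelling
    | Cancelling, (Cancelled | Error | Web _) -> Some Finished
    | Finished, Clear                         -> Some Ready
    | _                                       -> None

// Run the automaton from the initial state over a sequence of events.
let run (msgs: Message list) : State option =
    List.fold (fun st msg -> Option.bind (fun s -> step s msg) st)
              (Some Ready) msgs

let allowed   = run [Start "url"; Web "<html>"; Clear]  // Some Ready
let forbidden = run [Start "url"; Cancel; Clear]        // None
```

The dialogue program itself uses one asynchronous function per state instead of a State value, but the allowed transitions are the same.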
The only parts that are missing in this program skeleton relate to the actions for the in-
coming events of the states. The other parts are systematically derived from the dialogue
automaton in Figure 13.4. The function implementing a given state of the automaton, for
example, the ready state, has three parts:
Part 1: Actions corresponding to the incoming event are performed in the first part. This
part is not described in the skeleton because these details are not present in the
automaton.
Part 2: Forbidden user input is disabled. By inspection of the events labelling the transitions
leaving a state, it can be observed which input the user should be able to provide
in that state. The ready state has no outgoing transition labelled Cancel and the
corresponding button is therefore disabled.
Part 3: Wait for incoming events and make a corresponding state transition. In the ready
state only Start and Clear events are allowed. A Clear event leads back to the
ready state while a Start event leads to the loading state.
The program skeleton in Table 13.6 contains a few details not present in Figure 13.4: Answer
strings are passed from the loading and cancelling states to the finished state.
let rec ready() =
  async { .... // actionReady: actions for incoming events
          disable [cancelButton]
          let! msg = ev.Receive()
          match msg with
          | Start url -> return! loading(url)
          | Clear     -> return! ready()
          | _         -> failwith("ready: unexpected message")}
and loading(url) =
  async { .... // actionLoading: actions for incoming events
          disable [startButton; clearButton]
          let! msg = ev.Receive()
          match msg with
          | Web html ->
              let ans = "Length = " + String.Format("{0:D}",html.Length)
              return! finished(ans)
          | Error   -> return! finished("Error")
          | Cancel  -> ts.Cancel()
                       return! cancelling()
          | _       -> failwith("loading: unexpected message")}
and cancelling() =
  async { .... // actionCancelling: actions for incoming events
          disable [startButton; clearButton; cancelButton]
          let! msg = ev.Receive()
          match msg with
          | Cancelled | Error
          | Web _ -> return! finished("Cancelled")
          | _     -> failwith("cancelling: unexpected message")}
and finished(s) =
  async { .... // actionFinished: actions for incoming events
          disable [startButton; cancelButton]
          let! msg = ev.Receive()
          match msg with
          | Clear -> return! ready()
          | _     -> failwith("finished: unexpected message")}
It is now straightforward to complete the whole dialogue program. The type for events (or
messages) and an event queue are declared as follows:
let ev = AsyncEventQueue();;
and the action parts missing in the skeleton program are declared as follows:
Actions for incoming event in the ready state: The two text boxes must be cleared:
Actions for incoming event in the loading state: The text box for the answer is set and an
asynchronous download of a web-page is started with continuations as we have seen on
Page 320:
Async.StartWithContinuations
    (async { let webCl = new WebClient()
             let! html = webCl.AsyncDownloadString(Uri url)
             return html },
     (fun html -> ev.Post (Web html)),
     (fun _ -> ev.Post Error),
     (fun _ -> ev.Post Cancelled),
     ts.Token)
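The role of the three continuations can be tried out without a GUI or a network connection. The following is a sketch only: slowJob is a hypothetical stand-in for the download (it merely sleeps), and run and report are illustration names that do not occur in the dialogue program. Instead of posting a message to an event queue, each continuation records which of the three outcomes occurred.

```fsharp
open System.Threading

// Hypothetical stand-in for a download: waits ms milliseconds, then answers.
let slowJob (ms: int) : Async<string> =
    async { do! Async.Sleep ms
            return "page" }

// Starts slowJob with three continuations and returns which one was invoked.
let run (ms: int) (cancel: bool) : string =
    use ts = new CancellationTokenSource()
    use finished = new ManualResetEventSlim(false)
    let outcome = ref "none"
    let report (s: string) =
        outcome.Value <- s
        finished.Set()
    Async.StartWithContinuations
        (slowJob ms,
         (fun html -> report ("Web " + html)),   // normal termination
         (fun _    -> report "Error"),           // an exception was raised
         (fun _    -> report "Cancelled"),       // the token was cancelled
         ts.Token)
    if cancel then ts.Cancel()
    finished.Wait()                              // block until one continuation ran
    outcome.Value

printfn "%s" (run 10 false)
printfn "%s" (run 5000 true)
```

In the second call the computation is cancelled while it is sleeping, so the cancellation continuation is the one that runs.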
Actions for incoming event in the cancelling state: The answer text box is set.
Actions for incoming event in the finished state: The answer text box is set.
ansBox.Text <- s
isPrime 232012709;;
val it : bool = true
A test of a small number is fast whereas a test of a large prime number like the one in the
above example takes some observable amount of time.
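The declaration of isPrime is not reproduced in this part of the text. A naive test by trial division, sketched below under that assumption, shows why the run time depends on the argument: for a prime number n every candidate divisor from 2 up to n - 1 is tried. The helper name noDivisorFrom is chosen for illustration.

```fsharp
// Naive primality test by trial division (a sketch, not an efficient algorithm).
let isPrime (n: int) : bool =
    // true if n has no divisor d' with d <= d' < n
    let rec noDivisorFrom d = d >= n || (n % d <> 0 && noDivisorFrom (d + 1))
    n >= 2 && noDivisorFrom 2

printfn "%b" (isPrime 97)
printfn "%b" (isPrime 91)
```

The number of remainder operations is proportional to n when n is prime, whereas a number with a small divisor is rejected after a few steps.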
We shall use randomly generated integers in our experiments. They are generated by the
following function gen, where gen range, with range > 0, generates a number that is
greater than or equal to 0 and smaller than range:
let gen = let generator = new System.Random()
          generator.Next;;
val gen : int -> int
gen 100;;
val it : int = 24
gen 100;;
val it : int = 53
The experiments in the rest of this section are conducted on a 4-core 2.67 GHz Intel I7
CPU with 8GB shared memory.
Data parallelism
The map function on collections is the canonical example for exploiting data parallelism,
where a function is applied in parallel to the members of a collection. Parallel implemen-
tations of functions on arrays are found in the Array.Parallel library as shown in
Table 13.7.
We have studied these functions previously in the book, so we just illustrate the advantage
of using the parallel version of the map function on an array with 5000000 numbers:
let bigArray = Array.init 5000000 (fun _ -> gen 10000);;
val bigArray : int [] = [|2436; 7975; 2647; 1590; 5959; 3951;
430; 1705; 2527; 1004; 2333; ... |]
Mapping the isPrime function on the elements of bigArray will generate a new Boolean
array, where an entry is true if and only if the corresponding entry in bigArray is a prime
number:
#time;;
The experiment shows a speed-up of approximately 2 in real time when using the parallel
version of map. The main point is that achieving this speed-up is effortless for the program-
mer. Note that the total CPU time (20.218 seconds) used on all cores is approximately double
the time needed for a non-parallel version.
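The comparison can be repeated with a small script along the following lines. This is a sketch under stated assumptions: isPrime is a naive trial-division test standing in for the function used in the text, the array is smaller than in the experiment above, and timed is a hypothetical helper based on a stopwatch rather than the #time directive. The measured times depend on the machine, so none are shown.

```fsharp
// Naive trial-division test, standing in for the isPrime used in the text.
let isPrime (n: int) : bool =
    let rec noDivisorFrom d = d >= n || (n % d <> 0 && noDivisorFrom (d + 1))
    n >= 2 && noDivisorFrom 2

let generator = System.Random()
let numbers = Array.init 200000 (fun _ -> generator.Next(10000))

// Hypothetical helper: runs f, prints the elapsed real time, returns the result.
let timed (label: string) (f: unit -> 'a) : 'a =
    let sw = System.Diagnostics.Stopwatch.StartNew()
    let result = f ()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds
    result

let seqResult = timed "Array.map" (fun () -> Array.map isPrime numbers)
let parResult = timed "Array.Parallel.map" (fun () -> Array.Parallel.map isPrime numbers)

// Both versions must compute the same Boolean array.
printfn "same result: %b" (seqResult = parResult)
```

Only the name of the map function differs between the two calls, which is the sense in which the speed-up requires no effort from the programmer.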
In order to use the library for parallel operations on sequences you need to install the F#
Power Pack. The PSeq library in that package provides parallel versions of a rich collection
of the functions in the Seq library (see Chapter 11). These functions can also be used on lists
and arrays as we have seen in Section 11.7. We just show one experiment with the exists
function from the PSeq library:
#r @"FSharp.PowerPack.Parallel.Seq"
open Microsoft.FSharp.Collections
In the example we search for a prime number that does not exist in the generated
sequence in order to be sure that the whole sequence is traversed. The speed-up is
about 2 and the figures are similar to those for the above experiment using map.
Task parallelism
The problem-solving strategy we have used throughout the book is to solve a complex
problem by combining solutions to simpler problems. This strategy, which is also known as
divide and conquer, fits very well with task parallelism, where a complex problem is solved
by combining solutions to simpler problems that can be solved in parallel.
We illustrate the idea on a simple example. Consider the type for binary trees given in
Section 6.4:
A function to test for the existence of an element in a binary tree satisfying a given predicate
is declared as follows:
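The declarations themselves are not reproduced in this part of the text. The following sketch shows one plausible form; the constructor names Leaf and Node and the helper build, which constructs a balanced test tree, are assumptions made for illustration.

```fsharp
// Binary trees with values in the nodes (constructor names are assumptions).
type BinTree<'a> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>

// exists p t is true if some value in t satisfies the predicate p.
let rec exists (p: 'a -> bool) (t: BinTree<'a>) : bool =
    match t with
    | Leaf                 -> false
    | Node(_, v, _) when p v -> true
    | Node(tl, _, tr)      -> exists p tl || exists p tr

// Hypothetical helper: a balanced tree containing the integers lo..hi.
let rec build (lo: int) (hi: int) : BinTree<int> =
    if lo > hi then Leaf
    else
        let mid = lo + (hi - lo) / 2
        Node(build lo (mid - 1), mid, build (mid + 1) hi)

printfn "%b" (exists (fun v -> v = 5) (build 1 10))
printfn "%b" (exists (fun v -> v > 10) (build 1 10))
```

The two recursive calls in the last clause are independent of each other, which is what makes the function a candidate for task parallelism.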
This parallel version does, however, not give any significant performance gain:
The problem with this version is that a huge number of tasks is created and the
administration of these tasks cancels out the advantage of having multiple cores.
This problem is handled by the introduction of a maximal depth to which new tasks are
created:
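A depth-bounded version can be sketched as follows. The tree type, the sequential exists and the helper build are assumptions repeated here so that the example is self-contained; the name parExistsDepth is the one used later in this section, but the body is a reconstruction of the idea and not a quotation. New tasks are started only while depth is positive; below that level the sequential function takes over.

```fsharp
open System.Threading.Tasks

// Tree type and sequential search (assumed forms, see the lead-in).
type BinTree<'a> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>

let rec exists (p: 'a -> bool) (t: BinTree<'a>) : bool =
    match t with
    | Leaf                   -> false
    | Node(_, v, _) when p v -> true
    | Node(tl, _, tr)        -> exists p tl || exists p tr

// Creates tasks for the two subtrees down to the given depth only.
let rec parExistsDepth (p: 'a -> bool) (t: BinTree<'a>) (depth: int) : bool =
    if depth = 0 then exists p t
    else
        match t with
        | Leaf                   -> false
        | Node(_, v, _) when p v -> true
        | Node(tl, _, tr) ->
            let b1 = Task.Factory.StartNew(fun () -> parExistsDepth p tl (depth - 1))
            let b2 = Task.Factory.StartNew(fun () -> parExistsDepth p tr (depth - 1))
            b1.Result || b2.Result   // Result waits for the task to finish

// Hypothetical helper: a balanced tree containing the integers lo..hi.
let rec build (lo: int) (hi: int) : BinTree<int> =
    if lo > hi then Leaf
    else
        let mid = lo + (hi - lo) / 2
        Node(build lo (mid - 1), mid, build (mid + 1) hi)

printfn "%b" (parExistsDepth (fun v -> v = 777) (build 1 1000) 3)
```

With depth d at most 2^(d+1) - 2 tasks are created for a tree of sufficient height, so the administrative overhead is bounded independently of the size of the tree.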
The speedup is approximately 2.3. At depths starting from about 22 the degradation of per-
formance grows fast. This is not surprising taking the number of subtrees at such depths into
account.
An array $v_0, v_1, \ldots, v_{n-2}, v_{n-1}$ of length $n$ is sorted by first rearranging
the elements $v_1, \ldots, v_{n-2}, v_{n-1}$ such that the resulting elements
$v'_1, \ldots, v'_{n-2}, v'_{n-1}$ can be partitioned into two sections with indices
$1, \ldots, k$ and $k+1, \ldots, n-1$, respectively, such that all the elements in the first
section are smaller than $v_0$ and all the elements in the second section are greater than
or equal to $v_0$:

Indices : 0     | 1     ...  k-1       k     | k+1       ...  n-2       n-1
Values  : v_0   | v'_1  ...  v'_{k-1}  v'_k  | v'_{k+1}  ...  v'_{n-2}  v'_{n-1}
                | All elements < v_0         | All elements >= v_0

The element $v_0$ can now be correctly placed in its final position by swapping it with the
$k$'th element:

                  All elements < v_0                 All elements >= v_0
Indices : 0     1     ...  k-1       | k    | k+1       ...  n-2       n-1
Values  : v'_k  v'_1  ...  v'_{k-1}  | v_0  | v'_{k+1}  ...  v'_{n-2}  v'_{n-1}
          To be sorted                        To be sorted

This array has the property that any element in the first section is smaller than any element
in the second section, as the elements in the first section are $< v_0$ while the elements in
the second section are $\geq v_0$. The array can hence be sorted by sorting each of the sections
separately. This algorithm will have an average run time that is proportional to $n \log n$ and
a worst-case run time proportional to $n^2$, where $n$ is the length of the array.
The sorting algorithms available in the libraries have a better worst-case run time (pro-
portional to n log n) and they are using very efficient algorithms. So our recommendation
is to use these libraries. We just use the Quick sort algorithm here to illustrate that the above
method for parallelizing a divide and conquer algorithm applies to a non-trivial algorithm.
The function partition rearranges a section of an array a:

Indices : k_1      k_1+1      ...  k_2
Values  : v_{k_1}  v_{k_1+1}  ...  v_{k_2}

so that the elements in the section which are smaller than a given value v come before the
elements which are greater than or equal to v:

Indices : k_1       k_1+1       ...  K     | K+1       ...  k_2
Values  : v'_{k_1}  v'_{k_1+1}  ...  v'_K  | v'_{K+1}  ...  v'_{k_2}
          All elements < v                 | All elements >= v

The value of the expression partition a v k1 k2 is K, that is, the index of the last element
in the first section containing elements smaller than v:
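The declarations are not reproduced in this part of the text. The following sketch implements partition as specified above, together with assumed forms of swap, qsort and sort. In qsort a i j the section to be sorted has the indices i, ..., j - 1, so j is one past the last index.

```fsharp
// Exchanges the elements with indices i and j.
let swap (a: 'a[]) (i: int) (j: int) : unit =
    let v = a.[i]
    a.[i] <- a.[j]
    a.[j] <- v

// Rearranges a.[k1..k2] so that elements < v precede elements >= v and
// returns the index of the last element of the first section (k1 - 1 if empty).
let rec partition (a: 'a[]) (v: 'a) (k1: int) (k2: int) : int =
    if k2 = k1 - 1 then k2                              // empty section
    elif a.[k2] >= v then partition a v k1 (k2 - 1)     // a.[k2] is in place
    else
        swap a k1 k2                                    // move small element to the front
        partition a v (k1 + 1) k2

// Sorts the section with indices i, ..., j-1.
let rec qsort (a: 'a[]) (i: int) (j: int) : unit =
    if j - i > 1 then
        let k = partition a a.[i] (i + 1) (j - 1)
        swap a i k                                      // place a.[i] in its final position
        qsort a i k
        qsort a (k + 1) j

let sort (a: 'a[]) : unit = qsort a 0 (Array.length a)

let example = [|1; -4; 0; 7; 2; 3|]
sort example
printfn "%A" example   // [|-4; 0; 1; 2; 3; 7|]
```

The two recursive calls of qsort touch the disjoint index ranges i, ..., k - 1 and k + 1, ..., j - 1.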
So far we have just achieved an imperative program that can sort an array:
let a1 = [|1; -4; 0; 7; 2; 3|];;
val a1 : int [] = [|1; -4; 0; 7; 2; 3|]
sort a1;;
val it : unit = ()
a1;;
val it : int [] = [|-4; 0; 1; 2; 3; 7|]
Even though Quick sort is an imperative algorithm that changes an array, this does not cause
any problems for a parallel version: the two recursive calls of qsort work on non-overlapping
sections of the array, so they are independent of each other.
Therefore, a parallel version that creates tasks up to a certain depth only is straightforwardly
achieved using the same technique as used for the parallel search in a binary tree:
open System.Threading.Tasks;;
let rec pqsort a i j depth =
    if j-i <= 1 then ()
    else if depth=0 then qsort a i j
    else let k = partition a a.[i] (i+1) (j-1)
         swap a i k
         let s1 = Task.Factory.StartNew
                     (fun () -> pqsort a i k (depth-1))
         let s2 = Task.Factory.StartNew
                     (fun () -> pqsort a (k+1) j (depth-1))
         Task.WaitAll[|s1;s2|];;
val pqsort : 'a [] -> int -> int -> int -> unit
    when 'a : comparison
Since pqsort is an imperative algorithm we need to wait for the termination of both of the
tasks s1 and s2 for the recursive calls. The function Task.WaitAll
is used for that purpose. It waits until all the provided tasks have completed their executions.
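The effect of waiting can be seen in isolation in the following sketch, where two tasks write to different cells of a shared array, just as the two recursive calls of pqsort write to different sections. The array results and the task names are illustration names; the lambdas are wrapped in Action explicitly so that the non-generic Task type is obtained.

```fsharp
open System
open System.Threading.Tasks

let results : int[] = Array.zeroCreate 2

// Each task writes to its own cell, so the two tasks do not interfere.
let t1 : Task = Task.Factory.StartNew(Action(fun () -> results.[0] <- 6 * 7))
let t2 : Task = Task.Factory.StartNew(Action(fun () -> results.[1] <- 10 + 1))

// Blocks until both tasks have completed; only then are both cells filled in.
Task.WaitAll(t1, t2)
printfn "%A" results   // [|42; 11|]
```

Without the call of Task.WaitAll the array could be inspected while one of the tasks is still running.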
Experiments show a speed-up of approximately 1.7 when sorting an array of size 3200000:
sort a32;;
Real: 00:00:14.090, CPU: 00:00:14.024,
GC gen0: 1009, gen1: 3, gen2: 0
val it : unit = ()
It is not surprising that parSort gets a smaller speed-up than parExistsDepth. The
recursive call of parExistsDepth requires only that the disjunction || is computed
on the values of the recursive calls, and this sequential part is a very fast constant-time
operation. On the other hand, prior to the two recursive and parallel calls of parSort, a
partitioning has to be made of the section to be sorted, and this sequential component has a
run time that is linear in the size (j - i) of the section.
Summary
In this chapter we have introduced
The common challenges and pitfalls in parallel programming are described. The async
computation expression is introduced and it is shown how asynchronous computations can
be used to make reactive, asynchronous programs with a very low resource demand.
Library functions for parallel programming are introduced and it is shown how they are used in
achieving computations executing concurrently on several cores.
Exercises
13.1 Make a program producing the deadlocked situation described on Page 315.
13.2 Make a type extension (cf. Section 7.4) of the class AsyncEventQueue<T> with an extra
member Timer: T -> int -> unit such that evaluating
evnq.Timer evnt n
will start an asynchronous computation that first sleeps n milliseconds and afterwards sends the
event evnt to the queue evnq.
Hint: Apply Async.StartWithContinuations to Async.Sleep with suitable con-
tinuations.
13.3 Consider the dialogue program in Table C.1. Sometimes it is more convenient to let the func-
tions for the state of the automaton communicate using shared variables rather than using func-
tion parameters. Revise the program so that loading and finished become parameterless
functions. Is this revision an improvement?
13.4 Make a quiz program where a user should guess a number by asking the following questions:
Is the number < n?
Is the number = n?
Is the number > n?
where n is an integer. The program can give the following answers:
Yes
No
You guessed it!
The program must fix a random number between 0 and 59 to be guessed before starting the
dialogue, and each run of the program should give a new number to be guessed.
13.5 Make a geography program guessing a country in Europe. The program asks questions to the
user who answers yes or no. The program should use a binary tree with country names in the
leaves and with a question in each node, such that the left subtree is chosen in case of answer
yes and the right in case of answer no.
The program can be made to look more intelligent by inserting some random questions in
between the systematic questions taken from the tree. The random questions should be of two
kinds: Silly questions where the answer is not used by the program, and direct questions guess-
ing a specific country where the answer is used by the program in case it gets answer yes.
13.6 The game of Nim is played as follows. Any number of matches are arranged in heaps, the
number of heaps, and the number of matches in each heap, being arbitrary. There are two players
A and B . The first player A takes any number of matches from a heap; he may take one only,
or any number up to the whole of the heap, but he must touch one heap only. B then makes a
move conditioned similarly, and the players continue to take alternately. The player who takes
the last match wins the game.
The game has a precise mathematical theory: We define an operator xorb for non-negative
integers by forming the exclusive or of each binary digit in the binary representation of the
numbers, for example
$109 = 1101101_2$
$70 = 1000110_2$
$109 \text{ xorb } 70 = 0101011_2 = 43$
val it : int = 43
The operator xorb is associative and commutative, and 0 is the unit element for the operator.
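In F# the bitwise exclusive or of two integers is written ^^^. As a small sketch, the operator can be wrapped in a function named xorb (the function name follows the exercise text; expressing it through ^^^ is an assumption about the intended realization) and the stated properties can be checked on sample values:

```fsharp
// xorb expressed with the built-in bitwise exclusive-or operator ^^^.
let xorb (a: int) (b: int) : int = a ^^^ b

printfn "%d" (xorb 109 70)                                    // 43
printfn "%b" (xorb 109 70 = xorb 70 109)                      // commutative on this sample
printfn "%b" (xorb (xorb 5 9) 12 = xorb 5 (xorb 9 12))        // associative on this sample
printfn "%b" (xorb 0 109 = 109)                               // 0 is the unit element
```

Such checks on sample values illustrate the properties but do not prove them.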
Let the non-negative integers $a_1, \ldots, a_n$ be the number of matches in the $n$ heaps, and let $m$
denote the integer:
$m = a_1 \text{ xorb } a_2 \text{ xorb } \cdots \text{ xorb } a_n$
This appendix provides the complete programs for the keyword example from Chapter 10. It
consists of
a section introducing the basic HTML concepts,
a section containing the complete IndexGen program, and
a section containing the complete NextLevelRefs program.
The remaining program for the keyword example: MakeWebCat, appears in Table 10.19.
The source can also be found on the homepage of the book.
<a href="http://msdn.microsoft.com/en-us/library/ee353439.aspx">
active pattern</a>
The text:
active pattern
is displayed in the button and a click will cause the browser to select the web-page given by
the URI:
http://msdn.microsoft.com/en-us/library/ee353439.aspx
A link is hence defined by a pair of elements: a start element <a ...> and an end element
</a> surrounding the text to be displayed. The construction:
href="..."
defines the href attribute of the element <a ...>. Attributes have special uses and are not
displayed text.
Elements in HTML appear in pairs of start and end elements, and some elements may
contain attributes. The line break element <br /> is considered a (degenerated) pair of
start and end element <br></br>.
Text to be displayed is encoded in the HTML encoding with certain characters encoded
using HTML escape sequences like &lt; and &amp; encoding < and &. The internet
browser performs the corresponding decoding when displaying a text.
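The encoding and decoding can be tried out directly. The sketch below uses System.Net.WebUtility from the .NET library, which is a substitution made here so that the example runs without a reference to System.Web; the programs in this appendix use HttpUtility.HtmlEncode instead.

```fsharp
open System.Net

// Encode the characters that have a special meaning in HTML.
let encoded = WebUtility.HtmlEncode "a < b & c"
// Decoding gives back the original text.
let decoded = WebUtility.HtmlDecode encoded

printfn "%s" encoded   // a &lt; b &amp; c
printfn "%s" decoded   // a < b & c
```

Encoding a keyword before writing it between the start and end elements of a link ensures that characters such as < are displayed rather than interpreted as markup.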
The HTML notation has developed over time and web-pages around the world follow
different standards. The standard is now controlled by the World Wide Web Consortium
(W3C) and more recent standards define HTML as a specialization of the XML notation.
The HTML-source of the library keyword index page starts with the Document type defi-
nition that is an XML <!DOCTYPE. . . > element:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
The start of the HTML part is signalled by:
<html>
The heading starts with the title to be displayed on the boundary of the browser window:
<head>
<title>F# Program Library Documentation Keyword Index</title>
It is followed by the style section:
<style type = "text/css">
h1 {color: purple; font-size: x-large; font-family: Verdana}
p {font-family: Verdana; font-size: large; color: maroon}
a {font-family: Verdana; text-decoration: none;
font-size: medium}
</style>
</head>
This section defines the appearance of different parts of the web-page:
h1: Level 1 heading in purple with x-large Verdana font.
p: Paragraphs in large Verdana font in maroon colour
a: Links in medium-sized Verdana font without the default underlining.
The reader may consult a Cascading Style Sheet (CSS) manual for further information about
styles in HTML.
The body starts with a level 1 heading <h1>. . . </h1> and a paragraph <p>. . . </p>:
<body>
<h1>F# Program Library Documentation Keyword Index</h1>
<p>Version date: Saturday, August 27, 2011</p>
Each link is followed by a line break <br />:
<a href="http://msdn.microsoft.com/en-us/library/ee353439.aspx">
active pattern</a><br />
open System;;
open System.IO;;
open System.Globalization;;
open System.Text.RegularExpressions;;
open Microsoft.FSharp.Collections;;
open System.Web;;
open TextProcessing;;
// Input part
let keyWdIn() =
    let webCat = restoreValue "webCat.bin"
    let handleLine (keywSet: Set<orderString*string>) str =
        match getData str with
        | Comment         -> keywSet
        | SyntError str   -> failwith ("SyntaxError: " + str)
        | KeywData (_,[]) -> keywSet
        | KeywData (title,keywL) ->
            let uri = Map.find title webCat
            let addKeywd kws kw = Set.add (enString kw, uri) kws
            List.fold addKeywd keywSet keywL
    let keyWdSet = Set.empty<orderString*string>
    fileFold handleLine keyWdSet "keywords.txt";;
// Output part
let preamble =
    "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
\"http://www.w3.org/TR/html4/strict.dtd\">
<html>
<head>
<title>F# Program Library Documentation Keyword Index</title>
<style type = \"text/css\">
h1 {color: purple; font-size: x-large; font-family: Verdana}
p {font-family: Verdana; font-size: large; color: maroon}
a {font-family: Verdana; text-decoration: none;
font-size: medium}
</style>
</head>
<body>
<h1>F# Program Library Documentation Keyword Index</h1>
<p>Version date: "
    + (String.Format(CultureInfo "en-US","{0:D}",DateTime.Now))
    + "</p>" ;;
let webOut(keyWdSet) =
    use webPage = File.CreateText "index.html"
    // Writes one link; starts a new group when the first letter changes
    let outAct oldChar (orderKwd: orderString,uri: string) =
        let keyword = string orderKwd
        let newChar = keyword.[0]
        if (Char.ToLower newChar <> Char.ToLower oldChar
            && Char.IsLetter newChar)
        then webPage.WriteLine "<br />"
        else ()
        webPage.Write "<a href=\""
        webPage.Write uri
        webPage.WriteLine "\">"
        webPage.Write (HttpUtility.HtmlEncode keyword)
        webPage.WriteLine "</a><br />"
        newChar
    webPage.WriteLine preamble
    Set.fold outAct 'a' keyWdSet |> ignore
    webPage.Close()
[<EntryPoint>]
let main (param: string[]) =
    let keyWdSet = keyWdIn()
    webOut keyWdSet
    0;;
open System ;;
open System.IO ;;
open System.Net ;;
open System.Collections.Generic ;;
open System.Text.RegularExpressions ;;
open System.Web ;;
open System.Xml ;;
open TextProcessing ;;
open System ;;
open System.IO ;;
[<EntryPoint>]
let main (args: string[]) =
    if Array.length args < 2 then
        failwith "Missing parameters"
    else
        if File.Exists args.[1] then
            failwith "Existing output file"
        else
            use output = File.CreateText args.[1]
            fileXiter (handleLinePair output) args.[0]
            output.Close()
            0 ;;
This appendix contains the source code of the TextProcessing library that was intro-
duced in Chapter 10. It consists of a signature file TextProcessing.fsi and an imple-
mentation file TextProcessing.fs. This library is organized into four groups:
A group on regular expressions. This group is documented on Page 224. See Table 10.4.
A group on file functions. This group is documented on Page 230. See Table 10.6.
A group on file handling. This group is documented on Page 230. See Table 10.8.
A group on culture-dependent string ordering. This group is documented in Section 10.6.
See Table 10.9.
The interface file TextProcessing.fsi is given in Table B.1. The listing of the
implementation file TextProcessing.fs is split into four tables, Tables B.2 to B.5, one for
each of the above-mentioned groups. The source can also be found on the homepage of the
book.
module TextProcessing
// Regular expressions
open System.Text.RegularExpressions
// File functions
open System.IO
// File handling
open System.IO
open System
exception StringOrderingMismatch
[<Sealed>]
type orderString =
    interface IComparable
module TextProcessing
// Regular expressions
open System.Text.RegularExpressions
// File functions
open System
open System.IO
let fileFold f e s =
    fileXfold (fun e s -> f e (s.ReadLine())) e s
let fileIter g s =
    fileXiter (fun s -> g (s.ReadLine())) s
// File handling
open System.IO
open System.Runtime.Serialization.Formatters.Binary
open System.Globalization
open System
exception StringOrderingMismatch
[<CustomEquality;CustomComparison>]
type orderString =
    {Str: string; Cult: string; Cmp: string->string->int}
    override s.ToString() = s.Str
    interface System.IComparable with
        member s1.CompareTo sobj =
            match sobj with
            | :? orderString as s2 ->
                if s1.Cult <> s2.Cult then raise StringOrderingMismatch
                else
                    match s1.Cmp s1.Str s2.Str with
                    | 0 -> compare s1.Str s2.Str
                    | z -> z
            | _ ->
                invalidArg "sobj"
                           "cannot compare values with different types"
    override s1.Equals sobj =
        match sobj with
        | :? orderString as s2 -> s1 = s2
        | _ -> false
    override s.GetHashCode() = hash(s.Str)
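The functions fileFold and fileIter in the group on file functions are defined in terms of fileXfold and fileXiter, whose declarations are not reproduced in this listing. The following sketch shows one way they could be written so that the two line-based functions work as listed; the bodies of fileXfold and fileXiter are assumptions and not the library source.

```fsharp
open System.IO

// Sketch: folds f over an open StreamReader until the end of the file.
let fileXfold (f: 'a -> StreamReader -> 'a) (e0: 'a) (path: string) : 'a =
    use s = File.OpenText path
    let rec loop e = if s.EndOfStream then e else loop (f e s)
    loop e0

// Sketch: applies g to an open StreamReader until the end of the file.
let fileXiter (g: StreamReader -> unit) (path: string) : unit =
    use s = File.OpenText path
    while not s.EndOfStream do g s

// Line-based versions, as in the listing above.
let fileFold f e s =
    fileXfold (fun e s -> f e (s.ReadLine())) e s
let fileIter g s =
    fileXiter (fun s -> g (s.ReadLine())) s

// Usage: count the lines of a temporary file.
let path = Path.GetTempFileName()
File.WriteAllLines(path, [| "alpha"; "beta"; "gamma" |])
let lineCount = fileFold (fun n (_: string) -> n + 1) 0 path
File.Delete path
printfn "%d" lineCount   // 3
```

The function passed to fileXfold or fileXiter must consume input from the reader in each step, as ReadLine does here; otherwise the end of the file is never reached.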
This appendix contains the complete program for the skeleton program shown in Table 13.6.
The reader should consult Section 13.5 for further information.
let ev = AsyncEventQueue()
          disable [cancelButton]
          let! msg = ev.Receive()
          match msg with
          | Start url -> return! loading(url)
          | Clear     -> return! ready()
          | _         -> failwith("ready: unexpected message")}
and loading(url) =
  async {ansBox.Text <- "Downloading"
         use ts = new CancellationTokenSource()
         Async.StartWithContinuations
             (async {let webCl = new WebClient()
                     let! html = webCl.AsyncDownloadString(Uri url)
                     return html},
              (fun html -> ev.Post (Web html)),
              (fun _ -> ev.Post Error),
              (fun _ -> ev.Post Cancelled),
              ts.Token)
and cancelling() =
async
{ansBox.Text <- "Cancelling"
and finished(s) =
async {ansBox.Text <- s
[1] Harold Abelson, Gerald Jay Sussman, Structure and Interpretation of Computer Programs, second
edition, The MIT Press, Cambridge, MA, USA, 1996.
[2] Alfred Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques and
Tools, second edition, Pearson Addison-Wesley, Boston, MA, USA, 2006.
[3] Guy Cousineau, Michel Mauny, The Functional Approach to Programming, Cambridge University
Press, Cambridge, United Kingdom, 1998.
[4] Maarten M. Fokkinga, Functioneel programmeren in een vogelvlucht, INFORMATIE, vol. 27,
pp. 862-873, Kluwer b.v., Deventer, The Netherlands, 1985.
[5] Michael R. Hansen, Hans Rischel, Introduction to Programming using SML, Addison-Wesley Long-
man, Harlow, England, 1999.
[6] Peter Henderson, Functional Geometry, Proceedings of the 1982 ACM Symposium on LISP and
Functional Programming, pp. 179-187, ACM, Pittsburgh, PA, USA, 1982.
[7] Graham Hutton, Erik Meijer, Monadic Parsing in Haskell, Journal of Functional Programming, vol. 8,
pp. 437-444, Cambridge University Press, Cambridge, United Kingdom, 1998.
[8] Robin Milner, Mads Tofte, Robert Harper, David MacQueen, The Definition of Standard ML, revised
edition, The MIT Press, Cambridge, MA, USA, 1997.
[9] Microsoft Development Network MSDN, on the internet.
[10] L.C. Paulson, ML for the Working Programmer, second edition, Cambridge University Press, Cam-
bridge, United Kingdom, 1996.
[11] Peter Sestoft, Henrik I. Hansen, C# Precisely, second edition, The MIT Press, Cambridge, MA, USA,
2012.
[12] Peter Sestoft, Programming Language Concepts for Software Developers, Springer, London, England,
2012.
[13] Don Syme, Adam Granicz, Antonio Cisternino, Expert F# 2.0, Apress, New York, NY, USA, 2010.
[14] Simon Thompson, Haskell. The Craft of Functional programming, third edition, Addison-Wesley
Longman, Harlow, England, 2011.
[15] Philip Wadler, Monads for functional programming, Advanced Functional Programming, Proceedings
of the Båstad Spring School, May 1995, Lecture Notes in Computer Science 925, Springer, Berlin,
Heidelberg, Germany, 1995.
The URL of [9] is found on the home page of the book (see Page x).
Index