
Fundamentals of Functional Programming

Lecture 1

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Syllabus I

λ-calculus
■ type-free λ-calculus as an abstract programming language
(confluence, reduction strategies, datatype representation);
■ relation to computability (Turing-completeness, infinitary
datatypes);
■ simply typed λ-calculi (Church-style, Curry-style);
■ polymorphic λ-calculi (the λ-cube, logical interpretation);
■ proofs as programs.

2 of 472
Syllabus II

Category theory
■ categories (abstract and concrete categories, product categories,
subcategories);
■ basic constructions (morphisms, limits and colimits, exponentials,
subobject classifiers, power objects);
■ functors and natural transformations (definitions and examples);
■ adjunctions (definition and examples);
■ models of the λ-calculi (Lambek’s constructions).

3 of 472
Texts I

λ-calculus:
■ [HS] J.R. Hindley and J.P. Seldin, Lambda-calculus and
Combinators: an Introduction, Cambridge University Press (2008).
ISBN: 978-0521898850
The lectures are based on this book
■ [Bar] H.P. Barendregt, The Lambda Calculus: Its Syntax and
Semantics, 2nd edition, North-Holland Elsevier (1984).
ISBN: 978-0444875082
■ [Bar2] H.P. Barendregt, Lambda Calculus with Types, Chapter 2 of
S. Abramsky, D.M. Gabbay, T.S.E. Maibaum (editors), Handbook
of Logic in Computer Science, volume 2, Clarendon Press (1992).
ISBN: 978-0198537618

4 of 472
Texts II

λ-calculus:
■ [Pierce] B.C. Pierce, Types and Programming Languages, MIT
Press (2002).
ISBN: 978-0262162098
■ [Sel] Peter Selinger, Lecture Notes on the Lambda Calculus,
[Link]

5 of 472
Texts III
Category theory:
■ [Pierce2] B.C. Pierce, Basic Category Theory for Computer
Scientists, MIT Press (1991).
ISBN: 978-0262660716
The lectures are based on this book
■ [MacLane] S. Mac Lane, Categories for the Working
Mathematician, 2nd edition, Springer-Verlag (1998).
ISBN: 978-0387984032
■ [Goldblatt] R. Goldblatt, Topoi: The categorical analysis of logic,
Elsevier (1984).
ISBN: 978-0444867117
This book has been republished at a nicer price by Dover (2009).
ISBN: 978-0486450261.
6 of 472
Texts IV

Category theory:
■ [Lambek] J. Lambek and P.J. Scott, Introduction to Higher Order
Categorical Logic, Cambridge University Press (1988).
ISBN: 978-0521356534
■ [Cats] J. Adamek, H. Herrlich and G. Strecker, Abstract and
concrete categories: The joy of cats,
[Link]
■ [Rydeheard] D.E. Rydeheard, R.M. Burstall, Computational
Category Theory, Prentice Hall (1988).
ISBN: 978-0131627369.
[Link]

7 of 472
Texts V

The ML programming language:


■ [Paulson] L.C. Paulson, ML for the working programmer, 2nd
edition, Cambridge University Press (1996).
ISBN: 978-0521565431
The lectures are based on this book
■ [Harper] R. Harper, Programming in Standard ML, Carnegie Mellon
University (2009).
[Link]

There are no mandatory textbooks: the material we cover is standard,
so any reasonable book on the topics of interest will do.

8 of 472
Examination

The examination will be oral. It covers the whole program, and the
student will be asked to solve simple exercises, to prove results seen
in the lectures, and to show understanding of the basic concepts of
the course.
Examinations will take place six times per year, on fixed dates.
Whoever wants to take the examination must register for one of these
dates.
Students are required to bring their study material (books, handouts,
etc.) to the examination.

9 of 472
Introduction

This course introduces the mathematics of functional programming.
Essentially, functional programming is a matter of style.
A functional programmer thinks of his or her code as a formal,
mathematical object which computes the desired result, avoiding
devices such as mutable variables and assignments. In a word,
functional means without state.
Also, functional programs tend to be very short, very compact and
very flexible. They are also harder to understand and to write, since
they require a deep understanding of the problem.
This course explains how to think in a functional style and, thus, how
to write functional programs.

10 of 472
An example: Quicksort I

The idea of quicksort is to sort an array by choosing an element,
called the pivot, and sorting the sub-arrays of the remaining elements
that are less than or equal to the pivot and strictly greater than it.
Then, the sorted array is given by the sorted first sub-array, the
pivot and the sorted second sub-array. A recursive application of this
rule calculates a sorted array for any given array.

Usually, an iterative version of this algorithm is used, since “recursion
is bad for efficiency”.

An example: Quicksort II

A precise statement of the algorithm immediately reveals that we have
to state the trivial cases of an empty array and of an array containing
just one element. We stipulate that arrays of these forms are sorted.
Then, the algorithm qs taking A as input can easily be described as:
■ if A = [], i.e., A is empty, then qs(A) = A;
■ if A = [x], i.e., A contains just one element x, then qs(A) = A;
■ if A = a :: B, i.e., A consists of an element a followed by another
array B, then qs(A) = qs(A1) @ [a] @ qs(A2), where A1 is the array
containing the elements of B less than or equal to a, and A2 is the
array containing the elements of B greater than a.

An example: Quicksort III

A direct encoding of this description in ML gives:

fun qs [] = []
  | qs [x] = [x]
  | qs (a :: B) =
      let fun partition (l, r, []) = (qs l) @ [a] @ (qs r)
            | partition (l, r, x :: xs) =
                if x <= a then partition (x :: l, r, xs)
                else partition (l, x :: r, xs)
      in partition ([], [], B) end;

An example: Quicksort IV

As you can see, the code is compact: there are no extra variables
beyond those needed in the definition.

A simple test reveals that the code is fast, too. This is due to the fact
that partition is tail-recursive.

A simple test also reveals that the code uses more space than the
usual iterative version in, let us say, the C language. This is not a
problem on modern computers with gigabytes of RAM; in any case, a
space-efficient version can be developed.

The major features enabling a clean presentation are recursion and
pattern matching in the definition.

Another example: Summation I

We want to write a function to calculate the sum of the elements in a
given list.
A simple solution would be:

fun sum [] = 0
  | sum (x :: xs) = x + (sum xs);

This solution is correct, but unnecessarily complex.

Another example: Summation II

We can introduce an abstract construction,

fun foldr f z [] = z
  | foldr f z (x :: xs) = f x (foldr f z xs);

This higher-order functional expands to

foldr f z [1, 2, 3, 4, 5] = f 1 (f 2 (f 3 (f 4 (f 5 z)))) .

In this way, summation can be defined simply as

val sum = foldr (op +) 0;
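The same construction can be sketched in Python (an illustration only: the slides use ML, and the names foldr and sum_list are ours):

```python
def foldr(f, z, xs):
    # foldr f z [x1, x2, ..., xn] = f x1 (f x2 (... (f xn z) ...))
    acc = z
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

# Mirroring  val sum = foldr (op +) 0;
def sum_list(xs):
    return foldr(lambda x, acc: x + acc, 0, xs)
```

Folding the list constructor itself rebuilds the list, which is a quick way to convince oneself that foldr captures the recursion pattern of sum exactly.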

Another example: Summation III

Programming by means of an extensive use of higher-order functionals
allows us to simplify the encoding of functions. Higher-order
functionals capture the “computational pattern” we need, and so we
confine the complexity of a program to a few abstract pieces.

In fact, in the sum example, we captured the iterative structure of the
sum in foldr, a higher-order functional, which is then used to define
our target function. If a computational pattern is frequent, it is very
convenient to encode it as a functional.

We want to remark that “thinking by functionals” is easier (once one
is used to it), as it allows us to concentrate on smaller and
well-confined problems.

References and Hints

In this section, students will find references for the material which has
been explained. Usually, they are the textbooks or research articles
from which the lesson has been taken.

Although it is not compulsory to study these references, they can
provide background information or a deeper insight into what has
been explained.

Fundamentals of Functional Programming
Lecture 2

Outline

This is the first real lesson of this course: our aim is to introduce the
λ-calculus in its untyped version.

As a side aspect, we also want to introduce a presentation style which
is predominant in mathematical expositions, to allow students to get
used to it.

For these reasons, this lesson will be very short.

A programming introduction to λ-calculus

The λ-calculus was invented by Alonzo Church around 1930, as a
mathematical tool to describe and to study the properties of
computable functions.
The λ-calculus is extremely simple, with a bare-bones syntax allowing
for simple and compact proofs.
There are different versions of the λ-calculus:
■ the pure λ-calculus is the simplest and most powerful system;
■ typed λ-calculi are variations over the pure version, where terms
have types in some algebra.
Every functional programming language is a (typed) λ-calculus with
some additional constructs, for performance and clarity.
We will start by presenting the pure λ-calculus.

Syntax I

Assume we have an infinite (denumerable) set of variables V .

Definition 2.1 (λ-term)
A λ-term is inductively defined as follows, along with FV, the set of its
free variables:
■ if x ∈ V then x is a λ-term and FV(x) = { x };
■ if M and N are λ-terms, then (M · N) is a λ-term and
FV(M · N) = FV(M) ∪ FV(N); these λ-terms are called applications
and, usually, the · operation is not written;
■ if x ∈ V and M is a λ-term, then (λx . M) is a λ-term and
FV(λx . M) = FV(M) \ { x }; these λ-terms are called abstractions.

We should think of λ-terms as our programs and data.

Syntax II
To simplify notation, capital letters (M, N, P, . . . ) will denote
arbitrary λ-terms; also, x, y, z, u, v, w will denote variables.
Parentheses are suppressed according to the following rules:
■ we always omit outermost parentheses;
■ we write (λx . P Q) for (λx . (P Q));
■ we write (M1 M2 · · · Mn) for ((. . . (M1 · M2) . . .) · Mn), i.e., application
associates to the left;
■ we write (λx1 , x2 , . . . , xn . M) for (λx1 . (λx2 . (. . . (λxn . M) . . . ))).
Finally, we write M ≡ N to indicate that M and N are syntactically
identical, i.e., they are equal as strings.
Evidently, M N ≡ P Q implies M ≡ P and N ≡ Q, and λx . M ≡ λy . N
implies x ≡ y and M ≡ N. The converse holds, too.

Syntax III
Most proofs and definitions are given by induction on the structure of
λ-terms. Here are some useful examples.
Definition 2.2 (Occurrence)
For λ-terms P and Q, we say that P occurs in Q iff one of the
following cases applies:
■ P ≡ Q (*);
■ Q ≡ M N and P occurs in M or P occurs in N;
■ Q ≡ λx . M and P occurs in M.
An occurrence of P in Q is any instance where clause (*) applies.

Definition 2.3 (Scope)
We say that the occurrence of M in λx . M is the scope of λx . M in P
if λx . M occurs in P. We also say that x is bound in that scope.
Syntax IV
Definition 2.4 (Substitution)
For any λ-terms M, N and variable x, M[N/x] denotes the λ-term
where every free occurrence of x in M is substituted with N.
Explicitly:
■ x[N/x] ≡ N;
■ y[N/x] ≡ y where x ≢ y;
■ (P Q)[N/x] ≡ (P[N/x])(Q[N/x]);
■ (λx . P)[N/x] ≡ λx . P;
■ (λy . P)[N/x] ≡ λy . P if x ≢ y and x ∉ FV(P);
■ (λy . P)[N/x] ≡ λy . P[N/x] if x ≢ y, x ∈ FV(P) and y ∉ FV(N);
■ (λy . P)[N/x] ≡ λz . (P[z/y])[N/x] if x ≢ y, x ∈ FV(P), y ∈ FV(N)
and z ∉ FV(N P).
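The seven clauses of this definition can be turned into code. Below is a Python sketch (the Var/App/Lam representation is our own and is repeated here so the fragment is self-contained); the last branch performs the renaming that avoids variable capture:

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

def fv(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)
    return fv(t.body) - {t.var}

def subst(m, n, x):
    # M[N/x], clause by clause as in Definition 2.4
    if isinstance(m, Var):
        return n if m.name == x else m            # x[N/x] ≡ N, y[N/x] ≡ y
    if isinstance(m, App):
        return App(subst(m.fun, n, x), subst(m.arg, n, x))
    y, p = m.var, m.body                          # m is λy . P
    if y == x or x not in fv(p):
        return m                                  # nothing to substitute
    if y not in fv(n):
        return Lam(y, subst(p, n, x))             # no capture possible
    # last clause: rename y to a fresh z ∉ FV(N P) before substituting
    z = next(f"z{i}" for i in count() if f"z{i}" not in fv(n) | fv(p))
    return Lam(z, subst(subst(p, Var(z), y), n, x))
```

The interesting case is (λy . x)[y/x]: a naive substitution would capture y, while the renaming clause yields λz . y for a fresh z.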
Syntax V

We stipulate that programs do not depend on the names of their
bound variables. This convention is captured via the following
definition.
Definition 2.5 (α-conversion)
We say that P α-converts to Q, notation P ≡α Q, if P and Q are
identical except for a renaming of bound variables. In formal terms,
P ≡α Q iff one of the following cases applies:
■ P ≡ Q;
■ P ≡ M N, Q ≡ M′ N′ and M ≡α M′, N ≡α N′;
■ P ≡ λx . M, Q ≡ λy . N and M[z/x] ≡α N[z/y], where z ∉ FV(M N).
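The relation can be checked mechanically by carrying along the correspondence between bound names introduced by matching binders. A Python sketch (the term representation and the function name alpha_eq are our own assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

def alpha_eq(p, q, env=()):
    # env records pairs of bound names introduced by matching binders
    if isinstance(p, Var) and isinstance(q, Var):
        for a, b in reversed(env):        # the innermost binder takes precedence
            if p.name == a or q.name == b:
                return p.name == a and q.name == b
        return p.name == q.name           # both occurrences are free
    if isinstance(p, App) and isinstance(q, App):
        return alpha_eq(p.fun, q.fun, env) and alpha_eq(p.arg, q.arg, env)
    if isinstance(p, Lam) and isinstance(q, Lam):
        return alpha_eq(p.body, q.body, env + ((p.var, q.var),))
    return False
```

For example, λx . λy . x and λa . λb . a are α-convertible, while λx . λy . x and λa . λb . b are not.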

Syntax VI

Lemma 2.6
1. If P ≡α Q then FV(P) = FV(Q);
2. The relation ≡α is an equivalence, i.e.,
   – P ≡α P;
   – P ≡α Q implies Q ≡α P;
   – P ≡α Q and Q ≡α R imply P ≡α R.

Proof.
The first statement is by induction on the definition of α-conversion.
The second statement is easy, except for symmetry, which is treated
by induction on the definition of α-conversion.
[Exercise] Complete the details.

Operational Semantics I

Definition 2.7 (β-reduction)
Any term of the form (λx . M) N is called a β-redex and the term
M[N/x] is called its contractum.
A term P β-reduces to Q in one step, notation P B1β Q, iff P contains
an occurrence of a β-redex (λx . M) N, while Q contains, in the same
place, its contractum M[N/x].
In formal terms, if P ≡ P′[(λx . M) N/z] and z occurs only once in P′,
then Q ≡ P′[(M[N/x])/z].
We say that P β-reduces to Q, notation P Bβ Q, iff there exists a
finite sequence of terms R1 , . . . , Rn such that R1 ≡ P, Rn ≡ Q and, for
every i in 1 . . . (n − 1), either Ri B1β Ri+1 or Ri ≡α Ri+1.

Operational Semantics II

Definition 2.8 (β-normal form)
A term P which contains no β-redex is said to be in β-normal form,
or β-nf for short; a term Q in β-nf is called a β-normal form of P if
P Bβ Q.

The idea behind β-nfs is that a program computes by β-reduction. A
program terminates when it has nothing more to compute, i.e., when
it is a β-nf.
A program P on input M generates the λ-term P M, which computes
by β-reduction until it produces a β-nf R, which is its output.

Operational Semantics III

There are terms without a normal form: for example,
Ω ≡ (λx . x x)(λx . x x) cannot reduce to anything but itself.

The sequence of reductions matters: for example, calling K ≡ (λx , y . x),
the term K K Ω has K as β-nf, but there is a non-terminating
sequence of reductions which insists on expanding the useless Ω.

References and Hints

This lesson is based on Chapter 1 of [HS]. Some definitions and
notations are slightly different, although equivalent.

Remember that students are required to compute β-reductions
effortlessly, and thus you are invited to do exercises to this end,
which can be found in any textbook.

Also, try to understand why the definitions have been formalised as
shown, and what the limit cases are.

Finally, do and redo the proofs until their logic is clear to you. They
are required to pass the examination and, most of all, what is
evaluated is your understanding rather than your memory.

Fundamentals of Functional Programming
Lecture 3

Outline

So far, we have introduced the syntax of the pure λ-calculus. Also, we
stipulated that programs are identified modulo a renaming of local
variables (α-conversion). Finally, we have shown how a λ-term
computes by means of β-reduction.

In this lecture, we want to prove that β-reduction acts as a
“well-behaved” computational paradigm.

Moreover, we want to define a notion of equality that identifies terms
which will produce the same “output”.

Confluence I
β-reduction behaves naturally with respect to substitution.
Lemma 3.1
1. If P Bβ Q then FV(Q) ⊆ FV(P);
2. If P Bβ P′ and Q Bβ Q′ then P[Q/x] Bβ P′[Q′/x].
Proof. (i)
First, we prove that, if M B1β N, then FV(N) ⊆ FV(M). So, let
M ≡ M′′[(λx . M′) N′/z] and N ≡ M′′[(M′[N′/x])/z]. Thus,
FV(N) ⊆ (FV(M′′) \ { z }) ∪ (FV(M′) \ { x }) ∪ FV(N′) = FV(M).
Containment can be strict, e.g., consider M ≡ (λx . u) v and N ≡ u.
Since P ≡ P1 B1β P2 ≡α P2′ B1β · · · B1β Pn ≡α Q, by induction on n,
statement (1) holds because α-conversions do not change the set of
free variables. ↪
Confluence II

↪ Proof. (ii)
As for (2), by induction on the length of the reduction P Bβ P′, it
suffices to prove the cases P ≡ P′ and P B1β P′.
The former case is evident, since the reduction Q Bβ Q′ can be
replicated in the context P[Q/x], yielding P′[Q′/x].
The latter case is by induction on the length of the reduction Q Bβ Q′.
Again it suffices to prove the property when P B1β P′ and Q B1β Q′. It
is safe to assume that P contains no bound variables in FV(x Q) (if
not, we can take P∗ ≡α P).
Hence, P[Q/x] Bβ P[Q′/x] by reducing the substituted occurrences of
Q. But every redex which is present in P will also be present in
P[Q′/x], so we can contract it.

Confluence III

Theorem (Church-Rosser)
If P Bβ M and P Bβ N, then there exists a term T such that M Bβ T
and N Bβ T.

The property stated in the Church-Rosser Theorem is called
confluence. An immediate consequence is that if P has a β-nf Q, then
it is unique modulo α-conversion.

The meaning of confluence is that a program has a unique output, if
any, and every reduction strategy which terminates yields it.

Confluence IV

To prove the Church-Rosser Theorem we need a number of auxiliary
definitions and lemmas.
Definition 3.2 (Marked λ-terms)
Call Λ the set of λ-terms. The set of marked λ-terms, Λ∗, is the
minimal set such that:
1. if x ∈ V then x ∈ Λ∗;
2. if M , N ∈ Λ∗ then (M N) ∈ Λ∗;
3. if x ∈ V and M ∈ Λ∗ then (λx . M) ∈ Λ∗;
4. if x ∈ V and M , N ∈ Λ∗ then ((λ∗x . M) N) ∈ Λ∗.

Thus, the marked λ-terms have some of their redexes marked by a ∗.

Confluence V

Definition 3.3 (β∗-reduction)
Any term in Λ∗ of the form (λx . M) N or (λ∗x . M) N is called a
β∗-redex and the term M[N/x] is called its contractum.
A term P β∗-reduces to Q in one step, notation P B1β∗ Q, iff P
contains an occurrence of a β∗-redex (λx . M) N or (λ∗x . M) N, while
Q contains in the same place its contractum M[N/x].
We say that P β∗-reduces to Q, notation P Bβ∗ Q, iff there exists a
finite sequence of terms R1 , . . . , Rn such that R1 ≡ P, Rn ≡ Q and, for
every i in 1 . . . (n − 1), Ri B1β∗ Ri+1 or Ri ≡α Ri+1.

This definition is the same as Bβ extended to marked terms.

Confluence VI

Definition 3.4 (Substitution in Λ∗)
Substitution in Λ∗ is defined as in Λ, plus the following rules:
■ (λ∗x . P)[N/x] ≡ λ∗x . P;
■ (λ∗y . P)[N/x] ≡ λ∗y . P if x ≢ y and x ∉ FV(P);
■ (λ∗y . P)[N/x] ≡ λ∗y . P[N/x] if x ≢ y, x ∈ FV(P) and y ∉ FV(N);
■ (λ∗y . P)[N/x] ≡ λ∗z . (P[z/y])[N/x] if x ≢ y, x ∈ FV(P), y ∈ FV(N)
and z ∉ FV(N P).

Again, this definition is the same as the one for λ-terms, extended to
marked terms.

Confluence VII
Definition 3.5 (Forgetful map)
The function | · | : Λ∗ → Λ maps every marked λ-term to the
corresponding unmarked term. We write M →|·| N if |M| = N.

Definition 3.6 (Contraction map)
The function φ : Λ∗ → Λ maps every marked λ-term to the term where
its marked redexes are contracted:
■ φ(x) = x;
■ φ(M N) = φ(M) φ(N);
■ φ(λx . M) = λx . φ(M);
■ φ((λ∗x . M) N) = φ(M)[φ(N)/x].
We write M →φ N if φ(M) = N.
Confluence VIII
Lemma 3.7
For all M , N ∈ Λ and M′ ∈ Λ∗, there are N′ ∈ Λ∗ and a reduction
M′ Bβ∗ N′ such that the square

    M′ --β∗--> N′
    |           |
   |·|         |·|
    v           v
    M  --β---> N

commutes, i.e., if |M′| = M and M Bβ N, then M′ Bβ∗ N′ and |N′| = N.

Proof.
By induction on the length n of M Bβ N. If n = 0, M ≡ N, so N′ ≡ M′.
If n = 1, N is obtained by contracting a redex in M, and N′ can be
obtained by contracting the same redex in M′; otherwise, M ≡α N
and N′ ≡α M′. If n > 1, the conclusion follows by transitivity.
Confluence IX

Lemma 3.8
Let M , M′ , N , L ∈ Λ∗. Then:
1. if x , y ∈ V with x ≢ y and x ∉ FV(L), then
(M[N/x])[L/y] = (M[L/y])[N[L/y]/x];
2. φ(M[N/x]) = φ(M)[φ(N)/x];
3. if M Bβ∗ N then φ(M) Bβ φ(N).
Proof.
(1) By induction on the structure of M.
(2) By induction on the structure of M, using (1) when
M ≡ (λ∗y . P) Q.
(3) By induction on the length of the reduction, using (2).
[Exercise] Fill in the details.
Confluence X
Lemma 3.9
If M ∈ Λ∗ then there is a reduction |M| Bβ φ(M).
Proof.
By induction on the structure of M.

Lemma 3.10 (Strip lemma)
Let M , N1 , N2 ∈ Λ. If M B1β N1 and M Bβ N2, then there is N3 ∈ Λ such
that N1 Bβ N3 and N2 Bβ N3. That is, the following diagram commutes:

    M  --β---> N2
    |           |
    1β          β
    v           v
    N1 --β---> N3

Confluence XI
Proof.
Since M B1β N1, the redex R ≡ (λx . P) Q occurs in M and gets
contracted in N1. Let M′ ∈ Λ∗ be obtained by replacing R in M with
R′ ≡ (λ∗x . P) Q. Then |M′| = M and φ(M′) = N1.
By Lemma 3.7 there is N2′ ∈ Λ∗ such that |N2′| = N2 and M′ Bβ∗ N2′.
By Lemma 3.8, there is N3 ∈ Λ such that N3 = φ(N2′) and N1 Bβ N3.
Finally, by Lemma 3.9, it holds that N2 Bβ N3. In diagrams:

    M  ---------β--------> N2
    ^                       ^
   |·|                     |·|
    M′ --------β∗--------> N2′
    |                       |
    φ                       φ
    v                       v
    N1 ---------β--------> N3

where the bottom edge N1 Bβ N3 comes from Lemma 3.8 and the
reduction N2 Bβ N3 from Lemma 3.9 applied to N2′.
Confluence XII

Theorem 3.11 (Church-Rosser)
If P Bβ M and P Bβ N, then there exists a term T such that M Bβ T
and N Bβ T.

Proof.
By induction on the length of the reduction P Bβ M, by means of the
Strip Lemma. In diagrams:

    P  ---> P1 ===> Pk ---> M
    |        |       |      |
    v        v       v      v
    N  ---> N1 ===> Nk ---> T

[Exercise] Fill in the details.


β-equality I

The notion of β-reduction naturally generates an equality between


terms:
Definition 3.12 (β-equality)
Given M , N ∈ Λ, we say that M =β N, spelt as M is β-equal or
β-convertible to N, iff there is a finite sequence P1 , . . . , Pn of terms
such that P1 ≡ M, Pn ≡ N and, for every i < n, Pi B1β Pi +1 , or
Pi +1 B1β Pi or Pi ≡α Pi +1 .

Two terms are equal if they are mutually reducible, modulo a suitable
renaming. If we think to terms as stages of a computation, two terms
are equal if they are different stages of the same computation.

β-equality II

Lemma 3.13
β-equality is an equivalence relation.

Proof.
Evident.

Lemma 3.14
If M =β M′ and N =β N′ then M[N/x] =β M′[N′/x].

Proof.
By induction on the definition of M =β M′, via Lemma 3.1.

Thus, β-equality is a congruence, i.e., it is stable under substitution.

β-equality III

Theorem 3.15 (Church-Rosser)
If M =β N then there is a term L such that M Bβ L and N Bβ L.

Proof.
By induction on the generation of =β. Let P1 , . . . , Pn be a sequence
such that P1 ≡ M, Pn ≡ N and, for every i < n, Pi B1β Pi+1, or
Pi+1 B1β Pi, or Pi ≡α Pi+1.
The case n = 1 is trivial.
For 1 ≤ i < n, the induction hypothesis gives a term Ti such that
M Bβ Ti and Pi Bβ Ti, since M =β Pi. If Pi B1β Pi+1, then Theorem 3.11
provides the required Ti+1 such that M Bβ Ti+1 and Pi+1 Bβ Ti+1.
Otherwise, if Pi+1 B1β Pi or Pi ≡α Pi+1, then Ti+1 ≡ Ti satisfies the
statement.

β-equality IV

Corollary 3.16
1. If P =β Q and Q is a β-nf, then P Bβ Q;
2. if P =β Q and both P and Q are β-nfs, then P ≡α Q;
3. if P =β Q then either P and Q have the same β-nf (modulo
α-conversion), or they both have no normal form;
4. a term P has at most one normal form, modulo α-conversion.
Proof.
Since β-nfs have no redexes, (1) and (2) are immediate. (3) is a direct
consequence of β-equality being an equivalence relation; (4) is evident
from (1) and (2).

References and Hints

This lesson is based on Chapter 1 of [HS]. The proof of the
Church-Rosser Theorem can be found in Appendix 2 of the same book.

Exercises are required during examinations. Most of them are routine
proofs which you should be able to perform with minimal effort.

Finally, get familiar with this style of presenting statements and
proofs: although somewhat difficult in the beginning, it is the
standard way to present mathematical results, and it is adopted in
any (decent) book.

Fundamentals of Functional Programming
Lecture 4

Outline

We have proved that the λ-calculus is confluent, that is, if a term
reduces in two different ways, then the results can be further reduced
to a common term.
As a consequence, we know that the output of the computation of a
λ-term is unique, if it exists, i.e., if the computation terminates.

The different possibilities in reducing a term are a consequence of the
non-deterministic nature of β-reduction. We can force a deterministic
behaviour by fixing a strategy, that is, an algorithm which, given a
λ-term, chooses a redex to contract.

Combinators I

Definition 4.1 (Combinator)
A combinator is a λ-term M such that FV(M) = ∅.

Combinators are especially important in λ-calculus since they resemble
programs: they are applied to inputs in order to produce outputs. In
this view, λ-terms which are not combinators are thought of as
intermediate steps in a computation process.

Combinators II
Important combinators are given special names. The following ones
are the most commonly used:
■ I = λx . x — Identity: I(x) = x;
■ K = λx , y . x — Constant functions: Ka(x) = K(a)(x) = a;
■ S = λx , y , z . x z (y z) — Stronger composition:
S(f , g)(x) = f (x , g(x));
■ B = λx , y , z . x (y z) — Function composition: B(f , g)(x) = f (g(x));
■ B′ = λx , y , z . y (x z) — Reversed composition: B′(f , g)(x) = g(f (x));
■ C = λx , y , z . x z y — Commutative operator: C(f )(x , y) = f (y , x);
■ W = λx , y . x y y — Doubling operator: W(f )(x) = f (x , x);
■ Ω = (λx . x x)(λx . x x) — Non-terminating operator.
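Except for Ω, these combinators can be transcribed directly as curried functions. A Python sketch (Python is call-by-value, so evaluating Ω would not terminate and it is omitted; the name B_ stands for B′):

```python
I  = lambda x: x                                  # I x = x
K  = lambda x: lambda y: x                        # K x y = x
S  = lambda x: lambda y: lambda z: x(z)(y(z))     # S x y z = x z (y z)
B  = lambda x: lambda y: lambda z: x(y(z))        # B x y z = x (y z)
B_ = lambda x: lambda y: lambda z: y(x(z))        # B′ x y z = y (x z)
C  = lambda x: lambda y: lambda z: x(z)(y)        # C x y z = x z y
W  = lambda x: lambda y: x(y)(y)                  # W x y = x y y
# Ω = (λx . x x)(λx . x x) is deliberately omitted: under call-by-value
# its evaluation would never terminate.
```

Applying S K K to any value returns that value unchanged, which illustrates the β-equality I =β S K K mentioned on the next slide.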

Combinators III
It is possible to show that only S and K are needed to build all the
listed combinators. For example, I =β S K K.

As a matter of fact, it is possible to prove that the minimal set of
terms closed under application and containing S and K suffices to
represent the whole λ-calculus, up to β-equality.
Thus, the simplest “programming language” is:
■ S and K are “programs”;
■ if M and N are “programs”, then so is (M · N);
along with the reduction rules
■ K x y B x for every x, y “programs”;
■ S x y z B (x z)(y z) for every x, y and z “programs”.

Combinators IV

Since the λ-calculus is Turing-complete, i.e., every computable
function can be represented in this system, it follows that the
“simplest programming language” is Turing-complete as well.

We will prove later that the λ-calculus is Turing-complete.

The “simplest programming language” is called Combinatory Logic
and, despite its apparent simplicity, it has a deep and involved theory.
In practice, Combinatory Logic is often used to compile functional
programs into highly efficient intermediate code, which is then
executed by a very fast virtual machine based on the fundamental
combinators.

Combinators V

For the sake of completeness, it is possible to simplify Combinatory
Logic even further by using just one combinator, U = λx . x K S K. The
corresponding reduction rule is U x B x ((U U) U)(U (U U))((U U) U).
It is easy to prove the equivalence of Combinatory Logic with this
reduced system, noticing that (U U) U =β K and U (U U) =β S.
[Exercise] Prove these β-equivalences.

We will not develop Combinatory Logic any further in this course: we
will just develop some aspects of interest of the theory of combinators
inside the λ-calculus.

Fixed Points I

In general, given a transformation f, a fixed point of f is any value x
such that x = f(x). For example, the function g(x) = x² from real
numbers to real numbers has two fixed points, 0 and 1.

In the λ-calculus:
Theorem 4.2 (Fixed-point)
There is a combinator Y such that Y x Bβ x (Y x).

Proof (by Alan Turing).
Let U ≡ λu , x . x (u u x) and let Y ≡ U U.
So Y x ≡ (λu , x . x (u u x)) U x, by definition of Y; thus
Y x Bβ ((λx . x (u u x))[U/u]) x ≡ (λx . x (U U x)) x Bβ x (U U x) ≡ x (Y x).
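Turing's combinator itself loops under call-by-value evaluation, so a strict language needs an η-expanded variant, conventionally called Z (a Python sketch; the names Z and fact are ours):

```python
# Z f = (λu . f (λv . u u v)) (λu . f (λv . u u v)) — the η-expanded,
# call-by-value-safe fixed-point combinator.
Z = lambda f: (lambda u: f(lambda v: u(u)(v)))(lambda u: f(lambda v: u(u)(v)))

# Tying the recursive knot of factorial through the fixed point: the
# function never refers to itself by name, only through `self`.
fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

Here fact is a genuine fixed point: unfolding Z once shows that fact(n) computes n · fact(n − 1) without any explicit recursion in the definition.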

Fixed Points II

Corollary 4.3
1. There is a combinator Y such that Y x =β x (Y x);
2. the equation M X =β X has X ≡ Y M as a solution;
3. the equation x y1 . . . yn =β Z, with y1 , . . . , yn variables and Z a term
(possibly containing x), can be solved, i.e., there is a term X such
that X y1 . . . yn =β Z[X/x].
Proof.
(1) and (2) are evident. For (3), choose X ≡ Y(λx , y1 , . . . , yn . Z).

Reduction Strategies I
Definition 4.4 (Contraction)
Given a λ-term X, a contraction in X is a triple 〈X , R , C〉 where R is a
redex occurrence in X and C is the result of contracting R in X. A
contraction is written as X BR C.

Definition 4.5 (Reduction)
A reduction ρ is a finite or infinite sequence of contractions separated
by α-conversions:

ρ ≡ (X1 BR1 Y1 ≡α X2 BR2 Y2 ≡α X3 BR3 Y3 ≡α . . . )

The start of ρ is X1 and the length of ρ is the number of its
contractions. If the length of ρ is finite, say n, then Xn+1 is the
terminus of ρ.
Reduction Strategies II

Contractions and reductions are nothing more than a formal way to
mark reductions explicitly and to study them.
Definition 4.6 (Maximal reduction)
A reduction ρ is maximal iff either it has infinite length, or its
terminus contains no redexes.

Thus, a maximal reduction represents a complete computation of its
start.
We want to classify reductions in order to study their properties, and
to devise a sound and effective strategy which always produces an
output if one exists.

Reduction Strategies III

Definition 4.7 (Leftmost reduction)
An occurrence of a redex R1 in a term X1 is called maximal iff it is not
contained in any other redex occurrence in X1. It is called leftmost
maximal iff it is the leftmost of the maximal redex occurrences in X1.
A maximal reduction ρ such that, for each index i, the contracted
redex occurrence Ri is leftmost maximal in Xi, is called the (unique)
leftmost reduction of X1.

The leftmost reduction of X operates by searching, from left to right,
for the first maximal redex in X and contracting it, repeating until
there are no more redexes in X.
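This strategy can be sketched as a small normal-order interpreter in Python (the term classes, the fuel bound and the function names are illustrative assumptions; subst repeats the capture-avoiding substitution of Definition 2.4 so the fragment is self-contained):

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

def fv(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)
    return fv(t.body) - {t.var}

def subst(m, n, x):
    # capture-avoiding M[N/x], as in Definition 2.4
    if isinstance(m, Var):
        return n if m.name == x else m
    if isinstance(m, App):
        return App(subst(m.fun, n, x), subst(m.arg, n, x))
    y, p = m.var, m.body
    if y == x or x not in fv(p):
        return m
    if y not in fv(n):
        return Lam(y, subst(p, n, x))
    z = next(f"z{i}" for i in count() if f"z{i}" not in fv(n) | fv(p))
    return Lam(z, subst(subst(p, Var(z), y), n, x))

def step(t):
    # contract the leftmost-outermost redex; None means t is a β-nf
    if isinstance(t, App):
        if isinstance(t.fun, Lam):
            return subst(t.fun.body, t.arg, t.fun.var)
        s = step(t.fun)
        if s is not None:
            return App(s, t.arg)
        s = step(t.arg)
        return None if s is None else App(t.fun, s)
    if isinstance(t, Lam):
        s = step(t.body)
        return None if s is None else Lam(t.var, s)
    return None

def normalize(t, fuel=1000):
    # iterate the strategy; give up after `fuel` contractions, since a
    # term may have no β-nf at all
    for _ in range(fuel):
        s = step(t)
        if s is None:
            return t
        t = s
    raise RuntimeError("no β-nf found within the fuel bound")
```

On K (I w) Ω this interpreter reaches the β-nf w, while on Ω alone it exhausts its fuel, matching the behaviour the slides describe for the leftmost strategy.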

Reduction Strategies IV

Definition 4.8 (Quasi-leftmost reduction)
A quasi-leftmost reduction ρ of a term X1 is a maximal reduction such
that, for each index i, if Xi is not the terminus, then there exists j ≥ i
such that Rj is leftmost maximal.

An infinite reduction is quasi-leftmost iff infinitely many of its
contractions are leftmost maximal, spread across the whole
computation. An infinite reduction is leftmost if every redex
contracted in it is leftmost maximal.
A finite reduction is leftmost if every redex contracted in it is leftmost
maximal, and it is quasi-leftmost if any contraction which is not
leftmost maximal is followed, sooner or later, by a leftmost maximal
contraction.

Reduction Strategies V

Theorem 4.9 (Quasi-leftmost reduction)
If a λ-term X has a normal form X∗, then every quasi-leftmost
reduction of X is finite and its terminus is X∗.
[Proof not required]

From a computational point of view, the meaning of this result is that
any quasi-leftmost reduction strategy on X terminates whenever the
program X terminates.

Reduction Strategies VI

There are other reduction strategies, whose motivation is not
mathematical, but technical.

The call-by-value strategy represents a term X as a tree and evaluates
it as follows:
■ if the root element is a variable or an abstraction, then the term
evaluates to itself;
■ if the root is an application, then the argument (right branch) is
evaluated first, then the function (left branch) is evaluated and,
finally, if the left branch is a λ-abstraction, the tree is contracted.

Reduction Strategies VII

The call-by-value strategy is natural and it allows for a simple


implementation. But it may not terminate even if the term has a β-nf.
Most programming languages, e.g., ML, use call-by-value.
Example 4.10
Consider the term K(I w )Ω ≡ (λx , y . x)((λz . z) w )((λu . u u)(λu . u u)).
The leftmost strategy yields K(I w )Ω B (λy . I w )Ω B (λz . z)w B w ,
which is in β-nf.
The call-by-value strategy yields
K(I w )Ω B K w Ω B (λy . w )Ω B (λy . w )Ω B . . . forever.
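The difference between the two strategies can be reproduced in Python, itself a call-by-value language. The following is only an illustrative sketch: Ω is simulated by a function that loops forever, and the call-by-name behaviour of K is simulated by passing the argument as an unevaluated thunk (the names K_by_name and omega are ours, not part of the calculus).

```python
# Example 4.10 in Python, itself a call-by-value language.
# Omega loops forever, so whether K (I w) Omega terminates depends
# entirely on whether the argument Omega is evaluated before K
# discards it.

K = lambda x: lambda y: x          # K ≡ λx, y. x
I = lambda z: z                    # I ≡ λz. z

def omega():
    # Ω ≡ (λu. u u)(λu. u u): calling this function never returns.
    return omega()

# Call-by-name, simulated with a thunk: the argument is passed
# unevaluated and K never forces it, so the β-nf w is reached.
def K_by_name(x, y_thunk):
    return x                        # the thunk is discarded unevaluated

w = "w"
assert K_by_name(I(w), omega) == w  # terminates: Omega is never run

# Call-by-value would evaluate the argument first and diverge:
#   K(I(w))(omega())                # never returns
```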

66 of 472
Reduction Strategies VIII

The call-by-name strategy is similar to the call-by-value strategy: it


operates in the same way except for that arguments are not evaluated,
but rather directly substituted.

Call-by-name always terminates if the term being reduced has a β-nf.


But it may produce a result which is not in β-nf, since it does not
operate inside abstractions.

Some programming languages, like Haskell, are based on a variation of


call-by-name which optimises repeated reductions.

67 of 472
References and Hints

This lesson is based on Chapter 3 of [HS].

References to the call-by-value and call-by-name strategies can be


found in Chapter 5 of [Pierce]. For a general overview of reduction
strategies, consult
[Link]

The omitted proof of the Theorem on the quasi-leftmost reduction can


be found in the classical text [Bar].
This beautiful book is the standard reference for untyped λ-calculus
but it is very mathematical and somewhat expensive.

68 of 472
Fundamentals of Functional Programming
Lecture 5 — Intermezzo

Outline

Till now, we have studied λ-calculus from a mathematical perspective:


we have shown that it is confluent and we have devised strategies to
make evaluation of terms deterministic.

The aim of this lesson is to show how a programming language can be


developed inside λ-calculus. Specifically, we want to develop the basic
structure of a functional language, like ML, as an instance of
λ-calculus. Although this is not perfectly correct, as ML is based on a
typed calculus, it shows how one can effectively program using a
rather abstract system like the one we have studied so far.

70 of 472
Data Structures I

A data structure is a way to represent data in a program. Most data


structures can be coded as the set of logical terms generated by a
fixed multi-sorted signature. Each instance of the data structure is,
then, a logical term.
Definition 5.1 (Data structure)
A data structure D is a pair D = 〈S , F 〉 of finite sets of symbols, where
S is called the set of sorts and F the set of functions. Each element
f ∈ F is decorated with a type, f : s1 × · · · × sn → s, such that
s , s1 , . . . , sn ∈ S with n ≥ 0.
A sort t ∈ S is a parameter sort if, for every f : s1 × · · · × sn → s ∈ F , s ≢ t.

71 of 472
Data Structures II
Every data structure D can be represented, that is, every term
constructed from the language D can be uniquely written as a λ-term
in a way that allows one to manipulate the term and its information
content.
The representation we use is a type-free version of a result due to
Corrado Böhm and Alessandro Berarducci.
Definition 5.2 (Representation of data structures)
Let D = 〈S , F 〉 be a data structure, where F = { f1 , . . . , fn }. A term t on D
has the form f (t1 , . . . , tm ) with f : s1 × · · · × sm → s and t1 , . . . , tm terms
of type s1 , . . . , sm , respectively. This term is represented by a λ-term
t ≡ f t1 . . . tm where f ≡ fi for some index i and
f ≡ λx1 , . . . , xm , f1 , . . . , fn . fi s1 . . . sm with

si ≡ xi if si is a parameter sort, and si ≡ (xi f1 . . . fn ) otherwise.
72 of 472
Booleans I
Booleans form the data structure

Bool = 〈{ bool }, { true : bool, false : bool }〉 .

Definition 5.3 (Booleans)


We encode booleans as λ-terms following the general rule. The result
is:
■ true ≡ λx , y . x;
■ false ≡ λx , y . y .
Here and in the following, we omit underlines when it is clear from the
context if we are dealing with a logical term or with a λ-term.
[Exercise] Check that the representation is correct.

73 of 472
Booleans II
Since, in our representation, a constant c ≡ fi : s ∈ F is represented as
c ≡ λf1 , . . . , fn . fi , i.e., the i-th projector, it follows that

if ≡ λp , x , y . p x y

reduces to a selector when its first argument is a boolean.

In fact, (if true a b) =β a and (if false a b) =β b.

By a simple syntactical transformation we get the more usual

if c then x else y ≡ if c x y ,

which is present in most functional languages. Notice how the “else”


branch is mandatory.
74 of 472
Booleans III

With these tools, it is easy to define the usual operations on booleans:


■ and ≡ λx , y . if x y false;
■ or ≡ λx , y . if x true y ;
■ not ≡ λx . if x false true.
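These encodings can be transcribed directly into Python, with nested one-argument lambdas standing in for λ-abstractions and application for β-reduction. This is only a sketch; the helper decode (ours, not part of the calculus) maps a Church boolean to a native one.

```python
# Church booleans in Python: curried lambdas are λ-abstractions,
# function application plays the role of β-reduction.

true  = lambda x: lambda y: x      # true  ≡ λx, y. x
false = lambda x: lambda y: y      # false ≡ λx, y. y

if_ = lambda p: lambda x: lambda y: p(x)(y)   # if ≡ λp, x, y. p x y

and_ = lambda x: lambda y: if_(x)(y)(false)
or_  = lambda x: lambda y: if_(x)(true)(y)
not_ = lambda x: if_(x)(false)(true)

# Decode a Church boolean by letting it select between True and False.
decode = lambda b: b(True)(False)

assert decode(and_(true)(false)) is False
assert decode(or_(false)(true))  is True
assert decode(not_(false))       is True
```

Note that Python, being strict, evaluates both branches of if_ before selecting one; this is harmless here because the branches are plain values.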

[Exercise] Check that the representation is correct by deriving the


truth tables of these operations via β-reduction.

75 of 472
Enumerations I

Enumerations are coded by the following data structure

Enum = 〈{ enum }, { e1 : enum, . . . , en : enum }〉 .

Definition 5.4 (Enumerations)


We encode the elements of an enumeration as λ-terms following the
general rule. The result is ei ≡ λx1 , . . . , xn . xi .

Notice that booleans are a special case of enumerations.

76 of 472
Enumerations II
As we did for booleans, we can introduce a selector for enumerations:

case ≡ λp , x1 , . . . , xn . p x1 . . . xn ,

which has the property that case ei a1 . . . an =β ai .


So the syntax

case e
e1 : a1
..
.
en : an
end

is a human-readable presentation of the λ-term (case e a1 . . . an ).

77 of 472
Tuples
Tuples are instances of the family of data structures:

A1 × · · · × An = 〈{ A1 , . . . , An , T }, { tuple : A1 × · · · × An → T }〉 .

Definition 5.5 (Tuples)


We encode the tuple constructor as a λ-term following the general
rule: tuple ≡ λx1 , . . . , xn , y . y x1 . . . xn .
If we call i-th ≡ λy . y (λx1 , . . . , xn . xi ), then it is immediate to show that

i-th(tuple x1 . . . xn ) =β xi .

Since records are just tuples with named projectors, their


representation is immediate.

78 of 472
Natural numbers I

The natural numbers can be encoded in many ways. The most elegant
one is due to Alonzo Church. It naturally follows from the general
representation we adopted.
Natural numbers are the elements of the data structure

Nat ≡ 〈{ N }, { suc : N → N , 0 : N }〉 .

Definition 5.6 (Natural numbers)


Following the general rule, we get:
■ 0 ≡ λx , y . y ;
■ suc ≡ λx , y , z . y (x y z).

79 of 472
Natural numbers II

Let f^n (x) be an abbreviation for f (. . . (f x) . . . ) where f occurs n times.


Then, the number k is coded as the λ-term
k ≡ suc^k (0) =β λx , y . x^k (y ), as it is almost immediate to prove by
induction on k.

Moreover, testing for equality with 0 is easy:

iszero ≡ λn. n (λx . false) true ,

it has the property that iszero 0 =β true and iszero(suc n) =β false.

[Exercise] Verify the β-equivalences.

80 of 472
Natural numbers III

Basic arithmetical operations are easily defined


■ add ≡ λx , y , u , v . x u (y u v );
■ mult ≡ λx , y , z . x (y z);
■ expt ≡ λx , y , u , v . y x u v .
Obviously, there is a trivial translation from the usual syntax to
combinators:
■ x + y ≡ add x y ;
■ x · y ≡ mult x y ;
■ x y ≡ expt x y .
It is simple to verify that (add m n) =β h iff h = m + n is true in
arithmetic. Also the other obvious identities are proved similarly.
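The numerals and the arithmetic combinators above can be checked mechanically in Python. This is a sketch; decode (our helper, not part of the calculus) recovers a native number by applying a numeral to the native successor and 0.

```python
# Church numerals in Python: the numeral n maps a function f and a
# value x to f applied n times to x. suc, add and mult transcribe the
# combinators defined above.

zero = lambda f: lambda x: x                      # 0 ≡ λx, y. y
suc  = lambda n: lambda f: lambda x: f(n(f)(x))   # suc ≡ λx, y, z. y (x y z)

add  = lambda m: lambda n: lambda u: lambda v: m(u)(n(u)(v))
mult = lambda m: lambda n: lambda z: m(n(z))

# Decode by applying the numeral to the native successor and 0.
decode = lambda n: n(lambda k: k + 1)(0)

two, three = suc(suc(zero)), suc(suc(suc(zero)))
assert decode(add(two)(three))  == 5
assert decode(mult(two)(three)) == 6
```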

81 of 472
Lists I
The data structure of lists over a given type A is

List(A) ≡ 〈{ A, L }, { cons : A × L → L, nil : L }〉 .

Definition 5.7 (Lists)


Following the general rule, we get:
■ nil ≡ λx , y . y ;
■ cons ≡ λx , y , u , v . u x (y u v ).

The general shape of a list


[a1 , . . . , an ] ≡ cons a1 (cons a2 . . . (cons an nil) . . . ) ,
after a suitable β-reduction, is λx , y . x a1 (x a2 (. . . (x an y ) . . . )) as one
can verify by induction on n.
82 of 472
Lists II
It is easy to define the head operation, calculating the first element of
a given list: hd ≡ λx . x K nil and it holds that hd(cons a l ) =β a.
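In Python, the fact that a represented list is its own fold makes both hd and a decoding function one-liners. This is a sketch; decode is our helper for checking results.

```python
# The list encoding in Python: a represented list *is* its own fold —
# applying it to a "cons step" u and a "nil value" v replays its spine.

nil  = lambda u: lambda v: v                          # nil ≡ λx, y. y
cons = lambda x: lambda l: lambda u: lambda v: u(x)(l(u)(v))

K = lambda x: lambda y: x

# hd ≡ λl. l K nil: the step function keeps the current element and
# discards the folded tail, so the outermost element survives.
hd = lambda l: l(K)(nil)

# Decoding to a native list is just another fold.
decode = lambda l: l(lambda x: lambda xs: [x] + xs)([])

l = cons(1)(cons(2)(cons(3)(nil)))
assert decode(l) == [1, 2, 3]
assert hd(l) == 1
```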
On the other hand, defining null and tl (for tail) such that
null nil =β true and null (cons a l ) =β false, and tl (cons a l ) =β l , is not
immediate and requires some additional knowledge. Specifically, one
should write a loop over the internal structure of the λ-term
representing the list.

It is evident that all the usual data structures can be encoded in Λ


following the rules we have shown. In fact, it is even possible to define
a generic datatype construction which automatically compiles into the
proper representation and which allows one to inductively decompose terms
in a standard way. For example, ML has the datatype constructor for
such purposes.
83 of 472
Control Structures
In the usual programming paradigms, we have three main control
structures: sequence, selection and iteration.

In the λ-calculus, as well as in most functional languages, we have the


very same control structures, although they are “built-in” in the
properties of β-reduction.
For instance, “sequence” is encoded by the B combinator and
“selection” by the if combinator operating on booleans.

“Iteration”, in its purest form is controlled by fixed-points, and thus


implemented via the Y combinator. No real programming language
directly uses this approach: in functional programming, recursion is
used instead of explicit iteration and, behind the scenes, it reduces to
a clever application of the Y combinator.
84 of 472
References and Hints

The representation of data structures as λ-terms is derived from:


Alessandro Berarducci and Corrado Böhm, Automatic Synthesis of
Typed Lambda-Programs on Term Algebras, Theoretical Computer
Science 39 (1985) 135–154. It is also illustrated in [Bar2]

The examples of representation are, more or less, standard. Most of


them can be found in [Paulson] Chapter 9, and in [Pierce] Chapter 5.2.

Church numerals are defined in Chapter 4 of [HS].

[Paulson] is also a reference for the datatypes in ML, in particular


Chapters 2, 3 and 4.

85 of 472
Fundamentals of Functional Programming
Lecture 6

Outline

Apart from iteration, we have seen how the λ-calculus can be used to


develop a real functional programming language, with the usual data
types and control structures.

But what can we calculate by means of λ-terms? This fundamental


question will have an answer in this lesson, where we will show that
the λ-calculus is Turing-complete.

As a side effect, we will get a uniform way to represent recursive


definitions as λ-terms, which solves the open point about iteration.

87 of 472
Representability

Definition 6.1 (Representability)


Let φ be a partial function from Nn into N. A λ-term X is said to
represent φ iff, for all m1 , . . . , mn ∈ N,
■ if φ(m1 , . . . , mn ) = p, then X m1 . . . mn =β p;
■ if φ(m1 , . . . , mn ) is undefined, then X m1 . . . mn has no β-nf.
In the previous statements, m is the Church numeral representing the
natural number m.

It is easy to check that the combinators suc, add, mult, and expt, as
previously defined, represent the corresponding operations.

88 of 472
Primitive recursive functions I
Definition 6.2 (Primitive recursive functions)
The set of primitive recursive functions on natural numbers is defined
by induction:
■ The successor function suc is primitive recursive;
■ The number 0 is primitive recursive;
■ For each n ≥ 1 and k ≤ n, the projection function Πnk is primitive
recursive, where
Πnk (m1 , . . . , mn ) = mk ;
■ If n, p ≥ 1 and ψ, ξ1 , . . . , ξp are primitive recursive, then so is φ
defined by composition

φ(m1 , . . . , mn ) = ψ(ξ1 (m1 , . . . , mn ), . . . , ξp (m1 , . . . , mn )) ;

,→
89 of 472
Primitive recursive functions II
,→ (Primitive recursive functions)
■ If ψ and ξ are primitive recursive, then so is φ defined by recursion
as follows
φ(0, m1 , . . . , mn ) = ψ(m1 , . . . , mn ) ,
φ(k + 1, m1 , . . . , mn ) = ξ(k , φ(k , m1 , . . . , mn ), m1 , . . . , mn ) .

By checking the definition, every primitive recursive function is total.

Example 6.3
The predecessor function pre defined by pre(0) = 0 and pre(k + 1) = k
is primitive recursive.
In fact, by recursion, pre(0) = 0 and pre(k + 1) = Π21 (k , pre(k)).

90 of 472
Primitive recursive functions III
Theorem 6.4
Every primitive recursive function φ can be represented by a
combinator φ.
Proof.
The combinator φ is defined by induction.
■ suc ≡ λu , x , y . x (u x y ), as before;
■ 0 ≡ λx , y . y , as before;
■ Πnk ≡ λx1 , . . . , xn . xk ;
■ (Composition) φ ≡ λx1 , . . . , xn . ψ (ξ1 x1 . . . xn ) . . . (ξp x1 . . . xn );
■ (Recursion) φ ≡ λu , x1 , . . . , xn . R (ψ x1 . . . xn )(λu , v . ξ u v x1 . . . xn ) u,
where R is a recursion combinator satisfying R X Y 0 =β X and
R X Y suc^(k +1) (0) =β Y k (R X Y k).
91 of 472
Primitive recursive functions IV

The recursive combinator R is constructed in steps:


■ Define D ≡ λx , y , z . z (K y ) x. It holds that

DXY 0 Bβ X
D X Y k + 1 Bβ Y .

■ Define Q ≡ λy , v . D (suc (v 0)) (y (v 0) (v 1)). It holds that

Q Y (D n X ) Bβ D n + 1 (Y n X )
(Q Y )^k (D 0 X ) Bβ D k W

for some term W whose details are irrelevant.


■ Define R ≡ λx , y , u . u (Q y ) (D 0 x) 1. It holds that R X Y k Bβ W .

92 of 472
Primitive recursive functions V

Calculating, we get:
■ R X Y 0 Bβ 0 (Q Y ) (D 0 X ) 1
Bβ D 0 X 1
Bβ X
■ R X Y k + 1 Bβ k + 1 (Q Y ) (D 0 X ) 1
Bβ (Q Y )^(k +1) (D 0 X ) 1
Bβ (Q Y ) ((Q Y )^k (D 0 X )) 1
Bβ (Q Y ) (D k W ) 1
Bβ D k + 1 (Y k W ) 1
Bβ Y k W
Thus, R X Y k + 1 =β Y k (R X Y k), as required .

93 of 472
Primitive recursive functions VI

An alternative definition for R uses the fixed-point combinator Y.

First, the predecessor function, being primitive recursive, can be


represented in λ-calculus, e.g., by the term R 0 K.

Let pre be any λ-term representing the predecessor function and


consider the equation in R

R x y z = if iszero z then x else y (pre z) (R x y (pre z)) .

By the Fixed-Point Theorem, it has a solution R which is

R ≡ Y (λu , x , y , z . D x (y (pre z) (u x y (pre z))) z) .
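The same fixed-point construction can be tried in Python, which is call-by-value: the plain Y combinator diverges there, so the sketch below uses its η-expanded variant Z (delaying the self-application inside a thunk) to solve a recursive equation, here for the factorial. The names Z and fact are ours.

```python
# The fixed-point construction in a strict language: the plain Y
# combinator loops in Python, but its η-expanded variant works.
# Z ≡ λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))

Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Using Z to solve a recursive equation, just as the slides solve the
# equation for R: here, fact n = if n = 0 then 1 else n * fact(n - 1).
fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

assert fact(5) == 120
```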

94 of 472
Primitive recursive functions VII

One can define the cut-off subtraction between natural numbers as


m − n = p if m ≥ n, where m = n + p, and m − n = 0 otherwise.

It is an easy exercise (Do it!) to show by means of the predecessor


function that this subtraction is primitive recursive. So, we can really
build a λ-term which computes it.

Having subtraction, we can define the order relations on natural


numbers as x ≤ y ≡ iszero(x − y ) and x > y ≡ not(x ≤ y ). Also,
x < y ≡ y > x and x ≥ y ≡ y ≤ x. Finally, equality on naturals is
x = y ≡ and(x ≤ y )(y ≤ x).

95 of 472
Iteration I

A non-immediate consequence of the theorems we have proved so far


is that we have a construction that allows us to model iteration.

In fact, a recursive definition of a function, as long as it is primitive


recursive (and this is a “natural” condition), can be represented by a
λ-term which uses a recursive combinator.

This fact allows to model the worst case of procedural programming:


the GOTO statement.

The following example shows the technique, which is easy to


generalise and to formalise. Nevertheless, this way of programming
leads to poor style and obscure programs, so it is highly discouraged!

96 of 472
Iteration II
Example 6.5
Consider the following code:
var x := 0; y := 0; z := 0;
α: x := x + 1;
β: if y < z then goto α else y := x + y ;
γ: if z > 0 then begin z := z − x; goto α; end else stop;

In a functional representation:
α(x , y , z) = β(x + 1, y , z);
β(x , y , z) = if y < z then α(x , y , z) else γ(x , x + y , z);
γ(x , y , z) = if z > 0 then α(x , y , z − x) else (x , y , z);

Executing α(0, 0, 0), we get exactly the same computation as for the
procedural code.
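The functional representation of Example 6.5 runs unchanged in Python, with each label as a function and each goto as a tail call:

```python
# Example 6.5 transcribed directly: each label becomes a function,
# each goto a tail call, and "stop" returns the final variable state.

def alpha(x, y, z):
    return beta(x + 1, y, z)

def beta(x, y, z):
    return alpha(x, y, z) if y < z else gamma(x, x + y, z)

def gamma(x, y, z):
    return alpha(x, y, z - x) if z > 0 else (x, y, z)

# Starting from (0, 0, 0): alpha sets x = 1, beta sets y = 1,
# gamma stops since z = 0.
assert alpha(0, 0, 0) == (1, 1, 0)
```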
97 of 472
Iteration III

Notice that the presented technique plus the results on arithmetic we


have developed so far allow one to formalise the assembly language inside
λ-calculus.

Since assembly, when full arithmetic is allowed, is Turing-complete, it


follows that λ-calculus is at least as powerful as the class of Turing
machines. Or, in other words, everything which is computable, is
representable in the λ-calculus.

This result can be formalised in a precise way, with the appropriate


theorems, but it is long and tedious. Thus, since its development
would not lead to significant advances in our knowledge, it is left out
from our treatment.

98 of 472
Partial recursive functions I

Definition 6.6 (Partial recursive functions)


A function φ from a subset of Nn into N is called partial recursive iff
there exist primitive recursive ψ and ξ such that, for all m1 , . . . , mn ∈ N,

φ(m1 , . . . , mn ) = ψ(µk[ξ(m1 , . . . , mn , k) = 0]) ,

where µk[ξ(m1 , . . . , mn , k) = 0] is the least k ∈ N such that


ξ(m1 , . . . , mn , k) = 0, if such a k exists, and is undefined otherwise.

Partial recursive functions, also called Kleene’s functions, after the


name of their discoverer, are equivalent to Turing machines’ computed
functions. Thus, they are Turing-complete.
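Definition 6.6 can be read as executable pseudocode. A Python sketch of the μ-operator (mu and partial_recursive are our names) makes the source of partiality explicit: the search loops forever when no suitable k exists.

```python
# Unbounded minimisation: the least k with xi(m1, ..., mn, k) = 0,
# followed by psi. If no such k exists, the loop never returns —
# exactly the partiality of Definition 6.6.

def mu(xi, args):
    """Least k such that xi(*args, k) == 0 (may loop forever)."""
    k = 0
    while xi(*args, k) != 0:
        k += 1
    return k

def partial_recursive(psi, xi):
    return lambda *args: psi(mu(xi, args))

# A toy instance (our example): the least k with k*k >= m, with psi
# the identity.
isqrt_ceil = partial_recursive(lambda k: k,
                               lambda m, k: 0 if k * k >= m else 1)
assert isqrt_ceil(10) == 4
```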

99 of 472
Partial recursive functions II

Theorem 6.7
Every partial recursive function can be represented by a combinator.

Proof. (i)
Let ψ and ξ be primitive recursive and, for all m1 , . . . , mn ∈ N, let
φ(m1 , . . . , mn ) = ψ(µk[ξ(m1 , . . . , mn , k) = 0]) .

By Theorem 6.4, ψ and ξ are representable. Consider the equation


H x1 . . . xn y = (if ξ x1 . . . xn y = 0 then y else H x1 . . . xn (suc y )) .

By the Fixed-Point Theorem, it has a solution


H ≡ Y (λu , x1 , . . . , xn , y . D y (u x1 . . . xn (suc y ))(ξ x1 . . . xn y )) .

,→
100 of 472
Partial recursive functions III
,→ Proof. (ii)
Thus, if φ(m1 , . . . , mn ) is defined, it is represented by
F ≡ λx1 , . . . , xn . ψ(H x1 . . . xn 0) .
For all m1 , . . . , mn ∈ N, we have F m1 . . . mn =β φ(m1 , . . . , mn ).
Let T ≡ λx . D 0 (λu , v . u (x (suc v )) u (suc v )) and
P ≡ λx , y . T x (x y )(T x) y . Then, if X Y =β 0, P X Y =β Y , and if
X Y =β m + 1, P X Y =β P X (suc Y ).
Define
φ ≡ λx1 , . . . , xn . P (ξ x1 . . . xn ) 0 I (F x1 . . . xn ) .

First, suppose there is a minimal k such that ξ(m1 , . . . , mn , k) = 0.


Then, φ m1 . . . mn =β k I (F m1 . . . mn ) =β I^k (F m1 . . . mn ) =β F m1 . . . mn =β
φ(m1 , . . . , mn ). So, φ represents φ, when φ is defined. ,→
101 of 472
Partial recursive functions IV
,→ Proof. (iii)
Suppose that m1 , . . . , mn are such that there is no k such that
ξ(m1 , . . . , mn , k) = 0. So, being ξ total, for every k there is a pk such
that ξ(m1 , . . . , mn , k) = pk + 1. Calling X ≡ ξ m1 . . . mn , X k Bβ pk + 1.
To show that X has no β-nf it suffices to find an infinite quasi-leftmost
reduction. Consider the following reduction, where G ≡ F m1 . . . mn :
X Bβ P X 0 I G
Bβ T X (X 0)(T X ) 0 I G ∗
Bβ T X (X 1)(T X ) 1 I G ∗
Bβ . . .
Bβ T X (X j)(T X ) j I G ∗
Bβ . . .
The lines marked with ∗ clearly show that this reduction cannot
terminate, and each of them contains one leftmost maximal contraction.
102 of 472
Partial recursive functions V

In the first place, the previous result shows that the λ-calculus is
Turing-complete, since it can compute all the computable functions,
and nothing more.

In the second place, the result allows to define recursive functions


which are not terminating, thus extending our interpretation of
iteration as a restricted form of recursion.

103 of 472
References and Hints

This lesson covers the material of Chapter 4 of [HS].

The model of the GOTO statement is taken from [Paulson] Chapter 2.

More information on Kleene’s functions can be found in any standard


textbook on Computability Theory or on Recursion Theory.
The most authoritative reference is: Piergiorgio Odifreddi, Classical
Recursion Theory, North Holland (1992). ISBN: 0444894837

104 of 472
Fundamentals of Functional Programming
Lecture 7 — Intermezzo

Outline
We have seen that λ-calculus is Turing-complete. As a side effect, we
have seen that a very general form of recursion can be used. In fact,
as long as we are able to state a problem as a set of primitive recursive
equations, and we ask for a minimal solution, we know that there is a
λ-term which satisfies the equations.

In practice, we never solve those equations, but, since we know that


they can be solved, we can see the equations as a definition for a
function.

In the practice of functional programming, this approach is the


common rule and, in this lesson, we want to show how far it can be
pushed. We will see that, using these ideas in a rather elementary way,
we can represent infinitary data structures and we can easily develop
algorithms to treat them.
106 of 472
Currying I

In functional programming, functions like add(x , y ) = x + y are usually


written as add x y = x + y .

As λ-terms they are different:


■ in the first case, add: N2 → N, and add ≡ λw : N2 . (fst w ) + (snd w );
■ in the second case, add: N → (N → N) and add ≡ λx : N, y : N. x + y .

A multiple-argument function f : A1 × · · · × An → B can always be


turned into a function g : A1 → (. . . (An → B) . . . ) computing the same
value. This operation is called currying after Haskell B. Curry, one of
the fathers of combinatory logic.

107 of 472
Currying II
Currying seems easy, but it is powerful too. In fact, if add x y = x + y ,
we can define suc ≡ add 1. In general, we can partially apply a curried
function, leaving its last arguments free, to obtain new functions.

In this way, currying is an essential ingredient when thinking “by


combinators”.

Treating functions as combinators allows us to store them in data


structures, to pass them as arguments to other functions, and to
return them as results of a computation. Moreover, since functions are
pieces of data, we can construct new functions at runtime, when
needed.

So, in a functional language, functions, treated as combinators, are


first-class data objects.
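A small Python sketch of these points: a curried add, partial applications derived from it, and functions stored in a data structure (the names suc and add_10 are ours).

```python
# Currying in Python: nested one-argument lambdas. Partially applying
# the curried add yields new functions for free, e.g. suc ≡ add 1.

add = lambda x: lambda y: x + y    # add : N → (N → N)

suc    = add(1)                    # partial application
add_10 = add(10)

assert suc(41) == 42
assert add_10(5) == 15

# Functions are first-class data: store them, pass them, return them.
ops = [suc, add_10]
assert [f(0) for f in ops] == [1, 10]
```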
108 of 472
List manipulation I

As arrays are dominant in imperative programming, lists are the main


data structure in functional programming. So, to show what it means to
think of functions as combinators, lists are a natural candidate.

Lists are manipulated by a few basic functions:


■ null L = if (L = []) then true else false;
■ hd (cons X L) = X ;
■ tl (cons X L) = L.

We have already seen how null and hd can be explicited as λ-terms. In


fact, also tl can be written in an explicit form: it is simply the solution
of the above equation.

109 of 472
List manipulation II

More complex functions are written, starting from these basic bricks.

For example, take L n returns the first n elements of the list L. It is


defined as the solution to the pair of equations:

take [] i = []
take (x :: xs) i = if (i > 0) then x :: (take xs (i − 1)) else [] .

Here, and in the following, we write [] for nil, the empty list, and x :: xs
for cons x xs, following the ML syntax.

Currying the take function gives LL ≡ take L, which is a function that


returns the first n elements of the fixed list L.

110 of 472
List manipulation III

The opposite of take is drop, which leaves out the first n elements
from a given list. It is defined as:

drop [] i = []
drop (x :: xs) i = if (i > 0) then (drop xs (i − 1)) else (x :: xs) .

Concatenating two lists is performed by append, usually


written in the infix syntax @:

[] @ L =L
(x :: xs) @ L = x :: (xs @ L) .

Notice how currying the append function gives “prepend”.

111 of 472
List manipulation IV
Reversing a list is easy. We show an efficient algorithm:

rev = revAux []
revAux L [] =L
revAux L (x :: xs) = revAux (x :: L) xs .

This algorithm is efficient since it is tail recursive: in a call-by-value


reduction strategy, each recursive call is performed as the last step in
the reduction.

Notice how currying has been used to define rev from revAux. It is
often convenient to use additional arguments in a function, to
accumulate intermediate results. Then, we can get rid of them via
currying.
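The same accumulator technique, transcribed into Python on native lists (rev_aux corresponds to revAux above, and rev is its partial application to the empty accumulator):

```python
# Reversal with an accumulator: the recursive call is the last step,
# so the definition is tail recursive.

def rev_aux(acc, l):
    return acc if not l else rev_aux([l[0]] + acc, l[1:])

rev = lambda l: rev_aux([], l)     # rev = revAux []

assert rev([1, 2, 3, 4]) == [4, 3, 2, 1]
```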

[Exercise] Write a function which computes the length of a given list.


112 of 472
List functionals I

Functionals are functions taking functions as arguments.

The easiest example is map. Its action is

map f [x1 , . . . , xn ] = [f (x1 ), . . . , f (xn )] .

The functional map is defined as

map f [] = []
map f (x :: xs) = (f x) :: (map f xs) .

Notice how currying map lifts a function f from elements to lists. So,
for example, double ≡ map (λx . x ∗ 2).

113 of 472
List functionals II
Another important example of functional is filter. Given a predicate p,
i.e., a function returning a boolean value, and a list L, it returns the
sublist of L whose elements make p true.

The functional filter is defined as:


filter p [] = []
filter p (x :: xs) = if (p x) then (x :: (filter p xs)) else (filter p xs) .

A small improvement in the efficiency of filter can be achieved by


using the let construction, which is an abbreviation allowing one to
pre-compute a subterm occurring more than once:
filter p [] = []
filter p (x :: xs) = let R ≡ (filter p xs) in if (p x) then (x :: R) else R .

114 of 472
List functionals III

Using map and filter, some complex operations can be coded in a


simple and elegant way.

For example, matrix transpose, which exchanges rows with columns in


a matrix, when the matrix is represented as a list of lists, becomes:

transpose ([] :: L) = []
transpose r = (map hd r ) :: (transpose (map tl r )) .

115 of 472
List functionals IV

Other interesting functionals on lists are exists and forall. They both
take as arguments a predicate p and a list L. The former functional,
exists, is true when there is an element in L satisfying p, while the
latter, forall, is true when every element in L satisfies p.

They are defined as follows:

exists p [] = false
exists p (x :: xs) = (p x) or (exists p xs)
forall p [] = true
forall p (x :: xs) = (p x) and (forall p xs) .

116 of 472
List functionals V
As an application, list membership is readily expressed as:

mem x ≡ exists (λy . x = y ) .

We’ve assumed that = is some function testing equality. This


assumption is correct for every datatype we have seen till now: we can
write functions testing if, e.g., two numbers or two lists are equal.

In general, this cannot be done. For example, functions cannot be


tested for equality, because this problem, as β-equality, is undecidable.

But, since functional languages are typed, there is a class of types that
takes care of distinguishing types with an equality function from those
without. Functions like mem are, implicitly, defined only on “equality
types”.
117 of 472
List functionals VI
More complex functionals are possible. Two of them are fundamental:
foldl and foldr, “fold left” and “fold right”.

They act as follows:

foldl f e [x1 , . . . , xn ] = f xn (f xn−1 (. . . (f x1 e) . . . ))


foldr f e [x1 , . . . , xn ] = f x1 (f x2 (. . . (f xn e) . . . )) .

Their definitions are easy:

foldl f e [] =e
foldl f e (x :: xs) = foldl f (f x e) xs
foldr f e [] =e
foldr f e (x :: xs) = f x (foldr f e xs) .

118 of 472
List functionals VII

A simple application of these functionals is:

map f ≡ foldr (λx , l . (f x) :: l ) [] .

Another one, calculating the length of a list, is:

length ≡ foldl (K suc) 0 .

A more complex application is to calculate the Cartesian product of


two lists, interpreted as sets:

cartesianproduct xs ys ≡ foldr (λx , p . foldr (λy , l . (x , y ) :: l ) p ys) [] xs .
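A Python sketch of the two folds and of the derived definitions above (map_, length and cartesian are our names for the slides' map, length and cartesianproduct):

```python
# foldl and foldr transcribed from the defining equations.

def foldl(f, e, l):
    return e if not l else foldl(f, f(l[0], e), l[1:])

def foldr(f, e, l):
    return e if not l else f(l[0], foldr(f, e, l[1:]))

length = lambda l: foldl(lambda x, n: n + 1, 0, l)       # foldl (K suc) 0
map_   = lambda f, l: foldr(lambda x, r: [f(x)] + r, [], l)

assert length([10, 20, 30]) == 3
assert map_(lambda x: x * 2, [1, 2, 3]) == [2, 4, 6]

# The Cartesian product as two nested folds, as in the last example.
def cartesian(xs, ys):
    return foldr(lambda x, p: foldr(lambda y, l: [(x, y)] + l, p, ys),
                 [], xs)

assert cartesian([1, 2], ["a", "b"]) == \
    [(1, "a"), (1, "b"), (2, "a"), (2, "b")]
```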

119 of 472
Sequences I
The idea behind infinite lists, or sequences as they are usually called, is
to store their values as a function. More technically, a sequence is a
data structure defined as:

〈{ A, S }, { Cons : A × (1 → S) → S , Nil : S }〉 .

where 1 is the unit type, containing a single element, denoted by ().


This datatype does not follow the previously defined format: in fact, it
does not expand into a free algebra of terms. Nevertheless, it can easily
be coded by adjusting the general representation of datatypes:

Cons ≡ λx , y , u , v . u x (K (y u v ))
Nil ≡ λu , v . v .

We use the same constructors as for lists, to emphasise the analogy.


120 of 472
Sequences II
To understand how the representation of sequences works, consider
[a, b] represented as a sequence:

[a, b] ≡ Cons a (Cons b Nil) B λu , v . u a (K (u b (K v ))) .

Compare this representation with the representation of lists.

It is immediate to define
hd (Cons x xs) = x
tl (Cons x xs) = xs
null Nil = true
null (Cons x xs) = false

where the solution for hd is λx . x K Nil, while, as before, tl and null are
complex λ-terms.
121 of 472
Sequences III
Consider the function

from k = Cons k (K (from (k + 1))) ,

it evaluates as

from 1 B Cons 1 (K (from 2)) B · · · B [1, 2, 3, . . . , K (from n)] B . . .

If we assume that the reduction strategy does not expand K (from 2),
the reduction stops after one step.

This assumption, in a call-by-value environment, can be enforced since


evaluation never applies β-reduction inside the scope of a λ.
As a result, we can define a term which encodes the sequence of all
natural numbers: Nat = from 0.
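A minimal Python sketch of this idea, where Cons x (K s) becomes a pair whose second component is a thunk (a zero-argument function), so that building `from k` performs only one step; the tail is forced only when another element is actually needed. Representing Nil by None is our choice; it never occurs in the infinite sequences below.

```python
# Sequences with thunks: the delayed tail K s becomes lambda: s.

def cons(x, tail_thunk):
    return (x, tail_thunk)

def from_(k):
    return cons(k, lambda: from_(k + 1))

def take(n, s):
    if n == 0:
        return []
    x, t = s
    return [x] + take(n - 1, t())   # t() forces one more step

nat = from_(0)                      # the sequence of all natural numbers
assert take(5, nat) == [0, 1, 2, 3, 4]
```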
122 of 472
Sequences IV
With the ability to define infinite data structures, as the sequence of
natural numbers, it is natural to define functions to inspect and to
manipulate them. We mimic lists, defining (almost) the same
functions, but operating on sequences.
For example,

take 0 s = []
take n (Cons x s) = x :: (take (n − 1) (s()))

which returns the list of the first n elements in the s sequence.


Or also,
Nil @ y =y
(Cons x s) @ y = Cons x (K ((s()) @ y ))
which appends the y sequence to the first argument. Notice how
x @ y = x if x is infinite.
123 of 472
Sequences V

A useful variant on append is interleave

interleave Nil y =y
interleave (Cons x s) y = Cons x (K (interleave y (s())))

Its action is best described by an example:

take 6 (interleave (from 0) (from 10)) = [0, 10, 1, 11, 2, 12] .

As these examples have shown, it is simple to define functions


operating on sequences, since they are very similar to the
corresponding functions on lists.

124 of 472
Sequences VI
Most of the functionals on lists can be immediately redefined to
operate on sequences:
map f Nil = Nil
map f (Cons x s) = Cons (f x) (K (map f (s())))
filter p Nil = Nil
filter p (Cons x s) = if (p x) then (Cons x (K (filter p (s()))))
else (filter p (s()))
Clearly, exists and forall are not useful, since their full evaluation
requires inspecting the whole sequence.

Also, foldl is not definable, while foldr has an equivalent:


iterates f x = Cons x (K (iterates f (f x))) .
For example,
take 5 (iterates (λx . x ∗ x) 2) = [2, 4, 16, 256, 65536] .
125 of 472
Prime numbers I

A natural number greater than 1 is said to be prime if it can be


divided only by 1 and itself.

An algorithm which allows to derive the sequence of all prime numbers


is due to Eratosthenes. It starts with the sequence of natural numbers
from 2.
At every step, the head of s, the current sequence, is a prime number.
Let’s call it p. The new sequence to consider is, then, the tail of s
where all multiples of p are cancelled.

This algorithm never terminates, because the sequence of prime


numbers is infinite, as proved by Euclid. By the way, Euclid’s proof
(300 B.C.) was the first proof of non-termination in history.

126 of 472
Prime numbers II
The algorithm to generate the prime numbers is simple to encode
using sequences.

First, we write a function eliminating multiples of a given number from


a sequence:
sift p = filter (λn. n mod p 6= 0) .
Then, we iterate the procedure of erasing the multiples of the head
from the tail of the sequence:

sieve (Cons p s) = Cons p (K (sieve (sift p (s())))) .

So, the sequence of prime numbers is defined and calculated by

Primes = sieve (from 2) .
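The whole sieve fits in a few lines of Python over the same thunk-based sequences. This sketch departs slightly from the slides in one point: filter_ additionally skips any leading elements failing p before delaying the rest, which is safe here because the input sequence is infinite and p holds infinitely often.

```python
# Eratosthenes' sieve over thunk-based infinite sequences,
# following sift and sieve above (Nil never occurs).

def cons(x, tail_thunk):
    return (x, tail_thunk)

def from_(k):
    return cons(k, lambda: from_(k + 1))

def filter_(p, s):
    x, t = s
    while not p(x):                 # skip elements until p holds
        x, t = t()
    return cons(x, lambda: filter_(p, t()))

def sift(p, s):
    return filter_(lambda n: n % p != 0, s)

def sieve(s):
    p, t = s                        # the head is always prime
    return cons(p, lambda: sieve(sift(p, t())))

def take(n, s):
    if n == 0:
        return []
    x, t = s
    return [x] + take(n - 1, t())

primes = sieve(from_(2))
assert take(8, primes) == [2, 3, 5, 7, 11, 13, 17, 19]
```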

127 of 472
References and Hints

The material in this lecture comes from [Paulson] Chapters 3 and 5.

The interested reader can find hints on the implementation of infinite data structures and lazy evaluation in non-functional languages in [Link] cps343/lecture_notes/[Link].

128 of 472
Fundamentals of Functional Programming
Lecture 8

Prof. M. Benini
Outline

Till now, we have worked within the pure λ-calculus, showing that it
can be seen as a Turing-complete programming language.

In mathematical practice, as well as in programming practice, the definition of a particular function usually includes a statement of the kinds of inputs it will accept, and the kind of outputs it will produce.

By labelling terms with types, i.e., names for distinguished families of values, we can control the inputs and the outputs of functions.

130 of 472
Church vs Curry Typing

There are two very different ways to attach types to terms:


■ The Church-style (also called explicit or rigid typing), where a type
is a built-in part of a term;
■ The Curry-style (also called implicit typing), where a type is
assigned to a term after the term has been built.
Both typing styles are used in the implementation of functional
languages. In a way, the Church-style is closer to what is common in
imperative languages, like Java or C, while the Curry-style is usually
preferred in functional languages because of its conciseness. For
example, ML adopts the Curry-style.

131 of 472
Type Algebras

On a different line, the set of allowed types forms an algebraic structure, which has to mimic the application and abstraction of terms.

There are many significant typing algebras. The simplest one is due to
Church and is called the simple (theory of) types. This algebra can be
used in both a Church-style and a Curry-style.

By adding more structure to the simple types, one can develop more
sophisticated type systems. If the additional structure is developed
with some care, these systems allow one to model a computational
meaning for many logical systems, thus adding depth to the functional
paradigm, which becomes a special way to perform logic programming.

132 of 472
Polymorphism

In a simpler way, one can fix an algebra and add variables for types,
along with a notion of substitution. This idea leads to polymorphic
types.

Not all typing systems are “good”: in fact, even for very poor
algebras, deciding whether a term is correctly typed can be
undecidable. Thus, the interesting algebras are limited by
the computational power needed to deal with them.

Most functional programming languages, like ML, are based on the simple theory of types, in a polymorphic version.

133 of 472
Types

Definition 8.1 (Simple types)


Let T be a set of symbols, called the atomic types; then τ is a type
either if it is atomic, i.e., τ ∈ T , or if it is a function type, i.e.,
τ ≡ (σ → ρ ) where σ and ρ are types.
It is customary to think of a function type σ → ρ as denoting the
functions from the domain set σ to the co-domain (or range) set ρ .
As usual, we omit the outer parentheses, and we use the abbreviation
σ1 → σ2 → · · · → σn → τ for (σ1 → (σ2 → (. . . (σn → τ) . . . ))).

The simple theory of types limits type construction to just one binary
operation, →, and assumes to have a predefined set T of constant
types. Polymorphic systems will have also type variables, and more
sophisticated systems will have more operations and axioms on them.
134 of 472
Terms I

Definition 8.2 (Typed variables)


The set V of typed variables is a denumerable set of pairs x : τ such
that
■ x is a symbol and τ is a type;
■ for every x : τ, y : σ ∈ V , if x ≡ y then τ ≡ σ;
■ for each type τ, the set { (x : τ) : (x : τ) ∈ V } is denumerable.

The conditions on typed variables mean that we always have “enough variables” of any given type.

135 of 472
Terms II

Definition 8.3 (Simply typed λ-terms)


Assume to have an at most denumerable set C of constants,
composed of pairs c : τ where c is a symbol and τ a type, such that
C ∩ V = ∅. A typed λ-term t of type τ, notation t : τ, is defined as
■ t : τ ∈ V or t : τ ∈ C ;
■ t ≡ (M N) where M : σ → τ and N : σ are typed λ-terms;
■ t ≡ (λx : σ. M) and τ ≡ σ → ρ , with M : ρ a typed λ-term and x : σ a
typed variable.
Notation conventions on typed λ-terms are the usual ones.

Notice how typed λ-terms have explicit types in the syntax of


λ-abstraction.

136 of 472
Terms III

Example 8.4
For every type σ, the following is a typed term:

Iσ ≡ (λx : σ. x : σ) : σ → σ .

Usually, it is abbreviated into the simpler, but equivalent, λx : σ. x.

Omitting types is “safe”: one can prove, by induction on the construction of typed terms, that the type of a term is uniquely determined by its structure and by the types of its variables and constants.
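The induction above is exactly a recursive type-synthesis procedure. Here is a minimal Python sketch of it (our own tuple encoding, for illustration only): since every binder and variable carries its annotation, the type of a term is computed, not searched for.

```python
# Types: ('at', a) for an atomic type, ('fun', s, t) for s -> t.
# Typed terms carry their annotations, as in Definition 8.3:
#   ('var', x, ty), ('app', m, n), ('lam', x, ty, body).
def type_of(term):
    """Synthesise the unique type of a Church-style typed term."""
    tag = term[0]
    if tag == 'var':
        return term[2]                      # a typed variable carries its type
    if tag == 'lam':
        _, _x, ty, body = term
        return ('fun', ty, type_of(body))   # (λx:σ. M:ρ) : σ -> ρ
    # application: the operator must have type σ -> τ, with σ the argument type
    _, m, n = term
    mt, nt = type_of(m), type_of(n)
    if mt[0] != 'fun' or mt[1] != nt:
        raise TypeError('ill-typed application')
    return mt[2]

sigma = ('at', 'sigma')
ident = ('lam', 'x', sigma, ('var', 'x', sigma))    # I_sigma = λx:σ. x
```

Here type_of(ident) is ('fun', sigma, sigma), i.e., σ → σ, and applying a non-function raises an error, mirroring the side conditions of Definition 8.3.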

137 of 472
Substitution I

Definition 8.5 (Substitution)


The term (M : τ)[(N : σ)/(x : σ)] is defined exactly as for pure
λ-calculus.

It is routine to check that (M : τ)[(N : σ)/(x : σ)] is a typed term of type τ.

Note that the definition of (M : τ)[(N : ρ )/(x : σ)] requires σ ≡ ρ .

Intuitively, typed substitution is just pure substitution restricted to a given type σ.

138 of 472
Substitution II
Lemma 8.6
1. In M : τ, replacing a term P : σ by Q : σ leads to a term of type τ;
2. (α-conversion) If (λx : σ. M : τ) : σ → τ is a typed term, then so is
(λy : σ. M[(y : σ)/(x : σ)]) : σ → τ;
3. (β-reduction) If ((λx : σ. M : τ) : σ → τ)(N : σ) is a typed term, then
so is (M : τ)[(N : σ)/(x : σ)].

Proof.
Straightforward inductions on the structure of typed terms.
[Exercise] Develop the details of the proof.

All the lemmas on substitution and α-conversion in the pure λ-calculus hold for typed terms, with proofs unchanged.
139 of 472
The formal system I

Definition 8.7 (Equality in λβ →)


The formal system λβ → has formulae of the form M : τ = N : τ with
M : τ, N : τ typed terms, and the following axiom schemes and rules:
(α) If y : σ ∉ FV(M : τ), then λx : σ. M : τ = λy : σ. M : τ[(y : σ)/(x : σ)];
(β) ((λx : σ. M : τ)N : σ) : τ = M : τ[(N : σ)/(x : σ)];
(ρ ) M : τ = M : τ;
(µ) If M : σ = N : σ, then (P : σ → τ)(M : σ) = (P : σ → τ)(N : σ);
(ν) If M : σ → τ = N : σ → τ, then (M : σ → τ)(P : σ) = (N : σ → τ)(P : σ);
(ξ) If M : τ = N : τ, then λx : σ. M : τ = λx : σ. N : τ;
(τ) If M : σ = N : σ and N : σ = P : σ, then M : σ = P : σ;
(σ) If M : σ = N : σ, then N : σ = M : σ.

140 of 472
The formal system II

Definition 8.8 (Reduction in λβ →)


The formal system λβ → has formulae of the form M : τ B N : τ with
M : τ, N : τ typed terms, and the following axiom schemes and rules:
(α) If y : σ ∉ FV(M : τ), then λx : σ. M : τ B λy : σ. M : τ[(y : σ)/(x : σ)];
(β) ((λx : σ. M : τ)N : σ) : τ B M : τ[(N : σ)/(x : σ)];
(ρ ) M : τ B M : τ;
(µ) If M : σ B N : σ, then (P : σ → τ)(M : σ) B (P : σ → τ)(N : σ);
(ν) If M : σ → τ B N : σ → τ, then (M : σ → τ)(P : σ) B (N : σ → τ)(P : σ);
(ξ) If M : τ B N : τ, then λx : σ. M : τ B λx : σ. N : τ;
(τ) If M : σ B N : σ and N : σ B P : σ, then M : σ B P : σ.

141 of 472
The formal system III
The notions of redex, contraction, β-reduction, β-conversion and β-nf
are defined on typed terms exactly as in pure λ-calculus. It is routine
to prove that
■ If M : σ B N : τ, then σ ≡ τ;
■ M : σ B N : σ iff M : σ Bβ N : σ;
■ M : σ = N : σ iff M : σ =β N : σ.
[Exercise] Fill in the proofs.

Note that all other properties of reduction and equality hold, with the
same proof as before. In particular, the Church-Rosser Theorem and
the uniqueness of normal forms are true.

Notice also that

(λx : σ. x : σ) : (σ → σ) ≠ (λx : τ. x : τ) : (τ → τ) .
142 of 472
Normalizability I

Definition 8.9 (Weak normalizability, Strong normalizability)


A term M is said to be weakly normalisable (WN) iff it has a β-nf; it is
called strongly normalisable (SN) iff all reductions starting at M have
finite length.
Evidently, SN implies WN. Consider the following terms:
■ Ω ≡ (λx . x x)(λx . x x);
■ T2 ≡ (λx . y )Ω;
■ T3 ≡ (λx . y )(λx . x).
The first term, Ω, has no normal form and it has an infinite reduction,
so it is neither WN nor SN. The second term, T2 , has an infinite
reduction, so it is not SN, but it has a normal form, so it is WN. The
third term, T3 has only finite reductions, thus it is SN and hence WN.
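These three examples can be checked mechanically. Below is a minimal Python sketch (our own tuple encoding): a leftmost-outermost (normal-order) reducer with a fuel bound. It shows why T2 is WN but not SN: the leftmost strategy discards Ω without evaluating it, while a strategy that reduces the argument first would loop. Substitution here is deliberately naive (not capture-avoiding), which is safe for these particular examples.

```python
# Pure terms: ('var', x), ('lam', x, body), ('app', f, a).
def subst(t, x, s):
    """Naive substitution t[s/x]; adequate here because the substituted
    terms share no variable names with the binders they pass under."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def step(t):
    """One leftmost-outermost reduction step, or None if t is in beta-nf."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':                       # contract the head redex
            return subst(f[2], f[1], a)
        r = step(f)
        if r is not None:
            return ('app', r, a)
        r = step(a)
        return None if r is None else ('app', f, r)
    if t[0] == 'lam':
        r = step(t[2])
        return None if r is None else ('lam', t[1], r)
    return None

def normalize(t, fuel=100):
    """Reduce to beta-nf; None if the reduction exceeds the fuel bound."""
    while fuel > 0:
        r = step(t)
        if r is None:
            return t
        t, fuel = r, fuel - 1
    return None

w = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
OMEGA = ('app', w, w)                                    # neither WN nor SN
T2 = ('app', ('lam', 'x', ('var', 'y')), OMEGA)          # WN but not SN
T3 = ('app', ('lam', 'x', ('var', 'y')), ('lam', 'x', ('var', 'x')))  # SN
```

Running it, Ω exhausts the fuel (it reduces to itself forever), while T2 and T3 both reach the normal form y, T2 in a single leftmost step.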
143 of 472
Normalizability II
Theorem 8.10
All terms in λβ → are SN.
Proof.
A term M : τ is said to be strongly computable (SC) iff
■ either τ is atomic and M is SN;
■ or τ ≡ σ → ρ and, for every term X : σ, the term (M X ) : ρ is SC.
The proof of the theorem reduces to four lemmas, whose goal is to
show that (1) every SC term is SN, and (2) every term is SC.
Corollary 8.11
Every typed term in λβ → has a β-nf.
Corollary 8.12
In the simply-typed λ-calculus, the relation =β is decidable.
144 of 472
Normalizability III

Lemma 8.13
1. Each type τ can be written in a unique way in the form
τ1 → τ2 → · · · → τn → θ where n ≥ 0 and θ is atomic;
2. If M : τ with τ ≡ τ1 → . . . τn → θ , then M is SC iff, for all SC terms
Y1 : τ1 , . . . , Yn : τn , (M Y1 . . . Yn ) : θ is SN.
3. If X : τ is SC (or SN) and X ≡α N, then N is SC (or SN);
4. If X : σ → ρ is SC and Y : σ is SC, then so is (XY ) : ρ ;
5. If X : τ is SN, then so is every subterm of X ;
6. If M[N : ρ /x : ρ ] : τ is SC then so is M : τ.
Proof.
Evident from the definition of SC and SN.

145 of 472
Normalizability IV
Lemma 8.14
1. every (a X1 . . . Xn ) : τ, with a an atom and X1 , . . . , Xn all SN, is SC;
2. every atomic term a : τ is SC;
3. every SC term of type τ is SN.
Proof.
(2) is an instance of (1). We prove (1) and (3) by induction on τ:
■ τ is atomic: since X1 , . . . , Xn are SN, so is aX1 . . . Xn , thus it is SC;
■ τ ≡ ρ → σ: let Y : ρ be SC, so by induction hypothesis (IH), it is
SN. Moreover, for the same reason, (a X1 . . . Xn Y ) : σ is SC. Thus,
so is a X1 . . . Xn by definition of SC.
Let X : τ and let x : ρ ∉ FV(X : τ). By IH, x : ρ is SC, so by the
previous lemma (X x) : σ is SC. Thus, by IH, (X x) : σ is also SN.
But, by the previous lemma, X is SN, as well.
146 of 472
Normalizability V

Lemma 8.15
If (M : σ)[N : ρ /x : ρ ] is SC, then so is (λx : ρ. M : σ)(N : ρ ), provided
that N : ρ is SC when x : ρ ∉ FV(M : σ).

Proof. (i)
Let σ ≡ σ1 → · · · → σn → θ with θ atomic and let M1 : σ1 , . . . , Mn : σn be SC
terms. Since (M : σ)[N : ρ /x : ρ ] is SC, it follows that

((M[N/x]) M1 . . . Mn ) : θ

is SN, and so are all its subterms.


Hence, M is SN by Lemma 8.13. Moreover, N is SN: when it does not
occur in M[N/x], this follows by hypothesis and Lemma 8.14. ,→

147 of 472
Normalizability VI
,→ Proof. (ii)
So, an infinite reduction of ((λx . M) N M1 . . . Mn ) : θ has the form

(λx . M) N M1 . . . Mn Bβ (λx . M′) N′ M1′ . . . Mn′ B1β M′[N′/x] M1′ . . . Mn′ Bβ . . .

where M Bβ M′, N Bβ N′, etc.
But, from M Bβ M′ and N Bβ N′, we get that M[N/x] Bβ M′[N′/x];
hence, we can construct the following reduction:

(M[N/x] M1 . . . Mn ) Bβ M′[N′/x] M1′ . . . Mn′ ,

which may continue forever, contradicting that the starting term is SN. Hence, (λx : ρ. M : σ)(N : ρ ) must be SN.
148 of 472
Normalizability VII

Lemma 8.16
For every typed term M : τ:
1. M : τ is SC;
2. For all x1 : ρ 1 , . . . , xn : ρ n , with n ≥ 1, and all SC terms
N1 : ρ 1 , . . . , Nn : ρ n such that none of the x1 , . . . , xi −1 variables occurs
free in Ni , the term M ∗ ≡ M[N1 /x1 ] . . . [Nn /xn ] is SC.
Proof. (i)
(1) is an instance of (2), where Ni ≡ xi , since every xi is SC by
Lemma 8.14.
The proof of (2) is by induction on the structure of M:
■ M ≡ xi and τ ≡ ρ i . Then M ∗ ≡ Ni , which is SC by assumption.
,→
149 of 472
Normalizability VIII
,→ Proof. (ii)
■ M is an atom distinct from x1 , . . . , xn . Then M ∗ ≡ M which is SC by
Lemma 8.14.
■ M ≡ M1 M2 . Then M ∗ ≡ M1∗ M2∗ . By induction hypothesis, M1∗ and
M2∗ are SC, and so is M ∗ by Lemma 8.13.
■ M ≡ (λx : ρ. M1 : σ) and τ ≡ ρ → σ. By Lemma 8.14, we can safely
assume that x does not occur free in any N1 , . . . , Nn , x1 , . . . , xn .
Then, M ∗ ≡ λx . M1∗ . Let N : ρ be SC, then

M ∗ N ≡ (λx . M1∗ ) N B1β M1∗ [N/x] ≡ M1 [N/x][N1 /x1 ] . . . [Nn /xn ]

which is SC by the induction hypothesis applied to M1 and the
sequence N , N1 , . . . , Nn . Then M ∗ N is SC by Lemma 8.15. So, by
definition, M ∗ is SC.
150 of 472
Representability I

Since every term in λβ → has a β-nf, every function which can be represented in this system must be total.
Definition 8.17 (Extended polynomials)
An extended polynomial is a function Nk → N defined by induction:
■ the projections πi (x1 , . . . , xn ) = xi , the constant functions cm (x) = m,
the addition +(x , y ) = x + y and the multiplication ∗(x , y ) = xy are
extended polynomials, as well as the functions sg : N → N and
s̄g : N → N, where sg(x) = 1 if x ≥ 1, sg(x) = 0 otherwise, and
s̄g(x) = 1 − sg(x);
■ the composition f ◦ g of two extended polynomials f and g is an
extended polynomial.
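A quick Python sketch of sg, its complement, and of how closure under addition, multiplication and composition already yields a numeric conditional (the name ite is our own):

```python
def sg(x):
    """sg(x) = 1 if x >= 1, and 0 otherwise."""
    return 1 if x >= 1 else 0

def sg_bar(x):
    """The complement: sg_bar(x) = 1 - sg(x)."""
    return 1 - sg(x)

def ite(b, m, n):
    """A conditional built from extended polynomials:
    sg(b)*m + sg_bar(b)*n equals m when b >= 1, and n when b = 0."""
    return sg(b) * m + sg_bar(b) * n
```

For example, ite(0, 5, 9) is 9 while ite(3, 5, 9) is 5, showing how branching on zero-tests stays within the class.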

151 of 472
Representability II

Theorem 8.18
All extended polynomials are representable in λβ → and, vice versa,
the only representable functions Nk → N are extended polynomials.
[Proof not required]

[Exercise] Prove that all extended polynomials are representable.

152 of 472
References and Hints
This lecture is based on Chapter 10 of [HS].
The proof of strong normalizability of λβ → can be found in Appendix
3 of the same text.
Theorem 8.18 has been proved by H. Schwichtenberg in 1976: H.
Schwichtenberg, Definierbare Funktionen im λ-Kalkül mit Typen,
Archiv für Mathematische Logik, 17 (1976) 113-114. An exposition of
the result can be found also in A.S. Troelstra, H. Schwichtenberg,
Basic Proof Theory, 2nd edition, Cambridge University Press (2000).
ISBN: 0521779111
The techniques used to operate on reductions are more general and
not limited to λ-calculi. An account can be found in F. Baader and T.
Nipkow, Term Rewriting and All That, Cambridge University Press
(1999). ISBN: 0521779200
153 of 472
Fundamentals of Functional Programming
Lecture 9

Prof. M. Benini
Outline
We have studied a Church-style typed system, λβ →. We have seen
that it is decidable, since it has the strong normalizability property,
and that its expressive power is limited to a class of computable
functions, the extended polynomials.

As a matter of fact, introducing types allows a better control of functions, since we can limit the inputs and the outputs, thus constraining the behaviour of our programs, but at the price of limiting the expressive power.

In this lecture, we want to show an extended version of λβ →:
■ It differs from the previous one, being a Curry-style system;
■ It enhances the previous one, being a polymorphic type system.

155 of 472
Types

Definition 9.1 (Parametric simple types)


Let T be a set of symbols, called the atomic types, and let VT be a
denumerable set of symbols, called the type variables, disjoint from T ;
then τ is a type either if it is atomic, i.e., τ ∈ T ∪ VT , or if it is a
function type, i.e., τ ≡ (σ → ρ ) where σ and ρ are types.
A type is closed if it contains no type variables, and it is open if it
contains only type variables.

A type σ → ρ can be thought of as representing a class of functions φ such that x ∈ σ implies that φ(x) is defined and φ(x) ∈ ρ . This is slightly different from the interpretation in λβ →.

156 of 472
Terms and Formulae

A term is simply a pure λ-term: this is what makes the system we are defining Curry-style. The “legal” terms will be the ones that can be typed.
Definition 9.2 (Formula)
A type-assignment formula is any expression X : τ where X is a term
and τ a type. The term X is referred to as the subject of the formula,
while τ is referred to as the predicate of the formula.
Informally, a statement X : τ is read as “X is assigned the type τ” or
also “X has type τ”.
Formulae represent the link between terms and types. Their validity
depends on a possibly empty set of assumptions, corresponding to
declarations in a programming language, and the validity is checked by
a formal system.

157 of 472
Type assignment system I

Definition 9.3 (The TA → type assignment system)


Given a set Γ of formulae whose subjects are variables, Γ ` M : τ holds
if one of the following rules applies:
■ M ≡ x and there is x : τ ∈ Γ (α-rule);
■ M ≡ P Q and if Γ ` P : σ → τ and Γ ` Q : σ then Γ ` P Q : τ (→ E
rule);
■ M ≡ λx . N, τ ≡ σ → ρ , and if Γ, z : σ ` N[z/x] : ρ , then
Γ ` (λx . N) : σ → ρ where z is a variable new in Γ and N (→ I rule).
When Γ is empty, we write ` M : τ.
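In practice, Curry-style typability is decided by the standard unification-based inference used in ML-family languages (essentially Hindley–Milner without let-polymorphism). The sketch below is our own encoding, not the TA → rules themselves: it finds the type σ → τ → σ for K, and rejects λx . x x via the occurs check, matching the fact that self-application has no simple type.

```python
from itertools import count

_fresh = count()

def tv():
    return ('tv', next(_fresh))          # a fresh type variable

def walk(s, t):                          # chase substitution links
    while t[0] == 'tv' and t[1] in s:
        t = s[t[1]]
    return t

def occurs(s, v, t):
    t = walk(s, t)
    if t[0] == 'tv':
        return t[1] == v
    return occurs(s, v, t[1]) or occurs(s, v, t[2])   # t is ('fun', _, _)

def unify(s, a, b):
    a, b = walk(s, a), walk(s, b)
    if a == b:
        return s
    if a[0] == 'tv':
        if occurs(s, a[1], b):
            raise TypeError('occurs check: no simple type exists')
        return {**s, a[1]: b}
    if b[0] == 'tv':
        return unify(s, b, a)
    s = unify(s, a[1], b[1])             # both sides are arrow types
    return unify(s, a[2], b[2])

def infer(t, env, s):
    """Infer a type for a pure term ('var',x) / ('lam',x,b) / ('app',f,a)."""
    if t[0] == 'var':
        return s, env[t[1]]
    if t[0] == 'lam':
        a = tv()
        s, bt = infer(t[2], {**env, t[1]: a}, s)
        return s, ('fun', a, bt)
    s, ft = infer(t[1], env, s)
    s, at = infer(t[2], env, s)
    r = tv()
    s = unify(s, ft, ('fun', at, r))
    return s, r

def resolve(s, t):                       # fully apply the substitution
    t = walk(s, t)
    if t[0] == 'fun':
        return ('fun', resolve(s, t[1]), resolve(s, t[2]))
    return t
```

For K ≡ λx . λy . x the inferred type has the shape a → b → a with a and b distinct type variables, in accordance with Example 9.4.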

158 of 472
Type assignment system II

Example 9.4
We want to prove that ` K : σ → τ → σ.
The derivation is almost immediate:
w : σ, z : τ ` w : σ
→I
w : σ ` λy . w : τ → σ
→I
` λx , y . x : σ → τ → σ

Notice how the (→ I ) rule forces one to generate a large number of new variables. In fact, one can slightly modify the definition of derivation to take care of this aspect, at the price of introducing more complex rules on the use of variables.

159 of 472
Basic properties I

Lemma 9.5 (α-invariance)


Let Γ ` M : τ be any derivation and let M ≡α N. Then Γ ` N : τ.

Proof.
By induction on the proof Γ ` M : τ:
■ if M ≡ x and x : τ ∈ Γ, then M ≡ N and the conclusion is obvious;
■ if Γ ` P : σ → τ, Γ ` Q : σ and M ≡ P Q, then N ≡ P′ Q′ with P ≡α P′,
Q ≡α Q′; so, by induction hypothesis, Γ ` P′ : σ → τ and Γ ` Q′ : σ,
thus Γ ` N : τ by (→ E );
■ if Γ, z : σ ` P[z/x] : ρ and M ≡ (λx . P), τ ≡ σ → ρ , then N ≡ (λy . P′),
P[z/x] ≡α P′[z/y ], with z new in P , P′, Γ. By induction hypothesis,
Γ, z : σ ` P′[z/y ] : ρ , thus Γ ` N : τ by (→ I ).

160 of 472
Basic properties II

Lemma 9.6 (Generation)


■ If Γ ` x : σ then x : σ ∈ Γ;
■ If Γ ` P Q : τ, then there is σ such that Γ ` P : σ → τ and Γ ` Q : σ;
■ If Γ ` λx . P : τ, then there are σ and ρ such that τ ≡ σ → ρ , and
Γ, z : σ ` P[z/x] : ρ for some variable z new in P and Γ.

Proof.
By induction on the derivation.

This result is useful to show that certain terms have no types.

[Exercise] Complete the details.

161 of 472
Basic properties III

Corollary 9.7 (Typability of subterms)


Let M′ be a subterm of M and let Γ ` M : τ. Then Γ′ ` M′ : σ for some
type σ and some set Γ′ of formulae.

Proof.
By induction on the structure of M. Each step in the induction
corresponds to an application of the generation lemma. Notice that
Γ ⊆ Γ′.

[Exercise] Complete the details.

162 of 472
Basic properties IV

Lemma 9.8 (Substitution)


■ If Γ ` M : σ then Γ[τ/a] ` M : σ[τ/a] where a is a type variable;
■ If Γ, x : σ ` M : τ and Γ ` N : σ, then Γ ` M[N/x] : τ.

Proof.
(1) follows by induction on the derivation Γ ` M : σ.
(2) follows by induction on the generation of Γ, x : σ ` M : τ.

[Exercise] Complete the details.

163 of 472
Basic properties V

Lemma 9.9 (Context)


■ If Γ ⊆ ∆ and Γ ` M : τ, then ∆ ` M : τ;
■ If Γ ` M : τ, then FV(M) ⊆ { x : there is σ such that (x : σ) ∈ Γ };
■ If Γ ` M : τ and ∆ = { (x : σ) : (x : σ) ∈ Γ and x ∈ FV(M) }, then
∆ ` M : τ.

Proof.
All statements are proved by induction on the generation of
Γ ` M : τ.

[Exercise] Complete the details.

164 of 472
Basic properties VI
Theorem 9.10 (Subject reduction)
If Γ ` M : σ and M Bβ M′, then Γ ` M′ : σ.

Proof.
By induction on the length of the reduction, it suffices to prove the
one-step case. If M ≡α M′, then the result is evident by the
α-invariance lemma.
So, let M ≡ U[((λx . P) Q)/z] and M′ ≡ U[(P[Q/x])/z] with z
occurring only once in U. By the generation lemma, it suffices to
prove that Γ ` (λx . P) Q : τ implies Γ ` P[Q/x] : τ.
But, if Γ ` (λx . P) Q : τ, by the generation lemma, Γ ` λx . P : σ → τ
and Γ ` Q : σ. Applying the generation lemma again, we get
Γ, z : σ ` P[z/x] : τ where z is new. Thus, by the substitution lemma,
Γ ` P[z/x][Q/z] : τ. Since z is new in P, P[z/x][Q/z] ≡ P[Q/x].
165 of 472
Basic properties VII

Notice that terms having a type are not closed under expansion: in
fact, ` S K : (τ → σ) → (τ → τ), ` λx , y . y : τ → (σ → σ) and
S K Bβ λx , y . y but, evidently, S K : τ → (σ → σ) is not derivable.

[Exercise] Perform the calculations.

166 of 472
Comparison with λβ → I

Definition 9.11 (Forgetful map)


Let ΛT be the collection of typed λ-terms and let Λ be the collection
of pure λ-terms. The forgetful map | · | : ΛT → Λ is defined as follows:
■ |x : σ| = x;
■ |(M : σ → τ) (N : σ)| = |M : σ → τ| |N : σ|;
■ |(λx : σ. M : τ) : σ → τ| = λx . |M : τ|.

The forgetful map, as the name suggests, forgets the type decorations
of a typed term.
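On a tuple encoding of typed terms (('var', x, ty), ('app', m, n), ('lam', x, ty, body) — our own choice, for illustration), the forgetful map is a one-line recursion:

```python
def erase(t):
    """|.| : forget the type decorations, yielding a pure lambda-term."""
    if t[0] == 'var':
        return ('var', t[1])                 # drop the variable's type
    if t[0] == 'app':
        return ('app', erase(t[1]), erase(t[2]))
    return ('lam', t[1], erase(t[3]))        # drop the binder's annotation

sigma = ('at', 'sigma')
tid = ('lam', 'x', sigma, ('var', 'x', sigma))   # λx:σ. x
```

Here erase(tid) is the pure identity ('lam', 'x', ('var', 'x')), as in the definition above.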

167 of 472
Comparison with λβ → II

Theorem 9.12
■ Let M : τ ∈ ΛT and let Γ = C ∪ { (x : σ) : (x : σ) ∈ V and x ∈ FV(M) },
where C and V are the typed constants and variables, respectively.
Then Γ ` |M : τ| : τ.
■ If Γ ` M : τ, then there is N ∈ ΛT such that |N : τ| ≡ M and N : τ in
λβ →, where the constants and the variables correspond to those in
Γ.
Proof.
By induction on the given derivations.

[Exercise] Complete the details.

168 of 472
Strong normalizability

Theorem 9.13 (Strong normalizability)


If Γ ` M : τ, then M is SN.

Proof.
Since Γ ` M : τ, we know that there is a term N in λβ → such that
|N : τ| ≡ M and N : τ. Thus, by the strong normalizability theorem in
λβ →, |N : τ| ≡ M is SN.

169 of 472
References and Hints

The presented material is covered by Chapter 12 of [HS].

Beware that Definition 12.6 in that textbook is wrong: the (→ I ) rule should be changed as indicated in these slides, or the α-invariance result would be unprovable!

The presentation follows [Bar2].

170 of 472
Fundamentals of Functional Programming
Lecture 10

Prof. M. Benini
Outline
In the previous lectures, the simple theory of types has been introduced
both in the Church-style and in the Curry-style. In the second case,
type variables were present, although their use has been very limited.

Most functional programming languages are based on extensions of the
simple theory of types. Extensions are on the type system, where
additional constructions are introduced; on the term system, where
interaction with types is admitted to some extent, hence implementing
real polymorphism; and, finally, on a richer set of basic terms, to
exploit the computer's potential of doing arithmetic on integers and
floating point numbers, to have pointers and state variables, and so on.

In this lecture, we want to introduce a family of type systems, i.e., typed λ-calculi, on which most functional languages are based.
172 of 472
Church-typing and polymorphism
The systems we will introduce in this lesson are Church-style typed
λ-calculi. They allow types to depend on terms, and terms to depend
on types.

The idea of a term depending on a type and vice versa is quite easy to
understand: a function f : A → B maps all the elements in the set A to
some elements in the set B. Sometimes, we are interested in
considering the image of f , i.e., the set f (A), as a type. In this case,
we want to write functions like g ≡ (λx ∈ f (A). . . . ). Evidently, the type
f (A) depends on the term f .

The systems we are about to introduce allow to mix terms and types
in a controlled way. So, a term may have different types, which are
related to each other, and a type assumes a deeper meaning than
mere classification.
173 of 472
The λ-cube I
Definition 10.1 (Pseudoterms)
Assume to have a denumerable set of variables V and a set of type
constants C . Then pseudoterms are inductively defined as follows:
■ Every variable x ∈ V is a pseudoterm, and FV(x) = { x };
■ Every constant c ∈ C is a pseudoterm, and FV(c) = ∅;
■ If M and N are pseudoterms, then so is (M N), and
FV(M N) = FV(M) ∪ FV(N);
■ If x ∈ V and M and N are pseudoterms, then so is (λx : M . N), and
FV(λx : M . N) = (FV(M) ∪ FV(N)) \ { x };
■ If M and N are pseudoterms and x ∈ V is such that x 6∈ FV(M),
then (Πx : M . N) is a pseudoterm, and
FV(Πx : M . N) = (FV(M) ∪ FV(N)) \ { x }.

174 of 472
The λ-cube II

A pseudoterm represent a term or a type. It becomes a valid term or


type when it can be assigned a type by the system we are working in.

Definition 10.2 (Reduction)


Pseudoterms may β-reduce, following the rule:

(λx : A. M) N B M[N/x] .

The systems we will consider have two special constants, ∗ and □,
called sorts.

175 of 472
The λ-cube III

The systems we will consider define a notion of derivation, notation Γ ` A : B, from a context Γ to a pair of pseudoterms A and B.

Definition 10.3 (Context)


Given a notion of derivation, a context is a finite sequence
Γ = x1 : A1 , . . . , xn : An such that, for all i,
■ xi ∈ V ;
■ Ai is a pseudoterm;
■ xi ∉ FV(Aj ) for j ≤ i;
■ either Ai is a sort, or x1 : A1 , . . . , xi −1 : Ai −1 ` Ai : s with s a sort.

176 of 472
The λ-cube IV
Definition 10.4 (The λ-cube)
The eight systems in the λ-cube are formed according to the following
derivation rules:
■ (axiom):
` ∗:□
■ (start) x ∉ FV(Γ) ∪ FV(A) and s is a sort:
Γ ` A:s
Γ, x : A ` x : A

■ (weakening) x ∉ FV(Γ) ∪ FV(A) and s is a sort:


Γ ` M :B Γ ` A:s
Γ, x : A ` M : B
,→
177 of 472
The λ-cube V
,→ (The λ-cube)
■ (application): Γ ` M : (Πx : A. B) Γ ` N :A
Γ ` M N : B[N/x]
■ (abstraction) x ∉ FV(Γ) ∪ FV(A) and s is a sort:
Γ, x : A ` M : B Γ ` (Πx : A. B) : s
Γ ` (λx : A. M) : (Πx : A. B)

■ (product) x ∉ FV(Γ) ∪ FV(A) and s1 , s2 sorts and (s1 , s2 ) ∈ R :


Γ ` A : s1 Γ, x : A ` B : s2
Γ ` (Πx : A. B) : s2
,→

178 of 472
The λ-cube VI
,→ (The λ-cube)
■ (conversion) s is a sort:

Γ ` M :A A =β B Γ ` B :s
Γ ` M :B
■ (α-conversion):
Γ ` M :A M ≡α N
Γ ` N :A
A pseudoterm M is a term iff there are a context Γ and a pseudoterm
A such that Γ ` M : A. A pseudoterm A is a type iff there are a context
Γ and a pseudoterm M such that Γ ` M : A.
,→

179 of 472
The λ-cube VII
,→ (The λ-cube)
The product rule characterises the systems in the λ-cube. Specifically,
the eight systems are the following:

System      R
λ→          (∗, ∗)
λ2          (∗, ∗) (□, ∗)
λP          (∗, ∗) (∗, □)
λP2         (∗, ∗) (□, ∗) (∗, □)
λω̲          (∗, ∗) (□, □)
λω          (∗, ∗) (□, ∗) (□, □)
λPω̲         (∗, ∗) (∗, □) (□, □)
λPω = λC    (∗, ∗) (□, ∗) (∗, □) (□, □)

180 of 472
The λ-cube VIII
The systems in the λ-cube are usually represented as the vertices of a cube:

       λω --------- λC
      /  |         /  |
    λ2 --------- λP2  |
     |  λω̲ ----- | - λPω̲
     |  /        |  /
    λ→ --------- λP

where moving upwards adds the pair (□, ∗), moving to the right adds (∗, □), and moving backwards (into the page) adds (□, □).
181 of 472
The λ-cube IX

Definition 10.5 (Arrow type)


The arrow type A → B is defined to be (Πx : A. B) where x 6∈ FV(B).

With this definition, it is easy to see that λ → is λβ → with type


variables, thus it is equivalent to the Curry-style system TA →.
Definition 10.6 (Falsum)
⊥ ≡ (Πu : ∗. u).

In λ2 one can derive


■ ` (λu : ∗, x : u . x) : (Πu : ∗. u → u) which is identity parametrised by
the type u;
■ ` (λu : ∗, x : ⊥. x u) : (Πu : ∗. ⊥ → u) which is “ex falso quodlibet”.

182 of 472
The λ-cube X
The system λω̲ allows polymorphic recursive types. For example
u : ∗ ` (λf : (∗ → ∗). f (f u)) : (∗ → ∗) → ∗: the term
(λf : (∗ → ∗). f (f u)) takes a function f , regarded as a type
constructor, and produces a type as its output.

The system λP allows one to define types depending on terms. It has
been used to code the first logical frameworks: computer systems able
to represent generic mathematical theories and their derivations.

The system λω allows one to define an internal notion of logical
conjunction: interpreting → as implication, and calling
and ≡ λu : ∗, v : ∗. Πw : ∗. (u → v → w ) → w , one can derive the usual
rules for logical conjunction: and x y → x, and x y → y and
x , y ` and x y , assuming that x : ∗, y : ∗.
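The encoding Πw : ∗. (u → v → w ) → w is exactly the type of Church pairs. Untyped Python lambdas, standing in for the corresponding λ-terms, give a quick sketch of the three derivable rules (the names pair, fst, snd are our own):

```python
# Church conjunction as pairing: a proof of "and u v" packages a proof
# of u together with a proof of v, and the projections recover them.
pair = lambda x: lambda y: lambda w: w(x)(y)   # conjunction introduction
fst = lambda p: p(lambda x: lambda y: x)       # and x y -> x
snd = lambda p: p(lambda x: lambda y: y)       # and x y -> y
```

For example, fst(pair('left')('right')) evaluates to 'left', mirroring the first elimination rule.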
183 of 472
The λ-cube XI
The system λP2 is related to second-order intuitionistic predicate
logic.

The system λPω̲ corresponds to a weak version of higher-order intuitionistic logic and has been discovered “by symmetry” in the λ-cube.

The system λC is the “calculus of constructions”, the basis for Coq,
one of the most powerful proof assistants. A proof assistant is a
computer system which allows mathematicians and computer
scientists to develop formal proofs and programs and, to some extent,
to write very abstract programming systems for special purposes, e.g.,
verifying the correctness of critical systems, like nuclear power-plants
or avionics.
184 of 472
Elementary properties I

Lemma 10.7 (Free variables)


If x1 : A1 , . . . , xn : An ` M : A then FV(A) ∪ FV(M) ⊆ { x1 , . . . , xn }.

Proof.
Induction on the derivation.

Lemma 10.8 (Transitivity)


If x1 : A1 , . . . , xn : An ` M : A and Γ ` xi : Ai for all i, then Γ ` M : A.

Proof.
Induction on the derivation x1 : A1 , . . . , xn : An ` M : A.

[Exercise] Complete the details.


185 of 472
Elementary properties II

Lemma 10.9 (Substitution)


If Γ, x : A, ∆ ` M : B and Γ ` N : A, then Γ, ∆[N/x] ` M[N/x] : B[N/x].

Proof.
By induction on the derivation of Γ, x : A, ∆ ` M : B.

Lemma 10.10 (Thinning)


If Γ ⊆ ∆ with ∆ a context, and Γ ` M : A, then ∆ ` M : A.

Proof.
By induction on the derivation Γ ` M : A.

[Exercise] Complete the details.


186 of 472
Elementary properties III
Lemma 10.11 (Generation)
■ If Γ ` s : A, with s a sort, then s ≡ ∗ and A =β □;
■ If Γ ` x : A, with x a variable, then there is a sort s and a B =β A
such that x : B ∈ Γ, Γ ` B : s;
■ If Γ ` (Πx : A. B) : C , then there are sorts s1 and s2 such that
(s1 , s2 ) ∈ R and Γ ` A : s1 , Γ, x : A ` B : s2 and C =β s2 ;
■ If Γ ` (λx : A. M) : C , then there are a sort s and term B such that
Γ ` (Πx : A. B) : s, Γ, x : A ` M : B and C =β (Πx : A. B);
■ If Γ ` M N : C , then there are A and B such that Γ ` M : (Πx : A. B),
Γ ` N : A and C =β B[N/x].

Proof.
Induction on the derivation of the main term.
187 of 472
Elementary properties IV

Lemma 10.12 (Subterm)


If A is a term (type), and B is a subterm of A, then B is a term (type).

Proof.
Suppose A is a term; then Γ ` A : M for some Γ and M. By induction
on Γ ` A : M, via the generation lemma, one proves that some B′ =β B
appears as the subject of a derived term.

188 of 472
Elementary properties V

Theorem 10.13 (Subject-reduction)


If Γ ` M : A and M Bβ M′, then Γ ` M′ : A.
[Proof not required]

Corollary 10.14
If Γ ` M : A and A Bβ A′, then Γ ` M : A′.

Proof.
By induction on the derivation Γ ` M : A, one proves that A ≡ s or
Γ ` A : s for some sort s. In the first case, we are done; in the second
case, apply the subject-reduction theorem to obtain Γ ` A′ : s, thus an
application of the conversion rule proves the corollary.

189 of 472
Elementary properties VI

Lemma 10.15 (Unicity of types)


If Γ ` M : A and Γ ` M : A′, then A =β A′.

Proof.
By induction on the structure of A.

190 of 472
Elementary properties VII
Theorem 10.16 (Strong normalization)
If x1 : A1 , . . . , xn : An ` M : B, then A1 , . . . , An , M and B are SN.
[Proof not required]

The proof is a major result. It suffices to prove the result for λC , but
SN of λC reduces to SN in λω, modulo a suitable translation of types
and terms. Nevertheless, proving SN of λω is still not elementary.

Corollary 10.17
In the λ-cube, type checking (Is Γ ` M : A correct?) and typability
(Find A and Γ such that Γ ` M : A) are decidable.

Notice that checking whether a type A is inhabited (find M and Γ such
that Γ ` M : A) is NOT decidable in λC , even if it is for λP.
191 of 472
References and Hints

The content of this lesson comes from Chapter 13E of [HS].

The proofs, also the omitted ones, can be found in [Bar2].

192 of 472
Fundamentals of Functional Programming
Lecture 11

Prof. M. Benini
Outline

Till now, we have interpreted the λ-calculi as programming systems, or as type systems, i.e., mathematical theories based on a type algebra and a notion of reduction.

This view is significant since it allows us to develop the fundamental ideas of functional programming, providing at the same time a solid and rigorous foundation.

In this lesson, we want to introduce a different interpretation, where


types are seen as logical formulae and terms as their proofs.

The ultimate goal is to understand the “representation power” of the


typed systems we have introduced so far.

194 of 472
Universal quantification I
Definition 11.1 (Universal quantifier)
The universal quantifier ∀ is an abbreviation for the product (Π)
operator.

In Logic, ∀x : A. B is a formula if A is a logical sort and B is a formula.


We prefer to use the word “type” instead of “logical sort”, and we
assume that every λ-term M : ∗ is a formula.
In this way, the rule which governs the formation of universally
quantified formulae is just the product rule in the λ-cube
Γ, x : A ` B : ∗ Γ ` A:s
Γ ` (∀x : A. B) : ∗

195 of 472
Universal quantification II
A universally quantified formula obeys two inference rules, one to
introduce the quantification, the other to eliminate it:

Γ, p : A ` B Γ ` ∀x : A. B
∀I ∀E
Γ ` ∀x : A. B Γ ` B[t/x]

where t is a term of type A and p 6∈ FV (Γ ∪ { A }), i.e., p is an


eigenvariable.
If we compare the ∀I rule with the abstraction rule in the λ-cube,

Γ, x : A ` M : B Γ ` (Πx : A. B) : ∗
Γ ` (λx : A. M) : (Πx : A. B)

we see that the logical rule is the same thing as the type rule, as far as
we check that the conclusion is, indeed, a formula.
196 of 472
Universal quantification III
In the previous rule, the term λx : A. M encodes the proof of
Πx : A. B ≡ ∀x : A. B, where M is the proof of B assuming x : A. Note
how abstraction in the proof-term takes care of modelling
eigenvariables in the proof.

We have to check that B is a formula, but this does not suffice, since


the type A can be incompatible with B, if the logical system is not
powerful enough. Thus, the product rule, controlling the formation of
universal formulae, allows for first-order or higher-order logics.

Similarly, the ∀E rule is just the application rule in the λ-cube:

Γ ` M : (Πx : A. B) Γ ` t :A
Γ ` M t : B[t/x]

where we have to prove that t has type A.


197 of 472
Implication I
Definition 11.2 (Implication)
The logical implication, denoted by ⊃, corresponds to the type
constructor →. It is defined as A ⊃ B ≡ A → B ≡ F A B, where

F ≡ λu : ∗, v : ∗. (Πx : u . v ) .

The formation rule for implications says that, if A and B are formulae,
then so is A ⊃ B.

Implication behaviour is described by its inference rules:


Γ, A ` B Γ`A⊃B Γ`A
⊃I ⊃E
Γ`A⊃B Γ`B

198 of 472
Implication II
Lemma 11.3
In the λ-cube it holds that
Γ ` A:∗ Γ ` B :∗
Γ ` A → B :∗
Proof. (i)
We have to prove that Γ ` F A B : ∗. By a double use of the
application rule and hypotheses, it reduces to Γ ` F : (Πy : ∗, z : ∗. ∗).
By the thinning lemma, it suffices to prove
` (λu : ∗, v : ∗. (Πx : u . v )) : (Πy : ∗, z : ∗. ∗).
But, applying twice the abstraction rule and the thinning lemma, this
goal reduces to prove u : ∗, v : ∗ ` (Πx : u . v ) : ∗, ` (Πz : ∗. ∗) : ∗ and
` (Πy : ∗, z : ∗. ∗) : ∗. Applying the product rule to the first goal, we see
immediately that it holds. ,→
199 of 472
Implication III
,→ Proof. (ii)
The second goal reduces to the axiom, after the application of the
product rule. The third goal reduces to the second one after an
application of the product rule, followed by thinning.
Lemma 11.4
The following rules hold in the λ-cube, when x 6∈ FV(Γ ∪ { B }):
Γ ` M :A → B Γ ` N :A Γ, x : A ` M : B Γ ` A → B :∗

Γ ` M N :B Γ ` (λx : A. M) : A → B
Proof.
Standard. (These proofs, as the previous one, are essentially
mechanical and we will not develop them in the future).
The first lemma corresponds to the formation rule for implication,
while the second lemma derives the ⊃ I and ⊃ E rules.
200 of 472
Conjunction I
Definition 11.5 (Conjunction)
Let
■ ∧ ≡ λu : ∗, v : ∗. (Πw : ∗. (u → v → w ) → w );
■ D ≡ λu : ∗, v : ∗, x : u , y : v , w : ∗, z : (u → v → w ). z x y ;
■ fst ≡ λu : ∗, v : ∗, x : (∧ u v ). x u (λy : u , z : v . y );
■ snd ≡ λu : ∗, v : ∗, x : (∧ u v ). x v (λy : u , z : v . z).
We write A ∧ B for ∧ A B when A and B are intended as logical
formulae, and A × B when A and B are intended as types.

The ∧ combinator represents the logical conjunction, while D, fst and


snd are the corresponding proof-terms. They model the idea of pairs
of proofs along with the pair projectors ([Exercise] compare these
definitions with the datatypes of pairs, i.e., 2-tuples).
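For the exercise, it helps to note that erasing the type annotations and type arguments turns D, fst and snd into exactly the Church encoding of pairs. A minimal untyped sketch in Python (the names mirror the slide, but the code itself is only illustrative):

```python
# Untyped Church pairs: D x y builds the pair, fst/snd project it.
# The type arguments (u, v, w) are erased; z plays the role of the selector.
D = lambda x: lambda y: lambda z: z(x)(y)    # pair constructor
fst = lambda p: p(lambda y: lambda z: y)     # first projection
snd = lambda p: p(lambda y: lambda z: z)     # second projection

p = D(1)("two")
print(fst(p), snd(p))  # 1 two
```

Reducing `fst (D x y)` in this sketch is precisely the detour elimination of Lemma 11.7, with the type arguments stripped away.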
201 of 472
Conjunction II
Lemma 11.6
In λ2 and stronger systems, it holds that
■ u : ∗, v : ∗ ` u ∧ v : ∗;
■ u : ∗, v : ∗ ` D u v : u → v → u ∧ v ;
■ u : ∗, v : ∗ ` fst u v : u ∧ v → u;
■ u : ∗, v : ∗ ` snd u v : u ∧ v → v .
Proof.
Standard.

The first statement is the formation rule for conjunction: if A and B


are formulae, so is A ∧ B. The second statement is the introduction
rule:
Γ`A Γ`B
∧I
Γ ` A∧B
202 of 472
Conjunction III
The third and the fourth statements are the elimination rules:
Γ ` A∧B Γ ` A∧B
∧E1 ∧E2
Γ`A Γ`B
Lemma 11.7
In λ2 and stronger systems, it holds that
■ u : ∗, v : ∗, x : u , y : v ` fst u v (D u v x y ) : u;
■ u : ∗, v : ∗, x : u , y : v ` snd u v (D u v x y ) : v ;
■ fst u v (D u v x y ) =β x and snd u v (D u v x y ) =β y .
Proof.
Standard.
The first and the second statement show that introducing and
eliminating a conjunction is a redundant detour. The third statement
tells us that the detour is eliminated by reducing the proof term.
203 of 472
Disjunction I
Definition 11.8 (Disjunction)
Let
■ ∨ ≡ λu : ∗, v : ∗. (Πw : ∗. (u → w ) → ((v → w ) → w ));
■ inl ≡ λu : ∗, v : ∗, x : u , w : ∗, f : u → w , g : v → w . f x;
■ inr ≡ λu : ∗, v : ∗, y : u , w : ∗, f : u → w , g : v → w . g y ;
■ case ≡ λu : ∗, v : ∗, z : (∨ u v ), w : ∗, f : u → w , g : v → w . z w f g .
We write A ∨ B as an abbreviation for ∨ A B; when dealing with types,
we also write A + B instead of A ∨ B.

The ∨ combinator represents the logical disjunction, while inl, inr and
case are the corresponding proof-terms. They model the datatype
representing the disjoint union of two types, along with the standard
injections.
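With the types erased, inl, inr and case become the familiar tagged-union combinators: an injected value simply waits for the two handlers. A small untyped Python sketch (illustrative only):

```python
# Untyped Church sums: an injected value selects one of the two handlers.
inl = lambda x: lambda f: lambda g: f(x)
inr = lambda y: lambda f: lambda g: g(y)
case = lambda z: lambda f: lambda g: z(f)(g)   # type argument w erased

# case (inl x) f g = f x   and   case (inr y) f g = g y
print(case(inl(3))(lambda x: x + 1)(lambda y: y - 1))  # 4
print(case(inr(3))(lambda x: x + 1)(lambda y: y - 1))  # 2
```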
204 of 472
Disjunction II

Lemma 11.9
In λ2 and stronger systems, the following facts are true:
■ u : ∗, v : ∗ ` inl u v : u → u ∨ v ;
■ u : ∗, v : ∗ ` inr u v : v → u ∨ v ;
■ u : ∗, v : ∗ ` case u v : u ∨ v → (Πw : ∗. (u → w ) → ((v → w ) → w )).
Proof.
Standard.

Thus, inl and inr represent the proofs of disjunction introduction:


Γ`A Γ`B
∨Il ∨Ir
Γ ` A∨B Γ ` A∨B

205 of 472
Disjunction III
Similarly, case represents the proof of disjunction elimination:
Γ ` A∨B Γ, x : A ` C Γ, x : B ` C
∨E
Γ`C

where, as usual x 6∈ FV(Γ ∪ { A, B , C }).

As before, introducing and immediately eliminating a disjunction is a


detour, since a more compact proof can be obtained by combining the
left (right) introduction rule with the second (third, respectively)
assumption in the ∨E rule.

The following lemma shows that these detours are correct and they
are eliminated by reducing the proof-terms.

206 of 472
Disjunction IV

Lemma 11.10
The following facts are true in λ2 and stronger systems:
■ u : ∗, v : ∗, w : ∗, x : u , y : v , f : u → w , g : v → w ` case u v (inl u v x) w f g : w ;
■ u : ∗, v : ∗, w : ∗, x : u , y : v , f : u → w , g : v → w ` case u v (inr u v y ) w f g : w ;
■ case u v (inl u v x) w f g =β f x;
■ case u v (inr u v y ) w f g =β g y .
Proof.
Standard.

207 of 472
Falsity I

Definition 11.11 (Falsity)


The void type is defined as void ≡ Πu : ∗. u. We write ⊥ for void when
dealing with the logical interpretation.

Lemma 11.12
In every system in the λ-cube,
■ ` ⊥ : ∗;
■ x : ⊥, u : ∗ ` x u : u.
Proof.
Standard.

As usual, the first statement is a formation rule: ⊥ is a formula.


208 of 472
Falsity II
The second statement corresponds to the ⊥E inference rule:
Γ`⊥
⊥E
Γ`A
for any formula A.

Theorem 11.13 (Consistency)


In the λ-cube there is no closed term M such that ` M : ⊥.

Proof. (i)
Suppose there is such an M. Then, by the application rule, we can
derive u : ∗ ` M u : u and, since the systems in the λ-cube are strong
normalising, M u =β N for some N in β-nf. ,→

209 of 472
Falsity III

,→ Proof. (ii)
By the generation lemma, N cannot be an abstraction, so
N ≡ a N1 . . . Nn , where a is a constant or a variable,
u : ∗ ` a : (Πy1 : A1 , . . . , yn : An . ∗) and, for each i, u : ∗ ` Ni : Ai .
If a is a constant then a : (Πy1 : A1 , . . . , yn : An . ∗) must be an axiom,
which is impossible. Otherwise, if a is a variable, then a ≡ u and n = 0,
but the only possible type for u is ∗, not u.

This result shows that all the systems in the λ-cube are logically
consistent, i.e., they cannot prove the false proposition. This is the
syntactical counterpart of soundness, which says that any provable
proposition is true.

210 of 472
Existential quantification I
Definition 11.14 (Existential quantifier)
Let
■ Σ ≡ λu : ∗, v : u → ∗. (Πw : ∗. (Πx : u . v x → w ) → w );
■ D 0 ≡ λu : ∗, v : u → ∗, x : u , y : v x , w : ∗, z : (Πx : u . v x → w ). z x y ;
■ proj ≡ λu : ∗, v : u → ∗, w : ∗, z : (Πx : u . v x → w ), y : (Σ u v ). y w z.
We write ∃x : A. B as an abbreviation for Σ A (λx : A. B).

In Logic, ∃x : A. B is a formula if A is a type and B is a formula. The


behaviour of the ∃ quantifier is fixed by the following inference rules:

Γ ` B(t) Γ ` ∃x : A. B(x) Γ, B(p) ` C


∃I ∃E
Γ ` ∃x : A. B(x) Γ`C

where t is a term of type A and p 6∈ FV(Γ ∪ { C }).


211 of 472
Existential quantification II

Lemma 11.15
In λ2 and stronger systems it holds that
■ u : ∗, v : u → ∗ ` (∃x : u . v x) : ∗;
■ u : ∗, v : u → ∗ ` D 0 u v : (∀t : u . v t ⊃ (∃x : u . v x));
■ u : ∗, v : u → ∗ ` proj u v : (Πw : ∗. (∀y : u . v y → w ) ⊃ ((∃x : u . v x) ⊃ w )).
Proof.
Standard.

As usual, the statements encode the formation rule and the two
inference rules, as it is immediate to see.

212 of 472
Existential quantification III

Lemma 11.16
In λ2 and stronger systems, it holds that
■ u : ∗, v : u → ∗, w : ∗, x : u , y : v x , z : Πx : u . v x → w `
proj u v w z (D 0 u v x y ) : w ;
■ proj u v w z (D 0 u v x y ) =β z x y .
Proof.
Standard.

This lemma shows the correctness of the detour of introducing and


immediately eliminating an existential quantification. It also shows
that reduction of proof-terms takes care of eliminating the detour.

213 of 472
Equality I
Definition 11.17 (Equality)
The typed equality M =A N is defined to be Q A M N where

Q ≡ λu : ∗, x : u , y : u . (Πz : u → ∗. z x ⊃ z y ) .

This definition makes sense when A : ∗ has already been proved or


assumed in the context.

In Logic, M = N is a formula if M and N are both terms of type A.


The behaviour of equality is controlled by the rules:

Γ`t =s Γ ` B(t)
refl subst
`x =x Γ ` B(s)

214 of 472
Equality II

Lemma 11.18
In λ2 and stronger systems, it holds that
■ ` Q : Πu : ∗. u → u → ∗;
■ u : ∗, x : u ` (λz : u → ∗, w : z x . w ) : (x =u x);
■ u : ∗, x : u , y : u , m : (x =u y ), z : u → ∗, n : z x ` m z n : z y .
Proof.
Standard.

The first statement encodes the formation rule, while the second and
the third statements are the formalisation of the refl and subst
inference rules, respectively.

215 of 472
Expressivity I
The expressive power of the systems in the λ-cube is given by the
expressive power of the corresponding logical system. The logical
systems are intuitionistic, and precisely

System      propositional   ∀→ fragment only   order
λ→               ✓                 ✓            I
λ2               ✓                 ×            II
λP               ×                 ✓            I
λP2              ×                 ×            II
λω               ✓                 ✓            weak H
λω               ✓                 ×            H
λP ω             ×                 ✓            weak H
λP ω = λC        ×                 ×            H

216 of 472
Expressivity II

Since in λ2 and stronger systems it is possible to represent Peano
arithmetic, it follows that those systems cannot be logically complete,
because Gödel's Incompleteness Theorem holds in them.

In λ2 and stronger systems, by the same argument, all the partial
recursive functions can be represented. On the other hand, since these
systems are subsystems of the pure λ-calculus, no other functions can
be represented. Thus, λ2 and stronger systems are Turing-complete.

217 of 472
References and Hints

The content of this lesson has been taken from [HS], Chapter 13G.

The detours in logical proofs and their elimination are the basis for
constructing normalisation proofs in Logic. This topic will not be
analysed any further in this course, but the interested reader may find
an in-depth treatment in A.S. Troelstra, H. Schwichtenberg, Basic
Proof Theory, 2nd edition, Cambridge University Press (2000). ISBN:
0521779111

218 of 472
Fundamentals of Functional Programming
Lecture 12 — Intermezzo

Outline

This lesson, which concludes the first part of this course, introduces
exceptions and their implementation in a functional language.
Exceptions, as in Java, are used in a variety of ways, although they are
mainly employed for error reporting. This highly procedural feature
can be smoothly incorporated in a functional language by means of a
proper evaluation semantics.
What is surprising is that the semantics has a logical counterpart,
relating classical proofs with intuitionistic ones, in a way which is
“natural” with respect to the formulae-as-types interpretation of typed
programs.

220 of 472
Exceptions I

Exceptions are a common way to treat special cases in a computation.


In most functional languages they are implemented via two keywords:
catch and throw.
The construction

catch j in M;

operates by evaluating M in the usual way. If the construction

throw j N

is a subterm of M, when (and if) it gets evaluated, the value of the


whole expression “catch j in . . . ” becomes the value of N.

221 of 472
Exceptions II

To model exceptions, we start with the simply typed λ-calculus and we


extend it with a new term constructor.
Then, we interpret the so obtained calculus as a computational
structure by giving a suitable notion of reduction, which extends the
usual β-reduction taking care of the new constructor.

Our aim is to show that there is a “natural” interpretation of the


calculus in Logic, establishing a variant of the formulae-as-types
interpretation.

222 of 472
Syntax I

Definition 12.1 (Types)


Given a set C of atomic types, the set of types, T , is inductively
defined as the smallest set such that
■ ⊥ ∈ T and ⊥ 6∈ C ;
■ C ⊆T;
■ if t1 , t2 ∈ T then t1 → t2 ∈ T .
As usual, we abbreviate α → ⊥ as ¬α.

The set of types is the same as for the simply typed λ-calculus except
for the requirement of a special atomic type, ⊥.

223 of 472
Syntax II

Definition 12.2 (Terms)


Given a set of types T and a set V of variables such that there is a
denumerable quantity of them for each type in T , the set of terms,
ΛC , is inductively defined as the smallest set such that
■ x : τ ∈ V implies x ∈ ΛC and x : τ and FV(x) = { x };
■ if M , N ∈ ΛC and M : α → β and N : α, then (M · N) ∈ ΛC and
(M · N) : β and FV(M · N) = FV(M) ∪ FV(N);
■ if M ∈ ΛC , M : β and x : α ∈ V , then (λx : α. M) ∈ ΛC ,
(λx : α. M) : α → β and FV(λx : α. M) = FV(M) \ { x };
■ if M ∈ ΛC and M : ¬¬α, then C (M) ∈ ΛC , C (M) : α and
FV(C (M)) = FV(M).
We say that a term t is a value if t is a variable or a λ-abstraction.
224 of 472
Syntax III

The operational meaning of the C term constructor is: whenever the


subterm C (M) is reduced in the term E [C (M)], the whole term
reduces to M applied to the procedural abstraction of the initial
context.
Formally,
E [C (M)] B M (λz . A (E [z])) ,
where z 6∈ FV(E [M]) and

A (N) ≡ C (λd . N) ,

with d 6∈ FV(N).

225 of 472
Syntax IV

Since E [A (M)] B (λd . M)(λz . A (E [z])) B M, the evaluation of A (M)


in a context E throws away the current context and continues by
evaluating its argument M. So, A aborts the current context E and
continues by evaluating its argument M.

Similarly, C (M) throws away the current context E , but it can be


recovered if the term M is a λ-abstraction: in this case the abstracted
variable x is instantiated by β-reduction to the context deprived from
the redex; applying x to some term t evaluates to the current context
applied to the term t. So C allows to control the current context by
passing it as a parameter to its argument M.

226 of 472
Semantics I

Definition 12.3 (Evaluation context)


Fixed a set of terms, the evaluation context E is an object satisfying
one of the following clauses
■ E ≡ [];
■ E ≡ E 0 N where N ∈ ΛC and E 0 is an evaluation context;
■ E ≡ V E 0 where V is a value and E 0 an evaluation context.
If E is an evaluation context and M a term, then E [M] denotes the
term that results replacing M in the hole [] of E .

Intuitively, an evaluation context is useful to identify the position of


the β-redex we want to reduce. Because of its definition, an evaluation
context always identifies the outermost leftmost redex with its hole.
227 of 472
Semantics II

Lemma 12.4
Any closed term M is either a value or it can be written in a unique
way as M = E [R] where R is a β-redex.

Proof.
Choose R as the outermost-leftmost redex of M.

This lemma shows that defining reductions in terms of evaluation


contexts forces to follow the leftmost reduction strategy. Moreover,
the definition of value and its interaction with the definition of
evaluation context forces the call-by-value paradigm.

228 of 472
Semantics III

Definition 12.5 (C -reduction)


The reduction relation is defined as the reflexive and transitive closure
of the following rules:
■ C (λk . E [(λx . M) V ]) B C (λk . E [M[V /x]]), where
k 6∈ FV(E [(λx . M) V ]) (β-reduction);
■ C (λk . E [C (M)]) B C (λk . M (λz . A (E [z]))), where
k , z 6∈ FV(E [C (M)]) (C -reduction);
■ C (λk . k V ) B V , where V is a value and k 6∈ FV(V ) (cleanup).
Instead of evaluating an expression M : α, we start by evaluating
C (λk : ¬α. k M).

229 of 472
Semantics IV
This reduction relation allows every term, even one containing a C
constructor, to be evaluated in call-by-value style, yielding a value.

If the term M does not contain the C operator, it reduces in the usual
way, following β-reductions until it reaches a value: in this case the
cleanup rule is applied, yielding the result.

If the term M contains an instance of the C operator and it reduces


via the C -reduction, then the surrounding context is thrown away and
is passed to the argument as a parameter, as previously explained.

The initial context of M, I ≡ C (λk . k []), is used to “protect” the


evaluation of M, giving a reference context where every exception is
captured.

230 of 472
Catch and throw I

The catch/throw mechanism that implements exceptions is modelled


as follows:
■ the expression
catch j in M
is equivalent to the term C (λj . j M);
■ the expression
throw j N
is equivalent to the term j N.

231 of 472
Catch and throw II
If there is no throw expression j N in the scope of a catch, the
corresponding term reduces as follows:

E [C (λj . j M)] B (λj . j M)(λz . A (E [z])) B A (E [M])) B E [M] ,

as expected.
If there is a throw j N expression, it reduces to N, as required.

For example, consider the program

P ≡ λx . catch e in
if x = 0 then throw e “error”
else 1000/x;

Then, P 0 B "error" and P 2 B 500.
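The behaviour of P can be mimicked directly in Python, modelling throw as an escape to the enclosing catch. The sketch below uses a Python exception to play the role of the discarded context (the names catch, throw and the tag handling are illustrative, not part of the calculus):

```python
class _Throw(Exception):
    """Carries the tag j and the value N of a `throw j N`."""
    def __init__(self, tag, value):
        self.tag, self.value = tag, value

def catch(tag, body):
    # `catch j in M`: evaluate the thunk M; a matching throw aborts the
    # surrounding context, and its value becomes the value of the catch.
    try:
        return body()
    except _Throw as t:
        if t.tag == tag:
            return t.value
        raise

def throw(tag, value):
    raise _Throw(tag, value)

# P from the slide; // is used so that 1000/2 evaluates to the integer 500.
P = lambda x: catch("e", lambda: throw("e", "error") if x == 0 else 1000 // x)
print(P(0), P(2))  # error 500
```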


232 of 472
Logical interpretation I
Consider the implicational fragment of classical propositional logic:
Γ`A→B Γ`A Γ, A ` B Γ ` ¬¬A
→E →I ⊥C
Γ`B Γ`A→B Γ`A
The corresponding translation in the λ-calculus is defined as follows:
■ an assumption A is mapped into the term x : A;

■ if the proof Γ ` A → B corresponds to the term M : A → B and

Γ ` A corresponds to N : A, then an application of the → E rule is


mapped into (M N) : B;
■ if Γ, A ` B corresponds to M : B where the assumption A

corresponds to x : A, then an application of the → I rule is mapped


into (λx : A. M) : A → B;
■ if Γ ` ¬¬A corresponds to M : ¬¬A, then an application of the ⊥
C
rule is mapped into C (M) : A.
233 of 472
Logical interpretation II

In this way, to every proof in the implicational fragment of the classical


propositional logic is assigned a corresponding typed λ-term M : α.
Differently from the corresponding intuitionistic fragment, the output
of the computation may not be of type α. This happens exactly when
the classical rule is used, or, in computational terms, if the C operator
is evaluated.

What we want to show is that it is possible to embed the considered


classical fragment into the corresponding intuitionistic one and,
moreover, the embedding preserves the computational content.

234 of 472
Logical interpretation III

Definition 12.6 (CPS-translation)


Given a term t, its CPS-transformation CPS(t) is inductively defined as:
■ CPS(x) = λk . k x;
■ CPS(λx . M) = λk . k (λx . CPS(M));
■ CPS(M N) = λk . CPS(M) (λm. CPS(N) (λn. m n k));
■ CPS(C (M)) = λk . CPS(M) (λm. m (λz , d . k z)(λx . x));
where the abstracted variables we introduced are all new.
Also, given a type τ, its CPS-transformation is inductively defined as:
■ τ∗ = τ if τ is atomic;
■ (α → β)∗ = α∗ → (β∗ → ⊥) → ⊥.
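Under this translation every term becomes a function expecting a continuation k. The Python sketch below translates the application (λx. x) y by hand, with the free variable y pre-substituted by a sample value, and runs the result with the identity continuation (the names are illustrative):

```python
# CPS of the variable y: pass its value to the continuation.
c_y = lambda k: k("some value")

# CPS of (λx. x): pass the translated function to the continuation;
# the body x is itself translated to λk2. k2 x.
c_id = lambda k: k(lambda x: (lambda k2: k2(x)))

# CPS of the application (λx. x) y, following the third clause:
# λk. CPS(λx. x) (λm. CPS(y) (λn. m n k))
c_app = lambda k: c_id(lambda m: c_y(lambda n: m(n)(k)))

print(c_app(lambda v: v))  # some value
```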

235 of 472
Logical interpretation IV

Theorem 12.7
If M : α is a term in ΛC , then its CPS-transformation has type
(α∗ → ⊥) → ⊥ in the simple theory of types.
[Proof not required]

Equivalently, in logical terms, the theorem can be formulated as


Theorem 12.8
If Π is a proof of Γ ` A in the implicational fragment of classical
propositional logic corresponding to the term M : A, then there is an
intuitionistic proof of Γ∗ ` ¬¬A∗ that corresponds to the
CPS-transformation of M, where Γ∗ = { B∗ : B ∈ Γ }.

236 of 472
Logical interpretation V
Corollary 12.9
The CPS-translation is an embedding of the implicational fragment of
classical propositional logic into the implicational fragment of
intuitionistic propositional logic.
Proof.
Evident from the theorem and the fact that ` A ↔ ¬¬A holds in the
classical fragment.
Corollary 12.10
The evaluation of every well-typed term M : A in ΛC is finite.
Proof.
Suppose there is an infinite reduction of M : A. Then, by the theorem,
there is an infinite reduction of the CPS-transformation of M, which is
impossible, since every simply-typed term is SN.
237 of 472
Conclusion
As a result of the analysis, we have coded the exception mechanism in
a suitable extension of a functional language, defining an appropriate
reduction strategy. Then, we have interpreted the language in logical
terms, obtaining a bijective correspondence with the implicational
fragment of classical propositional logic. Finally, via the
CPS-translation, we have shown that the whole language can be
embedded into the simple theory of types, via its logical interpretation
as the implicational fragment of intuitionistic propositional logic. So,
as a side result, we have constructed an implementation of the
exception mechanism inside the simple theory of types.

In this lesson, we have shown how to use the Mathematics of


λ-calculus to encode in a smooth way the exception mechanism into a
functional language, using logic as a bridge.
238 of 472
References and Hints

The material in this lesson has been taken from T.G. Griffin, A
Formulae-as-Types Notion of Control, in Proceedings of the 17th
ACM SIGPLAN-SIGACT symposium on Principles of programming
languages, ACM Press, pp. 47–58 (1990).

In that paper, the interested reader may find an outline of the omitted
proof.

239 of 472
Fundamentals of Functional Programming
Lecture 13

Outline I
In the second part of this course, we will study Category Theory.

In the first part, we have seen how a functional language is


constructed starting from two basic operations: application and
abstraction. As we have seen in Combinatory Logic, abstraction is not
essential and it can be simulated by means of application.
As a matter of fact, most functional languages are constructed in this
way: they are, more or less, a typed λ-calculus with a number of
syntactical decorations to simplify the work of coding; the programs
are compiled into an intermediate code, which is executed by an
abstract machine according to a formal semantics, the one of
β-reduction. To gain performances, the intermediate code can be
expressed in Combinatory Logic, or in some other formalism suited to
a formal model of λ-calculus. E.g., ML uses a “continuation-based”
semantics, which is rooted in Domain Theory.
241 of 472
Outline II

In Category Theory, we change point of view: instead of application


and abstraction, we will use just one basic operation, which is function
composition. It turns out that the mathematical world we will
construct is deeper and much more powerful than λ-calculus. And, as
a bonus, we get an abstract model for typed λ-calculi which is, in a
sense, more elementary.

Category Theory is a hard piece of Mathematics. Its difficulty lies in


the general way it uses to pose problems and solutions. Its study
requires a “shift of view”, abandoning the traditional framework of
set-based notions in favour of a more abstract style of reasoning. It
turns out to be the “right” style also for writing functional programs,
and this is the main reason to embark on its study.

242 of 472
Categories I
Definition 13.1 (Category)
A category C is a structure C = 〈O , A, dom, cod, ◦, id〉 such that
■ O is a collection of objects, denoted as Obj C;
■ A is a collection of arrows;
■ dom is an operation assigning to each arrow f an object dom f , its
domain;
■ cod is an operation assigning to each arrow f an object cod f , its
codomain;
■ ◦ is an operation, called composition, assigning to each pair of
arrows f and g such that cod f = dom g , an arrow g ◦ f such that
dom(g ◦ f ) = dom f and cod(g ◦ f ) = cod g ; Moreover, ◦ satisfies the
associative law: for any arrows f , g , h such that the following
composition is defined, h ◦ (g ◦ f ) = (h ◦ g ) ◦ f ;
,→
243 of 472
Categories II
,→ (Category)
■ id is an operation, called identity, assigning to each object P an
arrow idP such that dom(idP ) = P = cod(idP ); Moreover, idP
satisfies the identity law: for any arrow f with dom f = P and
cod f = Q, idQ ◦ f = f = f ◦ idP .
If P and Q are objects, we write Hom(P , Q) or C(P , Q) for the
collection of arrows whose domain is P and whose codomain is Q. We
write f : P → Q if f ∈ Hom(P , Q).

Definition 13.2
Given a category C = 〈O , A, dom, cod, ◦, id〉, we say that
■ C is small if O and A are sets;
■ C is locally small if, for every P , Q ∈ O, Hom(P , Q) is a set;
■ C is large otherwise.
244 of 472
Concrete categories I
Example 13.3 (Set)
The category Set has sets as objects and (total) functions between
them as arrows. Specifically, Set = 〈O , A, dom, cod, ◦, id〉 and
■ O is the proper class of all sets;
■ A is the proper class of f : D → C where f is a total function from
D, its domain, to C , its codomain;
■ ◦ is the usual composition of functions: given f : D → E and
g : E → C , g ◦ f : D → C is (g ◦ f )(x) = g (f (x)), for all x ∈ D;
■ idP : P → P is the identity on P, i.e., for all x ∈ P, idP (x) = x.
It is immediate to see that the associative law and the identity law
both hold.

[Exercise] Prove it.
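The exercise amounts to unfolding the definitions of ◦ and id for functions. A quick Python spot-check of the two laws on sample inputs (a sanity test, not a proof):

```python
compose = lambda g, f: lambda x: g(f(x))   # g ∘ f
identity = lambda x: x                     # id_P

f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x - 3

for x in range(10):
    # associative law: h ∘ (g ∘ f) = (h ∘ g) ∘ f
    assert compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
    # identity law: id ∘ f = f = f ∘ id
    assert compose(identity, f)(x) == f(x) == compose(f, identity)(x)
print("laws hold on the samples")
```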


245 of 472
Concrete categories II

There is a subtlety in the definition of Set: each function corresponds


to many arrows in Set, in fact, given f : D → C and g : D → E such
that for all x ∈ D, f (x) = g (x), it does not follow that f = g , unless
C = E.

Usually, arrows are defined along with their domain and codomain, as
we did. Also, most of the times, composition and identities are
obvious from the context; in these cases, it is customary to define the
category specifying only the objects and the arrows. Sometimes, when
also the arrows (objects) are clear from the context, just the objects
(arrows) are specified.

246 of 472
Concrete categories III

Definition 13.4 (Poset)


A preorder is a pair 〈P , ≤P 〉 such that ≤P is a binary relation on P
which is reflexive and transitive.
A partially ordered set or poset is a preorder 〈P , ≤P 〉 where the relation
is also anti-symmetric.
An order preserving function or monotone function from the poset
(preorder) 〈P , ≤P 〉 to the poset (preorder)〈Q , ≤Q 〉 is a function
f : P → Q such that, if p ≤P p 0 , then f (p) ≤Q f (p 0 ).

Posets and preorders are an example of algebraic structure, that is, a


set, the universe, plus a number of operations acting on the universe.

247 of 472
Concrete categories IV

Example 13.5 (Poset)


The category Poset has all the posets as objects and all the monotone
functions between them as arrows.

Example 13.6 (Preorder)


The category Preorder has all the preorders as objects and all the
monotone functions between them as arrows.

[Exercise] Check that Poset and Preorder are categories.

248 of 472
Concrete Categories V
Example 13.7 (Mon)
The category Mon has all the monoids as objects and their
homomorphisms as arrows. Thus, an object of Mon has the form
〈M , ·M , eM 〉, where ·M is a binary operation on M, and eM ∈ M, such
that (i) ·M is associative and (ii) eM is the unit of ·M . An arrow
f : 〈M , ·M , eM 〉 → 〈N , ·N , eN 〉 of Mon is a function f : M → N preserving
the product and the unit, i.e., (i) f (x ·M y ) = f (x) ·N f (y ) for all
x , y ∈ M, and (ii) f (eM ) = eN .

Example 13.8 (Grp)


The category Grp has all groups as objects and their homomorphisms
as arrows. A group is a monoid with an inverse operation, and group
homomorphisms are the monoid homomorphisms preserving inverses.
249 of 472
Concrete categories VI
All the previous examples have objects which are sets with some
additional structure and arrows which are functions preserving the
structure. In general, this pattern gives rise to a category which is
called concrete. Examples of concrete categories are:
Category Objects Arrows
Set sets total functions
Pfn sets partial functions
FinSet finite sets total functions
Mon monoids homomorphisms
Grp groups homomorphisms
Poset posets monotone functions
Rng rings homomorphisms
Vect vector spaces linear transformations
Top topologies continuous functions
250 of 472
Abstract Categories I
Abstract categories are not concrete. Many interesting and useful
examples of categories are abstract.

Example 13.9 (0)


The category 0 has no objects and no arrows.

Example 13.10 (1)


The category 1 has a single object and only one arrow, the identity.

Example 13.11 (2)


The category 2 has two objects and three arrows: the two identities
and an arrow from one object to the other.
251 of 472
Abstract Categories II

Example 13.12 (Discrete categories)


A discrete category is such that its arrows are just the identities.

Note that 0 and 1 are discrete categories. Note also that every set is a
discrete category whose objects are its elements.

252 of 472
Abstract Categories III

Example 13.13 (Preorder categories)


A preorder 〈P , ≤P 〉 can be seen as a category P whose set of objects is
P and whose arrows are defined as follows: whenever p , q ∈ P and
p ≤P q, then there is a unique arrow p → q. By reflexivity and
transitivity, it follows that P is a category.
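For instance, (ℕ, ≤) yields a category with an arrow m → n exactly when m ≤ n: reflexivity supplies the identities and transitivity supplies composition. A small Python check of this reading (illustrative):

```python
# There is (at most) one arrow m -> n, witnessed by m <= n.
hom = lambda m, n: m <= n

ns = range(6)
# identities: reflexivity gives an arrow n -> n for every object n
assert all(hom(n, n) for n in ns)
# composition: arrows a -> b and b -> c yield an arrow a -> c (transitivity)
assert all(hom(a, c)
           for a in ns for b in ns for c in ns
           if hom(a, b) and hom(b, c))
print("preorder-as-category laws check out")
```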

Note that every poset, being a preorder, gives rise to a category.

253 of 472
Abstract Categories IV

Example 13.14 (Monoid categories)


A monoid 〈M , ·M , eM 〉 can be seen as a category M whose set of
objects is a singleton, and whose set of arrows is M. Composition is
the monoid product ·M and the identity is the monoid unit eM .
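Concretely, the monoid (ℕ, +, 0) becomes a one-object category whose arrows are the natural numbers, composition is + and the identity arrow is 0. A Python sanity check of the category laws on samples (illustrative):

```python
# Arrows of the one-object category built from the monoid (N, +, 0).
compose = lambda g, f: g + f   # composition is the monoid product
ident = 0                      # the identity arrow is the monoid unit

# associative law and identity law, spot-checked on samples
for a in range(5):
    for b in range(5):
        for c in range(5):
            assert compose(a, compose(b, c)) == compose(compose(a, b), c)
    assert compose(ident, a) == a == compose(a, ident)
print("monoid-as-category laws hold")
```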

Since all groups are monoids, every group can be thought of as a


category.

The category Set, as most of the concrete categories, is locally small.


All the examples we have seen of abstract categories are small.

254 of 472
Opposite category

Definition 13.15 (Opposite category)


Given a category C, the opposite or dual category Cop has the same
objects as C, but its arrows are the opposites of the arrows of C: if
f : A → B is an arrow of C, then f : B → A is an arrow of Cop .
Composition and identities are defined from C in the obvious way.

The dual category provides a duality principle: any statement S about


a category C can be transformed into a dual statement S op by
exchanging the words “domain” and “codomain”, and replacing each
composite g ◦ f by f ◦ g . If the statement S holds in C, then S op holds
in Cop . Since (Cop )op = C, if a statement S is true in any category,
then so is S op . Moreover, any construction x can be “dualised”: the
dual construction, whose arrows are reversed, is called co-x.
255 of 472
Product category and subcategories I

Definition 13.16 (Product category)


Given a pair C, D of categories, the product category C × D has as
objects the pairs (A, B) where A ∈ Obj C and B ∈ Obj D, and as arrows the
pairs (f , g ) where f is a C-arrow and g is a D-arrow. Composition and
identities are defined pointwise: (f , g ) ◦ (h, i) = (f ◦ h, g ◦ i) and
id(A,B) = (idA , idB ).

256 of 472
Product category and subcategories II

Definition 13.17 (Subcategory)


A category C is a subcategory of a category D if
■ Obj C ⊆ Obj D;
■ for all A, B ∈ Obj C, HomC (A, B) ⊆ HomD (A, B);
■ composites and identities are the same in C as in D.
Moreover, C is full if, for all A, B ∈ Obj C, HomC (A, B) = HomD (A, B).

257 of 472
Product category and subcategories III

For example Poset is a full subcategory of Preorder; similarly, Grp is


a full subcategory of Mon; evidently, also FinSet is a full subcategory
of Set. But Set is a non-full subcategory of Pfn.

It is immediate to see that the product category of two discrete


categories C and D is the discrete category such that
Obj(C × D) = (Obj C) × (Obj D). [Exercise] Check it.

Also, it is easy to see that, for any category C, 0 × C = 0 = C × 0.

258 of 472
References and Hints

This lesson follows Chapter 1.1 of [Pierce2].

It is very important to fully understand the definition of category and


to remember the examples we have presented: they will often recur.

259 of 472
Fundamentals of Functional Programming
Lecture 14

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline
The definition of category is very flexible and it captures most of the
mathematical theories, especially the ones of interest for Computer
Science. Category Theory, at the most superficial level, provides a
uniform language to describe Mathematics, offering a unifying view of
its problems and techniques.

Nevertheless, it becomes rapidly difficult to manage categorical


properties without an agile notation. The first goal of this lesson is to
introduce such a notation.

We have seen that categories can be used to construct new categories.


But it is possible and extremely significant to search for categorical
structures inside a given category. In this lesson, we will state the
basics on this direction.
261 of 472
Diagrams
Definition 14.1 (Diagram)
A diagram in a category is a directed multi-graph whose vertexes are
objects and whose edges are arrows of the category.
A diagram is said to commute if, for every pair of paths with a
common origin and target, the compositions of the arrows along the
two paths yield the same result.

For example, saying that the following diagram commutes


        f
    A -----> B
    |  \     |
    h    k   g
    |      \ |
    v       vv
    C -----> D
        i
means that g ◦ f = k = i ◦ h.
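In FinSet this commutation condition can be checked by brute force: compose the functions along each path and compare them pointwise. A minimal sketch (Python used as executable notation; all the maps below are made up for illustration):

```python
# Arrows of the square above, as Python functions on a finite set A.
# A --f--> B, A --h--> C, B --g--> D, C --i--> D, with diagonal k : A -> D.
A = [0, 1, 2]

f = lambda x: x + 1        # A -> B
g = lambda y: 2 * y        # B -> D
h = lambda x: 2 * x        # A -> C
i = lambda z: z + 2        # C -> D
k = lambda x: 2 * (x + 1)  # A -> D, the common diagonal

def commutes(p, q, domain):
    """Two parallel arrows are equal iff they agree on every element."""
    return all(p(x) == q(x) for x in domain)

# The square commutes: g ∘ f = k = i ∘ h, as maps defined on A.
assert commutes(lambda x: g(f(x)), k, A)
assert commutes(lambda x: i(h(x)), k, A)
```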
262 of 472
Monics and epics I

Definition 14.2 (Monomorphism)


An arrow f : B → C in a category C is a monomorphism, or monic, or
is mono, if, for every pair g , h : A → B of arrows in C, f ◦ g = f ◦ h
implies g = h.
In a diagram, a monomorphism f is denoted by an arrow with a tail: f : A ↣ B.

263 of 472
Monics and epics II

Dualising the notion of monomorphism, we obtain


Definition 14.3 (Epimorphism)
An arrow f : A → B in a category C is an epimorphism, or epic, or is
epi, if, for every pair g , h : B → C of arrows in C, g ◦ f = h ◦ f implies
g = h.
In a diagram, an epimorphism f is denoted by a two-headed arrow: f : A ↠ B.

264 of 472
Monics and epics III
In Set, it is easy to verify that monomorphisms are exactly the
injective functions, and that epimorphisms are exactly the surjective
functions. [Exercise] Check it.
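For small finite sets the mono condition can indeed be checked by brute force, quantifying over all pairs of parallel arrows out of a test set (the epi/surjective case is dual). A sketch with illustrative data, representing functions as dicts:

```python
from itertools import product

def functions(dom, cod):
    """All functions dom -> cod between finite sets, as dicts."""
    return [dict(zip(dom, img)) for img in product(cod, repeat=len(dom))]

def is_mono(f, A, test):
    """f : A -> B is mono iff f∘g = f∘h implies g = h for all g, h : test -> A."""
    arrows = functions(test, A)
    return all(g == h for g in arrows for h in arrows
               if all(f[g[x]] == f[h[x]] for x in test))

def is_injective(f, A):
    return len({f[x] for x in A}) == len(A)

A, T = [0, 1], [0, 1]
inj = {0: 0, 1: 2}       # injective arrow into B = {0, 1, 2}
const = {0: 1, 1: 1}     # non-injective arrow into the same B
assert is_mono(inj, A, T) and is_injective(inj, A)
assert not is_mono(const, A, T) and not is_injective(const, A)
```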

In general, many categorical notions are built by abstraction over the


relevant notions of Set, or some other interesting category. This
abstraction process takes place by trying to express the concepts in
terms of objects and arrows only.

In this way, one can try to replicate, in the categorical framework, the
proofs involving the abstracted concept. This exercise makes it possible
to transport the results to other, apparently unconnected, categories.

So, this abstraction process leads to very general notions, which may
behave unexpectedly.
265 of 472
Monics and epics IV
Example 14.4
Both 〈Z, +, 0〉 and 〈N, +, 0〉 are objects in Mon. Consider

i : 〈N, +, 0〉 → 〈Z, +, 0〉
n 7→ n

Being an injection, i is mono. But it is also epi, although not


surjective.
In fact, let f , g : 〈Z, +, 0〉 → 〈M , ∗, e 〉 such that f ◦ i = g ◦ i, and let
z ∈ Z. If z ≥ 0 then f (z) = f (i(z)) = g (i(z)) = g (z).
If z < 0, then f (z) = f (z) ∗ e = f (z) ∗ g (0) = f (z) ∗ g (−z + z) =
f (z) ∗ g (−z) ∗ g (z) = f (z) ∗ g (i(−z)) ∗ g (z) = f (z) ∗ f (i(−z)) ∗ g (z) =
f (z) ∗ f (−z) ∗ g (z) = f (z + −z) ∗ g (z) = f (0) ∗ g (z) = e ∗ g (z) = g (z).
So f = g .
266 of 472
Isomorphisms I

Definition 14.5 (Isomorphism)


An arrow f : A → B is an isomorphism, or is iso, if there is an arrow
f −1 : B → A, the inverse of f , such that f −1 ◦ f = idA and f ◦ f −1 = idB .
In this case, A is said to be isomorphic to B, notation A ∼ = B.

It is easy to prove that, if it exists, f −1 is unique, and the “being


isomorphic” relation is an equivalence.

As one would expect, in Set, isomorphisms are exactly the bijective


functions. Also, in concrete categories, isomorphisms are the usual
invertible homomorphisms. Notice that, in Mon, by the previous
example, an arrow which is mono and epi does not need to be iso.
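In Set the inverse, when it exists, can be computed explicitly, and it exists exactly for the bijections; a small illustrative sketch (finite sets as lists, arrows as dicts, all data made up):

```python
def inverse(f, A, B):
    """Return g : B -> A with g∘f = id_A and f∘g = id_B, or None if f is not iso."""
    image = {f[x] for x in A}
    if len(image) != len(A) or image != set(B):
        return None  # not injective or not surjective: no inverse exists
    g = {f[x]: x for x in A}
    # Sanity check: both composites are identities.
    assert all(g[f[x]] == x for x in A) and all(f[g[y]] == y for y in B)
    return g

assert inverse({0: 'b', 1: 'a'}, [0, 1], ['a', 'b']) == {'b': 0, 'a': 1}
assert inverse({0: 'a', 1: 'a'}, [0, 1], ['a', 'b']) is None
```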

267 of 472
Isomorphisms II
In general, Category Theory considers objects and arrows up to
isomorphisms, meaning that isomorphic objects are considered
indistinguishable.

Definition 14.6 (Subobject)


If f : A → B is mono in the category C, then f is a subobject of B.

In Set, subobjects denote subsets, up to isomorphisms. In fact, if


A ⊆ B then the canonical injection i : A → B is mono and, thus, i is a
subobject of B. Its image, i(A), is identical to A, so it is also
isomorphic to A. Consider a mono f : A ↣ B: its image f (A) ≅ A and it is a
subset of B, so “A through f ” is a subset of B, but, since A is the
domain of f , it is more correct to say that f is a subobject of B.

268 of 472
Initial and terminal objects I

Definition 14.7 (Initial object)


Given a category C, 0 ∈ Obj C is initial if, for every A ∈ Obj C, there is a
unique arrow !: 0 → A.

[Exercise] Show that 0, if it exists, is unique up to isomorphisms.

Dually,
Definition 14.8 (Terminal object)
Given a category C, 1 ∈ Obj C is terminal if, for every A ∈ Obj C, there
is a unique arrow !: A → 1.

A notational convention denotes unique arrows by !.

269 of 472
Initial and terminal objects II

In Set, ∅ is the unique initial object, while any singleton is a terminal
object. For this reason, an arrow f : 1 → A is said to be a (global)
element of A. In Set, this amounts to saying that, if f is an element,
then its image f (1) is a singleton, thus f denotes a unique member
(element) of the set A.

In a preorder category P, if it exists, the initial object is the minimum,


and the terminal object is the maximum.

In Grp, any trivial group T containing just the unit, is the initial
object, as well as the terminal object. An object which is both initial
and terminal is said to be a zero object. Notice that, in Grp, this
phenomenon can be spelt out as 0 = 1!

270 of 472
Products and coproducts I
Definition 14.9 (Product)
Given a category C, the product of A, B ∈ Obj C is an object A × B,
together with two arrows π1 : A × B → A and π2 : A × B → B, its
projections, such that the diagram
         π1            π2
    A <------- A × B -------> B

is universal, that is, for any object C and pair of arrows f : C → A and
g : C → B, there is a unique arrow 〈f , g 〉 : C → A × B making the
following diagram commute

               C
             /  |  \
           f  〈f ,g 〉 g
           v    v     v
      A <---- A × B ----> B
          π1         π2

271 of 472
Products and coproducts II
Definition 14.10 (Coproduct)
Given a category C, the coproduct of A, B ∈ Obj C is an object A + B,
together with two arrows i1 : A → A + B and i2 : B → A + B, its
injections, such that the following diagram is co-universal
    A ---i1---> A + B <---i2--- B

Thus, dually to products, in a coproduct the following diagram


commutes for each appropriate C , f and g , with [f , g ] unique
               C
             ^  ^  ^
            /   |   \
           f  [f ,g ] g
          /     |      \
    A ---i1--> A + B <--i2--- B
[Exercise] Prove that, when C with f and g is itself a coproduct of A
and B, the mediating arrow [f , g ] is an isomorphism, so coproducts are
unique up to isomorphisms. By duality, the same holds for 〈f , g 〉 and
products.
272 of 472
Products and coproducts III

In Set, the product A × B is just the Cartesian product, with π1 and


π2 the canonical projections; the coproduct object A + B is the disjoint
union A t B and i1 , i2 are its canonical injections.
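The universal arrows of these (co)products are the pairing ⟨f, g⟩ and the copairing [f, g], which can be written down directly; a sketch in Python (tags encode the disjoint union, all names illustrative):

```python
# Product: Cartesian product with projections and the pairing <f, g>.
pi1 = lambda p: p[0]
pi2 = lambda p: p[1]

def pair(f, g):
    """<f, g> : C -> A × B, the unique arrow with π1∘<f,g> = f and π2∘<f,g> = g."""
    return lambda c: (f(c), g(c))

# Coproduct: disjoint union via tagging, with injections and copairing [f, g].
i1 = lambda a: ('left', a)
i2 = lambda b: ('right', b)

def copair(f, g):
    """[f, g] : A + B -> C, the unique arrow with [f,g]∘i1 = f and [f,g]∘i2 = g."""
    return lambda t: f(t[1]) if t[0] == 'left' else g(t[1])

f = lambda c: c + 1
g = lambda c: c * 2
assert pi1(pair(f, g)(3)) == f(3) and pi2(pair(f, g)(3)) == g(3)
assert copair(f, g)(i1(3)) == f(3) and copair(f, g)(i2(3)) == g(3)
```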

In a poset category P, the product object A × B, when it exists, is the


greatest lower bound of A and B, while, symmetrically, A + B is the
least upper bound of A and B.

In analogy with the set-theoretic interpretation of products and


coproducts, it is worth extending the definitions from binary
operations to operations of an arbitrary arity.

273 of 472
Products and coproducts IV
Definition 14.11 (Product)
The product of a family {Ai }i ∈I of objects indexed by the set I is an
object Πi ∈I Ai and a family {πi : Πi ∈I Ai → Ai }i ∈I of arrows such that the
following diagram is universal:

           Πi ∈I Ai
            /     \
          πj       πk
          v         v
         Aj   ···   Ak

If |I | = 2, this definition of product reduces to the one for binary


products. If |I | = 1, the notion of product reduces to Πi ∈I Ai = Ax ,
where x is the unique element of I , with πx = idAx . If |I | = 0, the
notion of product reduces to the definition of terminal object.
274 of 472
Products and coproducts V

A category where, for any collection U of objects, there exists the


product ΠU, is said to have arbitrary products. If the property holds
only when U is finite, the category is said to have finite products.
Dually, for coproducts.

For example, Set has arbitrary products and coproducts.

A poset category P having finite products and coproducts is a


bounded lattice; if it has arbitrary products and coproducts, it is a
complete and bounded lattice.

275 of 472
Limits and colimits I
In the definitions of products we used the notion of universal
construction. Properly speaking, universal constructions are limits.

Definition 14.12 (Cone)


Let C be a category and D a diagram in C. A cone for D is an
X ∈ Obj C together with a family { fi : X → Di }Di ∈Obj D of arrows
indexed by the objects in D such that, for each arrow g : Di → Dj
in D, the following diagram commutes

              X
            /   \
          fi     fj
          v       v
          Di ----> Dj
               g

276 of 472
Limits and colimits II

Definition 14.13 (Limit)


A limit for a diagram D in a category C is a cone { fi : X → Di } for D
with the property that, if { fi′ : X′ → Di } is a cone for D, then there is a
unique arrow k : X′ → X such that the diagram

        X′ ----k----> X
          \          /
          fi′      fi
            v     v
              Di

commutes for every Di in D.

277 of 472
Limits and colimits III

Example 14.14
Given A, B ∈ Obj C, consider the diagram D whose vertexes are A and
B, with no edges. The limit of D is, if it exists, the product A × B.
Similarly, the product ΠU, for U ⊆ Obj C, if it exists, is the limit of the
diagram D with U as the set of vertexes and no edges.

Example 14.15
Consider the empty diagram D. Hence, a cone for D is any object
and, thus, its limit is a terminal object, if it exists.

278 of 472
Limits and colimits IV

Dualising the notion of limit, we obtain the concept of colimit.

Definition 14.16 (Colimit)


A cocone for a diagram D in a category C is an X ∈ Obj C and a
collection { fi : Di → X } of arrows indexed by the objects in D such
that fj ◦ g = fi for each g : Di → Dj in D.
A colimit, or direct limit, for D is a cocone { fi : Di → X } for D with
the co-universal property that, for any cocone { fi′ : Di → X′ } for D,
there is a unique arrow k : X → X′ such that k ◦ fi = fi′ for every object
Di in D.

[Exercise] Show that coproducts and initial objects are colimits.

279 of 472
References and Hints

This lesson follows Chapters 1.2, 1.3, 1.4, 1.5, 1.6 and 1.9 of [Pierce2].

Useful hints and explanations can be found in Chapter 3 of [Goldblatt].


Some of the examples have been taken from this highly approachable
text.

Beware that Exercise [Link] in [Pierce2] is wrong. In fact, the last


sentence should be read as “Also, if g ◦ f is monic then so is g ”. You
may want to prove that the actual text is wrong by providing a
counterexample.

280 of 472
Fundamentals of Functional Programming
Lecture 15

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline

Besides products, coproducts, terminal and initial objects, there are a
number of interesting and useful structures which can be described as
limits or colimits. In this lesson, we will introduce them.

Most of these structures arise from relevant constructions in Set,


while a few others derive from algebraic categories, like Grp or Vect.

These constructions are the essential ingredients of the categorical


language and they have a meaning in most mathematical theories.

282 of 472
Equalisers and coequalisers I

Definition 15.1 (Equaliser)


An arrow e : X → A in a category C is an equaliser of a pair of arrows
f , g : A → B if
■ f ◦ e = g ◦ e;
■ whenever e ′ : X ′ → A satisfies f ◦ e ′ = g ◦ e ′ , there is a unique arrow
k : X ′ → X such that e ◦ k = e ′ .

In other words, an equaliser is the limit of the diagram

          f
       -------->
    A            B
       -------->
          g

283 of 472
Equalisers and coequalisers II

Definition 15.2 (Coequaliser)


A coequaliser in a category C for a pair of arrows f , g : A → B is the
colimit of the diagram

          f
       -------->
    A            B
       -------->
          g

Example 15.3
Let f , g : A → B in Set, and let X = { x ∈ A: f (x) = g (x) }. Then, the
inclusion e : X → A is an equaliser of f , g .
Also, let S = { (f (x), g (x)): x ∈ A } ⊆ B × B and let R be the minimal
equivalence relation containing S. Call [y ]R the equivalence class
containing y ∈ B. Then, the map fR : B → B/R given by b 7→ [b]R is
the coequaliser of f , g .
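Both constructions of Example 15.3 can be computed for finite data: the equaliser as a subset, the coequaliser as the quotient by the generated equivalence (represented here by choosing one representative per class). An illustrative sketch:

```python
def equaliser(f, g, A):
    """The equaliser of f, g : A -> B in Set: { x in A : f(x) = g(x) }."""
    return [x for x in A if f(x) == g(x)]

def coequaliser(f, g, A, B):
    """Quotient B by the least equivalence relation with f(x) ~ g(x);
    returned as the map b |-> representative of [b]."""
    classes = {y: {y} for y in B}          # start from singleton classes
    for x in A:                            # merge the classes of f(x) and g(x)
        merged = classes[f(x)] | classes[g(x)]
        for y in merged:
            classes[y] = merged
    return {y: min(classes[y]) for y in B}

A, B = [0, 1, 2], [0, 1, 2, 3]
f = lambda x: x
g = lambda x: x + 1
assert equaliser(f, g, A) == []                          # f and g never agree
assert len(set(coequaliser(f, g, A, B).values())) == 1   # B collapses to a point
```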

284 of 472
Equalisers and coequalisers III
As usual, equalisers and coequalisers are unique up to isomorphisms.

Lemma 15.4
Every equaliser is monic.

Proof.
Suppose i : E → A equalises f , g : A → B. Let i ◦ j = i ◦ l where
j , l : C → E and let h : C → A be h = i ◦ j. We have
f ◦ h = f ◦ i ◦ j = g ◦ i ◦ j = g ◦ h and so there is a unique k : C → E with
i ◦ k = h. But h = i ◦ j, so k must be j. However, i ◦ l = i ◦ j = h, so k = l ,
and j = l .

Corollary 15.5
Every coequaliser is epic.
285 of 472
Existence of limits I
Theorem 15.6 (Limit)
Let D be a diagram in a category C, with sets V of vertexes and E of
edges. If every V -indexed and every E -indexed family of objects in C
has a product and every parallel pair of arrows in C has an equaliser,
then D has a limit.

Proof. (i)
For any Di ∈ V and (e : Di → Dj ) ∈ E , consider the diagram:

    Di <---πi--- ΠDi ∈V Di
    |                     \
    e                      πj
    v                       v
    Dj <--πe-- Π(e : Di →Dj )∈E Dj --πe--> Dj

,→
286 of 472
Existence of limits II
,→ Proof. (ii)
Since Π(e : Di →Dj )∈E Dj is a product, there is a unique

p : ΠDi ∈V Di → Π(e : Di →Dj )∈E Dj

such that πe ◦ p = πj . For the same reason, there is a unique

q : ΠDi ∈V Di → Π(e : Di →Dj )∈E Dj

such that πe ◦ q = e ◦ πi . In diagrams:


    Di <---πi--- ΠDi ∈V Di
    |              |  |   \
    e              q  p    πj
    v              v  v     v
    Dj <--πe-- Π(e : Di →Dj )∈E Dj --πe--> Dj
,→
287 of 472
Existence of limits III

,→ Proof. (iii)
Let h : X → ΠDi ∈V Di be an equaliser of p , q, and call fi = πi ◦ h.

                X
              /  |  \
            fi   h    fj
            v    v      v
    Di <-πi- ΠDi ∈V Di -πj-> Dj
    |           |  |          ^
    e           q  p          | πe
    v           v  v          |
    Dj <--πe-- Π(e : Di →Dj )∈E Dj

Consider { fi : X → Di }Di ∈V . It holds that
e ◦ fi = e ◦ πi ◦ h = πe ◦ q ◦ h = πe ◦ p ◦ h = πj ◦ h = fj ,
so { fi : X → Di }Di ∈V is a cone for D. ,→

288 of 472
Existence of limits IV
,→ Proof. (iv)
Assume that { fi′ : X′ → Di }Di ∈V is a cone for D. By the universal
property of products, there is a unique arrow h′ : X′ → ΠDi ∈V Di such
that πi ◦ h′ = fi′ for each Di ∈ V .
So, for any (e : Di → Dj ) ∈ E ,
πe ◦ p ◦ h′ = πj ◦ h′ = fj′ = e ◦ fi′ = e ◦ πi ◦ h′ = πe ◦ q ◦ h′ .
Thus, the following diagram commutes:

    X′ ---------fj′--------> Dj
     |    |                  ^
    p◦h′  q◦h′               |
     v    v                  πe
    Π(e : Di →Dj )∈E Dj ------'

which, by the universal property of the product, implies that
p ◦ h′ = q ◦ h′ , or, in other terms, h′ equalises p , q. ,→
289 of 472
Existence of limits V
,→ Proof. (v)
Thus, by the universal property of equalisers, there is a unique
k : X′ → X such that h ◦ k = h′ . Looking at the commutative diagram
(where fi = πi ◦ h)

    X ---h---> ΠDi ∈V Di ---πi---> Di
    ^              ^               ^
    k              h′              fi′
    |              |               |
    '------------- X′ -------------'

it holds that fi ◦ k = πi ◦ h ◦ k = πi ◦ h′ = fi′ . So k : X′ → X has the
property required to say that { fi : X → Di }Di ∈V is a limit. But we must
prove that k is unique. ,→
290 of 472
Existence of limits VI
,→ Proof. (vi)
Suppose k′ : X′ → X satisfies fi ◦ k′ = fi′ . Then, since

    πi ◦ h′ = fi′ = fi ◦ k′ = πi ◦ h ◦ k′

for all Di ∈ V , the universal property of the product ΠDi ∈V Di
guarantees that h ◦ k′ = h′ .
But the unique arrow with this property is k, so k′ = k.

Corollary 15.7
A category with equalisers and arbitrary products has all limits.

Corollary 15.8
A category with equalisers and finite products has all finite limits.
291 of 472
Pullbacks and pushouts I

Definition 15.9 (Pullback)


The pullback of a pair of arrows f : A → C and g : B → C is the limit
of the diagram
    A ---f---> C <---g--- B .

Dually,
Definition 15.10 (Pushout)
The pushout of a pair of arrows f : C → A and g : C → B is the colimit
of the diagram
    A <---f--- C ---g---> B .

292 of 472
Pullbacks and pushouts II

Example 15.11
Let f : B → C in Set and let A ⊆ C . Then the following is a pullback:

    f −1 (A) ---⊆---> B
        |             |
    f|f −1 (A)        f
        v             v
        A -----⊆----> C

where f|f −1 (A) is f restricted to f −1 (A) = { x ∈ B : f (x) ∈ A }.

293 of 472
Pullbacks and pushouts III
Example 15.12
In Set, the following is a pullback which defines intersection:

    A ∩ B ---⊆---> B
      |            |
      ⊆            ⊆
      v            v
      A ----⊆----> C

Example 15.13
In Set, let f : A → C and let g : B → C . Then,
    P = { (a, b) ∈ A × B : f (a) = g (b) }

is the pullback object, while its projections are πf : P → B, (a, b) 7→ b,


and πg : P → A, (a, b) 7→ a.
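Example 15.13 is directly computable; a sketch with an illustrative choice of f and g (both landing in C = {0, 1}):

```python
def pullback(f, g, A, B):
    """The Set pullback of f : A -> C and g : B -> C:
    P = { (a, b) in A × B : f(a) = g(b) }, with its two projections."""
    P = [(a, b) for a in A for b in B if f(a) == g(b)]
    return P, (lambda p: p[0]), (lambda p: p[1])

A, B = [0, 1, 2, 3], [0, 1]
f = lambda a: a % 2     # A -> C by parity
g = lambda b: b         # B -> C as the inclusion
P, pA, pB = pullback(f, g, A, B)
assert P == [(0, 0), (1, 1), (2, 0), (3, 1)]
assert all(f(pA(p)) == g(pB(p)) for p in P)   # the pullback square commutes
```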
294 of 472
Pullbacks and pushouts IV
Example 15.14
In any category with a terminal object, the following is a pullback:
    A × B ---π2---> B
      |             |
      π1            !
      v             v
      A -----!----> 1

Example 15.15
In any category, if
    X ---e---> A
    |          |
    e          g
    v          v
    A ---f---> B

is a pullback, then e is an equaliser of f , g .


295 of 472
Pullbacks and pushouts V

Lemma 15.16 (Pullback)


Consider the following diagram:

    • -----> • -----> •
    |    α   |    β   |
    v        v        v
    • -----> • -----> •

■ If both the inner squares α and β are pullbacks, then so is the outer
rectangle;
■ If the β square and the outer rectangle are pullbacks, then so is the
α square.

296 of 472
Pullbacks and pushouts VI
Proof. (i)
First, notice that, in both cases, the diagram is commutative.
For (1), consider the following commutative diagram:

    .------------------f------------------.
    |                                     v
    X ······h······> P ------> C ------> B
     \               |    α    |    β    |
      '-----g-----> D ---e---> A ------> •

Since β is a pullback and e ◦ g : X → A, f : X → B, there is a unique


k : X → C making the diagram to commute. But α is a pullback and
k : X → C , g : X → D, so there is a unique h : X → P making the
diagram to commute. ,→
297 of 472
Pullbacks and pushouts VII
,→ Proof. (ii)
For (2), consider the following commutative diagram:

    .----------------f----------------.
    |                                 v
    X ·····h·····> P ---c---> C --i--> B
     \             |    α     |   β    |
      \            d          j        |
       '----g----> D ---a---> A -----> •

Since the outer rectangle is a pullback and g : X → D, i ◦ f : X → B,


there is a unique h : X → P such that i ◦ f = i ◦ c ◦ h and g = d ◦ h.
Then, α is a pullback if f = c ◦ h.
From β being a pullback and a ◦ g : X → A, i ◦ f : X → B, there is a
unique k : X → C such that i ◦ f = i ◦ k and a ◦ g = j ◦ k. But both f and
c ◦ h satisfy the equations for k, so, by uniqueness of k, f = c ◦ h.
298 of 472
Pullbacks and pushouts VIII
Lemma 15.17
If the following diagram is a pullback, then g ′ is mono:

    P ---f ′---> B
    |            |
    g ′          g
    v            v
    A ---f-----> C

Proof.
Let h, k : X → P be such that g ′ ◦ h = g ′ ◦ k.
Then, g ◦ f ′ ◦ h = f ◦ g ′ ◦ h = f ◦ g ′ ◦ k = g ◦ f ′ ◦ k. But g is mono, so
f ′ ◦ h = f ′ ◦ k.
Since the square is a pullback and g ′ ◦ h = g ′ ◦ k : X → A,
f ′ ◦ h = f ′ ◦ k : X → B, there is a unique l such that
g ′ ◦ h = g ′ ◦ k = g ′ ◦ l , thus h = k.
299 of 472
Pullbacks and pushouts IX

[Exercise] By duality, state and prove the corresponding results for


pushouts.

[Exercise] Characterise pullbacks and pushouts in a poset category.

Equalisers and pullbacks, as well as their dual counterparts, are typical
constructions in Category Theory. Unlike products and initial or
terminal objects, they may sound unfamiliar to set-theoretic reasoning.
It is worth taking some effort to understand them and their
manipulation.

300 of 472
References and Hints

This lesson covers Chapters 1.7, 1.8 and 1.9 of [Pierce2].

Some examples and hints have been taken from [Goldblatt].

301 of 472
Fundamentals of Functional Programming
Lecture 16

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline

In this lesson, we want to introduce some more constructions. These


structures are closely related to the functional interpretation of
categories, and they have strong links with the λ-calculus.

Moreover, these constructions are central in Category Theory since


they provide, when present, a sort of “completeness” of the category.

With these tools, we will be ready to build the advanced instruments


of Category Theory, which make it possible to exploit the full power of this
approach to (functional) reasoning.

303 of 472
Kernels I
Definition 16.1 (Kernel relation)
In Set, let f : A → B be a function. The kernel relation Rf associated
to f is defined as: x Rf y iff f (x) = f (y ).

In categorical terms, the kernel relation is defined by the pullback


    Rf ---π2---> A
     |           |
     π1          f
     v           v
     A ----f---> B

where π1 and π2 are the projections of

    Rf = { (x , y ) ∈ A × A: f (x) = f (y ) }.
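The kernel relation is easy to compute for finite data, and one can verify that it is always an equivalence relation; an illustrative sketch:

```python
def kernel_relation(f, A):
    """R_f = { (x, y) in A × A : f(x) = f(y) }, the pullback of f along itself."""
    return {(x, y) for x in A for y in A if f(x) == f(y)}

A = [0, 1, 2, 3]
f = lambda x: x % 2
Rf = kernel_relation(f, A)
assert (0, 2) in Rf and (1, 3) in Rf and (0, 1) not in Rf
# R_f is an equivalence relation:
assert all((x, x) in Rf for x in A)          # reflexive
assert all((y, x) in Rf for (x, y) in Rf)    # symmetric
```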
304 of 472
Kernels II

Definition 16.2 (Kernel)


Let C be a category with a zero (initial and terminal) object. Then,
the kernel K of f : A → B is defined by the pullback

    K --------> A
    |           |
    !           f
    v           v
    0 ----!---> B

In Mon, Grp and Vect, as in many other categories derived from
Algebra, K = { x ∈ A: f (x) = e }, where e is the unit of the algebraic
structure.

305 of 472
Exponentiation I
Definition 16.3 (Exponentiation)
A category C with all products has exponentiation if, for any
A, B ∈ Obj C, there is B A ∈ Obj C, the exponential object, and
ev : B A × A → B, the evaluation arrow, such that, for any C ∈ Obj C
and g : C × A → B, there is a unique h : C → B A making the following
diagram commute:

    B A × A ----ev----> B
       ^               ^
       |              /
    〈h, idA 〉        g
       |            /
       C × A ------'

306 of 472
Exponentiation II

Since g = ev ◦〈h, idA 〉, and from g we can construct h and vice versa,
it follows that Hom(C × A, B) ≅ Hom(C , B A ). Hence, having
exponentiation can be interpreted as saying that the category allows
currying of its arrows.
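This bijection Hom(C × A, B) ≅ Hom(C, B^A) is exactly the currying familiar from functional programming; a sketch in Python:

```python
def curry(g):
    """Hom(C × A, B) -> Hom(C, B^A): the unique h with ev ∘ <h, id_A> = g."""
    return lambda c: lambda a: g(c, a)

def uncurry(h):
    """The inverse direction, recovering g from h via the evaluation arrow."""
    return lambda c, a: h(c)(a)

ev = lambda pair: pair[0](pair[1])   # ev : B^A × A -> B

g = lambda c, a: c + a
h = curry(g)
assert ev((h(3), 4)) == g(3, 4)             # the defining triangle commutes
assert uncurry(curry(g))(3, 4) == g(3, 4)   # round trip: the Hom-sets agree
```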

307 of 472
Exponentiation III

Definition 16.4 (Cartesian closed category)


A category C is complete if every diagram in C has a limit; dually, it is
co-complete if every diagram in C has a colimit.
A finite diagram is one having finitely many vertexes and edges.
A category C is finitely complete (finitely co-complete) if it has a limit
(co-limit, respectively) for any finite diagram. Sometimes, finitely
complete is referred to as Cartesian.
A finitely complete category with exponentiation is said to be
Cartesian closed.

308 of 472
Exponentiation IV

Example 16.5
The category Set is Cartesian closed since it is finitely complete and
B A = Hom(A, B). It is also complete and co-complete.

Example 16.6
The category Grp is finitely complete, since it has all products and a
terminal object, but it does not admit exponentiation, thus it is not
Cartesian closed.

309 of 472
Exponentiation V
Theorem 16.7
A category C having a terminal object and all pullbacks is finitely
complete.

Proof. (i)
Considering the pullback

    A × B ---π2---> B
      |             |
      π1            !
      v             v
      A -----!----> 1

we see that C has all binary products. By an easy induction, starting


from the terminal object, it follows that C has all finite products. ,→
310 of 472
Exponentiation VI

,→ Proof. (ii)
Let f , g : A → B, then 〈idA , f 〉, 〈idA , g 〉 : A → A × B. Forming their
pullback

    E ------p------> A
    |                |
    q            〈idA , g 〉
    v                v
    A --〈idA , f 〉--> A × B

we see that 〈q , f ◦ q 〉 = 〈idA , f 〉 ◦ q = 〈idA , g 〉 ◦ p = 〈p , g ◦ p 〉, so p = q and


f ◦ q = g ◦ p.
Moreover, being a pullback, for every i , j : X → A such that
〈idA , g 〉 ◦ j = 〈idA , f 〉 ◦ i, there is a unique h : X → E such that i = q ◦ h
and j = p ◦ h. Thus, i = j and E is an equaliser of f , g .

311 of 472
Subobject classifiers I
Definition 16.8 (Subobject classifier)
In a category C with a terminal object, a subobject classifier is an
object Ω with an arrow ⊤ : 1 → Ω satisfying the Ω-axiom: for each
subobject f : A ↣ B, there is a unique χf : B → Ω, the characteristic
arrow, such that

    A >---f---> B
    |           |
    !           χf
    v           v
    1 ----⊤---> Ω

is a pullback square.

It is easy to show that the subobject classifier, when it exists, is unique
up to isomorphisms, and that the arrow ⊤ is monic.
312 of 472
Subobject classifiers II
Lemma 16.9
If A and B are isomorphic and f : A  C , g : B  C are subobjects of
C , then χf = χg , and vice versa.
Proof.
Consider the diagram

    .-------------g------------.
    |                          v
    B ···k···> A >-----f----> C
     \         |              |
      \        !              χf
       !       v              v
        '----> 1 -----⊤-----> Ω

If χf = χg , the outer square commutes, so, being the inner square a
pullback, there is a unique k : B → A such that g = f ◦ k. But k is iso
because the outer square is a pullback, so A ≅ B.
Vice versa, if A ≅ B and k : B → A is iso, it is easy to see that the
outer square is a pullback, thus χf = χg .
313 of 472
Subobject classifiers III

Definition 16.10 (Topos)


An elementary topos, usually abbreviated in topos, is a category E
such that
■ E is finitely complete;
■ E is finitely co-complete;
■ E has exponentiation;
■ E has a subobject classifier.

It can be shown, by means of a very technical proof, that being finitely


co-complete is implied by the other conditions, so a topos is a
Cartesian closed category with a subobject classifier.

314 of 472
Subobject classifiers IV
Example 16.11
Set is a topos since it is Cartesian closed and its subobject classifier is
the set Ω = { 0, 1 } with ⊤ : 1 → Ω defined by ⊤(x) = 1.
In fact, let A ⊆ B, thus i : A → B with i(x) = x:

    A >---i---> B
    |           |
    !           χi
    v           v
    1 ----⊤---> Ω

with χi (x) = 1 if x ∈ A, and χi (x) = 0 otherwise.

Notice that FinSet is a topos too, because of the same argument.
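The Ω-axiom can be verified concretely: pulling ⊤ back along χ_i recovers the subset A, and distinct subsets have distinct characteristic arrows. An illustrative sketch:

```python
def chi(A, B):
    """The characteristic arrow χ_i : B -> Ω = {0, 1} of the subset A ⊆ B."""
    return lambda x: 1 if x in A else 0

B = [0, 1, 2, 3]
A = [1, 3]
c = chi(A, B)
# The pullback of ⊤ : 1 -> Ω along χ_i is { x in B : χ_i(x) = 1 } = A.
assert [x for x in B if c(x) == 1] == A
# Distinct subsets yield distinct characteristic arrows.
assert [c(x) for x in B] != [chi([1], B)(x) for x in B]
```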


315 of 472
Power objects I

Definition 16.12 (Power object)


A category C with products has power objects if, to each A ∈ Obj C,
there are ℘(A), ∈A ∈ Obj C and a monic ∈ : ∈A ↣ ℘(A) × A such that,
for any B ∈ Obj C and monic r : R ↣ B × A, there is a unique
fr : B → ℘(A) for which the following diagram is a pullback

    R >------r------> B × A
    |                   |
    |               〈fr , idA 〉
    v                   v
    ∈A >-----∈-----> ℘(A) × A
316 of 472
Power objects II
Example 16.13
In Set, ℘(A) = { U : U ⊆ A }, ∈A is the relation x ∈ U with U ⊆ A, that
is, the set ∈A = { (U , x) ∈ ℘(A) × A: x ∈ U }, and ∈ is the canonical
inclusion.
In fact, given a relation R ⊆ B × A, we can define fR (x) = { y ∈ A: x R y },
since R can be thought of as the union of the sets { x } × fR (x) for x ∈ B,
or, equivalently, as h : R → ∈A with h(x , y ) = (fR (x), y ). Then

    R >-------------> B × A
    |                   |
    h               〈fR , idA 〉
    v                   v
    ∈A >-----∈-----> ℘(A) × A

commutes and, indeed, it is a pullback square.
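The correspondence between relations R ⊆ B × A and arrows f_R : B → ℘(A) can be checked computationally; a sketch with illustrative data:

```python
from itertools import chain, combinations

def powerset(A):
    """℘(A) as a list of frozensets."""
    return [frozenset(s) for s in
            chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))]

def relation_to_arrow(R, B):
    """The arrow f_R : B -> ℘(A) with x R y iff y in f_R(x)."""
    return {x: frozenset(y for (u, y) in R if u == x) for x in B}

A, B = ['a', 'b'], [0, 1]
R = {(0, 'a'), (0, 'b'), (1, 'b')}          # a relation R ⊆ B × A
fR = relation_to_arrow(R, B)
assert fR[0] == frozenset({'a', 'b'}) and fR[1] == frozenset({'b'})

# The membership relation ∈_A = { (U, y) : y in U } recovers R through f_R.
memb = {(U, y) for U in powerset(A) for y in A if y in U}
assert {(x, y) for x in B for y in A if (fR[x], y) in memb} == R
```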


317 of 472
Power objects III

Theorem 16.14
A category E is a topos iff E is finitely complete and has power objects.
[Proof not required]

Power objects are a useful concept, but somewhat difficult to work
with; hence we, like most topos-theorists, prefer the definition of a
topos as a Cartesian closed category with a subobject classifier.

318 of 472
Slice categories I

In order to conclude our set of basic constructions, we need a last tool:


Definition 16.15 (Slice category)
Let C be a category and A ∈ Obj C. The slice category C/A on A has
as objects the arrows in C with codomain A and as arrows the arrows
f of C making the following triangle commute:

    B ----f----> C
     \          /
      g        h
       \      /
        v    v
          A

319 of 472
Slice categories II

Example 16.16
Let C be a small discrete category with I = Obj C. Then Set/I is a
slice category corresponding to the bundles on I .
Let p : A → I be a function in Set, then the fibre of p on i ∈ I is the set
Ai = p −1 ({ i }), and the bundle of p on I is the set { Ai : i ∈ I }, i.e., the
set A partitioned by the inverse images of p. Thus, the slice category
Set/I is the category whose objects are partitioned sets and whose
arrows are partition-preserving functions.
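Fibres and fibre-preservation can be computed directly; a small illustrative sketch of two objects of Set/I and an arrow between them:

```python
def fibres(p, A, I):
    """The bundle of p : A -> I: the family of fibres A_i = p⁻¹({ i })."""
    return {i: [a for a in A if p(a) == i] for i in I}

I = [0, 1]
A = [0, 1, 2, 3]
p = lambda a: a % 2                         # an object p : A -> I of Set/I
B = ['x', 'y']
q = lambda b: 0 if b == 'x' else 1          # another object q : B -> I
f = lambda a: 'x' if a % 2 == 0 else 'y'    # a function A -> B

# f is an arrow of Set/I precisely when q ∘ f = p, i.e. it preserves fibres.
assert all(q(f(a)) == p(a) for a in A)
assert fibres(p, A, I) == {0: [0, 2], 1: [1, 3]}
```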

It can be shown that Set/I is, essentially, a topos. To state this last
assertion precisely, and to prove it, we need a different point of view,
which we will develop starting from the next lecture.

320 of 472
References and Hints

This lesson is taken from Chapter 1.10 of [Pierce2].

The definition of kernel, the proof of Theorem 16.7, the notion of


subobject classifier, power object and slice category come from
Chapters 3 and 4 of [Goldblatt]. Also, the omitted proof can be found
in that textbook.

If you want to try the exercises in [Pierce2], it is better to do [Link]
after [Link] and [Link]: this should give you a hint on how to
construct the required counterexample; you may also want to look at
the concept of non-distributive lattice.

321 of 472
Fundamentals of Functional Programming
Lecture 17 — Intermezzo

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline

The purpose of this lecture is to develop Category Theory as a


programming instrument.

Specifically, we will define data structures and procedures in a


functional language to code categories and their basic constructions.

In this way, we will be able to use a mathematical theory, expressed in


the categorical framework, as a set of computational constructions
which allow to calculate interesting and possibly difficult tasks.

323 of 472
Categories I
Our first task is to define categories as elements of a datatype. We
adopt an ML-like syntax.
datatype (o,a)Cat = cat (a → o) × (a → o) ×
                        (o → a) × (a → a → a);

The meaning is as follows:


■ o and a are type variables, representing parametric types, which
stand for objects and arrows, respectively;
■ cat is a type constructor with four arguments;
■ the first two arguments are the dom and cod maps, respectively;
■ the third argument, given an object A, calculates idA ;
■ the fourth argument, is just the composition operation.
324 of 472
Categories II
As usual, we can define destructors for this datatype:
fun dom (cat X ) = π1 X ;
fun cod (cat X ) = π2 X ;
fun id (cat X ) A = π3 X A;
fun compose (cat X ) f g = π4 X g f ;

We have used the keyword fun to indicate the solution in the first
variable of the equation in the λ-calculus behind the functional syntax.
So, fixing X , and considering dom a variable, the first definition is
equivalent to asking for the solution of the equation
dom (cat X ) = π1 X

325 of 472
Categories III

There is no control over the correctness of an instance of the Cat


type. We assume that identities behave as units for composition and
that composition is associative.

This assumption is reasonable: the proofs of these facts have no


computational meaning by themselves, but they ensure that identities
and composition behave as they should.

We stipulate that we never promote a quadruple of appropriate values


to the type Cat unless we have checked in advance that it forms,
indeed, a category.

326 of 472
FinSet
As an example, we can code FinSet, the category of finite sets as:
datatype o SetArrow = setarrow (o Set) × (o → o) × (o Set);
let
fun setdom (setarrow(x,f ,y )) = x;
fun setcod (setarrow(x,f ,y )) = y ;
fun setid (A : o Set) = setarrow(A, λx . x, A);
fun setcomp (setarrow(c,g ,d )) (setarrow(a,f ,b)) =
if b = c then setarrow(a,λx . g (f (x)),d )
else raise non-composable-pair;
in FinSet = cat(setdom, setcod, setid, setcomp);

We used the ML keyword raise to generate an exception which
simulates a partial operation. This is slightly simpler than constructing
an internally coherent mechanism to cope with non-composable pairs.
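The same idea can be sketched in Python (hypothetical names; the slides' ML code is the reference): an arrow of FinSet is a triple (domain, mapping, codomain), and composition fails on a non-composable pair, playing the role of the slide's `raise`:

```python
# A FinSet arrow as a triple (domain, mapping, codomain); composition fails
# on a non-composable pair, like the slide's `raise non-composable-pair`.
def set_id(A):
    return (A, {x: x for x in A}, A)

def set_comp(g, f):
    (c, gm, d), (a, fm, b) = g, f
    if b != c:
        raise ValueError("non-composable pair")
    return (a, {x: gm[fm[x]] for x in a}, d)

f = (frozenset({1, 2}), {1: "a", 2: "b"}, frozenset({"a", "b"}))
g = (frozenset({"a", "b"}), {"a": 0, "b": 0}, frozenset({0}))
assert set_comp(g, f) == (frozenset({1, 2}), {1: 0, 2: 0}, frozenset({0}))
assert set_comp(f, set_id(frozenset({1, 2}))) == f   # identity is a unit
```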
Categories IV
We can construct categories from categories as functions over the
datatype Cat.

For example, the opposite category is defined as:


fun opCat (cat(s,t,i,c)) = cat(t,s,i,λg f . c f g);

Since limits are colimits in the opposite category, we can construct the
former from the latter, or vice versa, whichever is simpler to code.

In order to code the basic constructions, we expand the convention on
categories: each construction is coded as a type and, although we will
not necessarily check it, we assume that every instance satisfies the
intended properties of the construction.
Initial object I

An initial object is composed of two elements: an object I and a
(co-)universal function which, given an object A in the category,
returns the unique arrow I → A.

This description is directly coded as a datatype:


datatype (o,a)InitialObj = initialobj o × (o → a);

where the first component is the initial object and the second
component is the family of arrows departing from the initial object,
indexed by the objects of the category.

Initial object II

Since initial objects are always isomorphic, we can code this fact as a
function returning an isomorphism, i.e., a pair of arrows, one the
inverse of the other:
fun isoinitial(initialobj(A,univA ), initialobj(A′ ,univA′ )) =
(univA (A′ ), univA′ (A));

FinSet

In FinSet, the initial object is the empty set ∅, and the arrow f : ∅ → A,
for every A, is the nowhere-defined function.

So, the initial object is coded as:
nilfn = λx . raise nil-function;
setinit = initialobj(∅, λA. setarrow(∅,nilfn,A));
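A Python sketch of the same coding (hypothetical names): the nowhere-defined function raises on every input, and the universal function sends each object A to the unique arrow ∅ → A.

```python
# Initial object of FinSet: the empty set together with the nowhere-defined
# function, mirroring nilfn/setinit on the slide.
def nilfn(x):
    raise ValueError("nil-function")   # the function is defined nowhere

empty = frozenset()

def univ(A):
    # the unique arrow {} -> A, as a (dom, mapping, cod) triple
    return (empty, nilfn, A)

arrow = univ(frozenset({1, 2}))
assert arrow[0] == empty and arrow[2] == frozenset({1, 2})
try:
    arrow[1](1)                        # applying it anywhere must fail
    raised = False
except ValueError:
    raised = True
assert raised
```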

Binary coproducts I
Similarly to initial objects, we can code binary coproducts:
datatype (o,a)CoproductCocone =
coproductcocone (o × a × a) × (o × a × a → a);
datatype (o,a)Coproduct =
coproduct (o × o → (o , a)CoproductCocone);

The meaning of the second declaration is that the coproduct is an
operation which takes two objects and calculates the corresponding
cocone. The cocone is (a + b, f , g ), along with the universal arrow u
depending on (c′ , f ′ , g ′ ), as in the following diagram:

          f           g
    a --------> a+b <-------- b
      \          |          /
       f ′       u       g ′
        \        |        /
         v       v       v
                c′
FinSet
In FinSet, we define binary coproducts as follows:
datatype (a)Tag = l a | r a;
fun setcoprod(A,B) =
let
s = (mapset l A) ∪ (mapset r B);
fun u(c,setarrow(_,f ,_),setarrow(_,g ,_)) =
let
fg(l x) = f x;
fg(r x) = g x;
in setarrow(s,fg,c);
in coproductcocone((s, setarrow(A,l,s), setarrow(B,r,s)), u);
setcoproduct = coproduct setcoprod;

Diagrams and colimits I

Finite graphs and diagrams may be directly represented as types:


datatype Graph = graph (N Set) × (E Set) × (E → N) × (E → N);
datatype (o,a)Diagram = diagram Graph × (N → o) × (E → a);

A diagram is a graph together with a map from nodes to objects and a
map from edges to arrows of a given category.

A graph is a set of nodes and a set of edges, together with maps giving
the domain and codomain of each edge. Since the definition does not
restrict the number of edges between a pair of nodes, we are really
working with multi-graphs. Also, graphs and diagrams are finite, since
the primitive type Set stands for finite sets.

Diagrams and colimits II

A cocone is represented as
datatype (o,a)Cocone = cocone o × (o,a)Diagram × (N → a);

Thus a cocone is an object A together with a diagram, its base, and a
family of arrows indexed by the nodes N of the diagram, in such a way
that each arrow in the family has A as codomain.

In keeping with the categorical dictum of defining arrows as well as
structures, we define arrows between cocones with the same base:
datatype (o,a)CoconeArrow =
coconearrow (o,a)Cocone × a × (o,a)Cocone;

Diagrams and colimits III

Hence, colimits are represented as types:


datatype (o,a)ColimitingCocone =
colimcocone (o,a)Cocone × ((o,a)Cocone → (o,a)CoconeArrow);
datatype (o,a)Colimit =
colimit ((o,a)Diagram → (o,a)ColimitingCocone);

whose meaning is fairly evident.

A finitely cocomplete category has a colimit for every finite diagram:


datatype (o,a)CocompleteCat =
cocompletecat (o,a)Cat × (o,a)Colimit;

Calculating colimits I

To calculate the colimit of a diagram in a given category C, we
assume the category to have initial objects, binary coproducts, and
coequalisers:
datatype (o,a)IOCPCECat =
iocpcecat (o,a)Cat × (o,a)InitialObj ×
× (o,a)Coproduct × (o,a)Coequalizer;

[Exercise] Define the type (o,a)Coequalizer.

Calculating colimits II

The function which calculates the colimit of a finite diagram in a given
category is:
fun finitecolimit (iocpcecat(C ,init,bcoprod,coeq))
(diagram(graph(N,E ,s,t),fo,fa)) =
let cC = iocpcecat(C ,init,bcoprod,coeq);
d = diagram(graph(N,E ,s,t),fo,fa);
in if E = ∅ then finitecoproduct (C ,init,bcoprod) d
else let { e } ∪ E1 = E ;
d1 = diagram(graph(N,E1 ,s,t),fo,fa);
in addedge (C ,coeq) ((finitecolimit cC d1 ),e);

Calculating colimits III
The logic is as follows: if the diagram has no edges, its colimit is the
finite coproduct of its objects; otherwise, we obtain the colimit by a
construction (addedge) which adds an edge e to the colimit of the
diagram deprived of e.

Of course, we have to write the functions finitecoproduct and
addedge. Their types are:
finitecoproduct: (o,a)Cat × (o,a)InitialObj × (o,a)Coproduct →
((o,a)Diagram → (o,a)ColimitingCocone);
addedge: (o,a)Cat × (o,a)Coequalizer →
((o,a)ColimitingCocone × Edge →
(o,a)ColimitingCocone);

Calculating colimits IV

When the diagram has no nodes, finitecoproduct reduces to:


let initialobj(i,u) = init;
icocone = cocone(i,nildiagram,nilfn);
in colimcocone(icocone,λc . coconearrow(icocone,u (coapex c), c));

where
nildiagram = diagram(graph(∅, ∅, nilfn, nilfn), nilfn, nilfn);
coapex (cocone(a,-,-)) = a;

Calculating colimits V
When the diagram D = diagram(graph(N,E ,s,t),fo,fa) is non-empty,
finitecoproduct operates as:
let { n } ∪ N1 = N;
colimcocone(c,uc) =
finitecoproduct (C ,init,bcoprod)
(diagram(graph(N1 ,E ,s,t),fo,fa));
coproductcocone((b,f ,g ), ucp) = bcoprod(coapex c,fo n);
resultcocone =
cocone(b,D,λm. if m = n then g
               else compose C (f , sides c m));
universal = λc1 . let u1 = coapexarrow(uc c1 );
                      v = ucp(coapex c1 , u1 , sides c1 n);
                  in coconearrow(resultcocone,v ,c1 );
in colimcocone(resultcocone,universal);
Calculating colimits VI

In the previous code:


■ sides returns the function from nodes in the base to the apex in a
given cocone;
■ coapexarrow extracts the arrow between the apices of a given
cocone arrow.

Apart from the complexity of the details in constructing all the pieces
of the resulting cocone, the logic is clear: finite coproducts are
recursively constructed from binary coproducts until there are no
more nodes, at which point the colimit is the initial object.

Calculating colimits VII
The function addedge is defined as
fun addedge (C ,coeq) ((c,u),e) =
let diagram(graph(N,E ,s,t),fo,fa) = base c;
((b,h),ceu) =
coeq(sides c (s e), compose C (sides c (t e), fa e));
resultdiagram = diagram(graph(N,{ e } ∪ E ,s,t),fo,fa);
resultcocone =
cocone(b,resultdiagram, λn. compose C (h,sides c n));
universal = λc1 . let w = coapexarrow (u c1 );
v = ceu (coapex c1 ,w );
in coconearrow(resultcocone,v ,c1 );
in (resultcocone,universal);

Calculating colimits VIII

In the previous code:


■ base returns the diagram which forms the base of a given cocone.

Again, apart from the intricacies of the construction of all the involved
arrows, the logic is clear.

As a matter of fact, the code calculating finite colimits is in
one-to-one correspondence with the proof of the lemma stating that a
category with an initial object, binary coproducts, and coequalisers
has all the finite colimits.

References and Hints

This lesson is derived from selected material in Chapters 3 and 4 of
[Rydeheard].

Fundamentals of Functional Programming
Lecture 18
Outline

Categories are the subject of Category Theory, which is a
mathematical theory. Categories are formal models of mathematical
theories. Thus, it makes sense to imagine a “category of categories”
having categories as objects.

The aim of this lesson is to develop a notion of arrows between
categories. Also, in order to study the behaviour of these arrows, we
need another construction, which embodies the idea of “canonical”
transformations of arrows.

These notions are called functors and natural transformations, and
they form the real core of Category Theory.

Functors I

Definition 18.1 (Functor)


Let C and D be categories. A functor F : C → D is formed by
■ a map F : Obj C → Obj D;
■ a family of maps F : HomC (A, B) → HomD (F (A), F (B)), one for
each pair of objects A, B ∈ Obj C;
such that
■ F (idA ) = idF (A) for all A ∈ Obj C;
■ F (g ◦ f ) = F (g ) ◦ F (f ) for all f ∈ HomC (A, B), g ∈ HomC (B , C ).
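The two functor laws can be checked pointwise on samples. Here is a small Python sketch (hypothetical, not the course's ML) for the familiar "list" functor F(A) = lists over A, F(f) = map f:

```python
# Pointwise check of the two functor laws for F(A) = lists over A,
# F(f) = map f, on a sample input.
def fmap(f):
    return lambda xs: [f(x) for x in xs]

ident = lambda x: x
f = lambda x: x + 1
g = lambda x: x * 2
compose = lambda g, f: (lambda x: g(f(x)))

sample = [1, 2, 3]
assert fmap(ident)(sample) == sample                            # F(id) = id
assert fmap(compose(g, f))(sample) == fmap(g)(fmap(f)(sample))  # F(g∘f) = F(g)∘F(f)
```

Of course, a finite sample does not prove the laws; it only illustrates what the two equations in Definition 18.1 demand.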

Definition 18.2 (Presheaf)


Any functor F : Cop → Set is called a presheaf.

Functors II

Example 18.3 (Forgetful functors)


Let U : Mon → Set be defined as
■ for 〈M , ∗, e 〉 ∈ ObjMon, U(〈M , ∗, e 〉) = M;
■ for f : 〈M , ∗, e 〉 → 〈M ′ , ∗′ , e ′ 〉, U(f ) = f : M → M ′ ;
where the homomorphism f is thought of as a function.
These functors, which forget about some structure of the source
category, are called forgetful functors.

Functors III

Example 18.4 (Identity functor)


For any category C, the identity functor, IdC : C → C is defined as
IdC (A) = A for A ∈ Obj C, and IdC (f ) = f for f ∈ HomC (A, B).

Functors IV

Example 18.5 (Product functors)


Let C be a category with products. Then, each A ∈ Obj C determines a
pair of functors (− × A): C → C, the right product functor, and
(A × −): C → C, the left product functor, such that
■ for B ∈ Obj C, (− × A)(B) = B × A and (A × −)(B) = A × B;
■ for f ∈ HomC (B , C ), (− × A)(f ) = 〈f , idA 〉 and (A × −)(f ) = 〈idA , f 〉.

Functors V

Definition 18.6 (Contravariant functor)


A contravariant functor F : C → D is the same thing as a functor F : Cop → D.

A contravariant functor is a useful concept when dealing with
presheaves, since it operates like a normal (covariant) functor on
objects, but it reverses the direction of arrows.

Functors VI

Example 18.7 (Hom functors)


Let C be a locally small category. Then, each A ∈ Obj C determines a
pair of functors Hom(A, −): C → Set and Hom(−, A): Cop → Set
■ for B ∈ Obj C,

Hom(A, −)(B) = HomC (A, B) and Hom(−, A)(B) = HomC (B , A) ;

■ for f ∈ HomC (B , C ),
    Hom(A, −)(f ) : HomC (A, B) → HomC (A, C ) ,  g ↦ f ◦ g ,  and
    Hom(−, A)(f ) : HomC (C , A) → HomC (B , A) ,  g ↦ g ◦ f .
Notice how Hom(−, A) is a contravariant functor.

Functors VII

Example 18.8 (Subobject functor)


Let C be a category with pullbacks. Then C determines a
contravariant functor Sub: Cop → Set defined as
■ for A ∈ Obj C, Sub(A) = { B ∈ Obj C : B is a subobject of A };
■ for f ∈ HomC (A, B), Sub(f ): Sub(B) → Sub(A), which assigns to
g : C ↣ B the arrow h : D ↣ A defined by the following pullback:

    D >---h---> A
    |           |
    |           | f
    v           v
    C >---g---> B

Functors VIII

Definition 18.9 (Functor composition)


Let F : C → D and G : D → E be functors. Then, their composition
G ◦ F is a functor defined as
■ for A ∈ Obj C, (G ◦ F )(A) = G (F (A));
■ for f ∈ HomC (A, B), (G ◦ F )(f ) = G (F (f )): G (F (A)) → G (F (B)).
It is immediate to check that functor composition is associative and
that the identity functor acts like a unit.

Functors IX

Definition 18.10 (Cat)


The category Cat has all small categories as objects, and all the
functors between them as arrows.

We cannot form a category of locally small, or large, categories, since
it would be too big, leading to an analogue of Russell’s paradox.

Functors X

Definition 18.11 (Full and faithful functors)


Given a functor F : C → D, we say
■ F is faithful if F : HomC (A, B) → HomD (F (A), F (B)) is injective for
all A, B ∈ Obj C;
■ F is full if F : HomC (A, B) → HomD (F (A), F (B)) is surjective for all
A, B ∈ Obj C;
■ F is surjective if F : Obj C → Obj D is surjective;
■ F is essentially surjective if, for every D ∈ Obj D, there is a
C ∈ Obj C such that D ≅ F (C ).

Functors XI

Example 18.12 (Inclusion functor)


A functor F : C → D is an inclusion functor, notation F : C ,→ D, if, for
all A ∈ Obj C, F (A) = A, and, for all f ∈ HomC (A, B), F (f ) = f .
If F is an inclusion functor, then C is a subcategory of D. Evidently, F
is faithful, so that the image of C through F is a copy of C inside D.
Also, if F is full, then C is a full subcategory of D: the image of C
through F is a copy of C inside D such that every arrow of D not in
the image goes to, or comes from, an object not in the image.

Natural transformations I
Definition 18.13 (Natural transformation)
Let C and D be categories and let F , G : C → D be functors. A natural
transformation α from F to G , notation α : F −→ G , is a family of
arrows { αA : F (A) → G (A) }A∈Obj C , indexed by the objects of C, such
that, for any f ∈ HomC (A, B), the following diagram commutes in D:

                  αA
    F (A) --------------> G (A)
      |                     |
 F(f) |                     | G(f)
      v                     v
    F (B) --------------> G (B)
                  αB

The collection of all natural transformations from F to G is denoted
by Nat(F , G ).
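The naturality square can also be checked pointwise on samples. A Python sketch (hypothetical): take F = G = the list functor and, as the component of α at every object, the reverse of a list; then G(f) ∘ αA = αB ∘ F(f):

```python
# Pointwise check of a naturality square: F = G = list functor,
# alpha = reverse, so map(f) . reverse = reverse . map(f).
def fmap(f):
    return lambda xs: [f(x) for x in xs]

alpha = lambda xs: list(reversed(xs))   # one component per object
f = lambda x: x * x

xs = [1, 2, 3]
assert fmap(f)(alpha(xs)) == alpha(fmap(f)(xs))   # both paths agree
```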
Natural transformations II

Definition 18.14 (Natural isomorphism)


A natural transformation α : F −→ G is a natural isomorphism if each
component αA : F (A) → G (A) is an isomorphism. In this case, we
write F ≅ G , saying that F and G are naturally isomorphic.

Natural transformations III

Definition 18.15 (Equivalence)


Two categories C and D are said to be equivalent, notation C ≃ D, if
there are functors F : C → D and G : D → C such that F ◦ G ≅ IdD and
G ◦ F ≅ IdC .

Theorem 18.16
A functor is part of an equivalence of categories iff it is full, faithful
and essentially surjective.
[Proof not required — It uses the Axiom of Choice]

Natural transformations IV

Example 18.17 (Identity transformation)


Let F : C → D be a functor; then ι = { idF (A) }A∈Obj C is an evident
natural transformation F −→ F , called the identity transformation.

Natural transformations V

Definition 18.18 (Vertical composition)


Let C and D be categories, and let F , G , H : C → D be functors. If
σ : F −→ G and τ : G −→ H are natural transformations, then
τ ◦ σ : F −→ H is the natural transformation defined as
(τ ◦ σ)A = τA ◦ σA .

It is simple to check that vertical composition is associative and
that the identity transformation is a unit for it.

Natural transformations VI
Definition 18.19 (Horizontal composition)
Let C, D and E be categories, and let S , T : C → D and S ′ , T ′ : D → E
be functors. If σ : S −→ T and τ : S ′ −→ T ′ are natural
transformations, then the following diagram commutes:

                        τS(A)
    (S ′ ◦ S)(A) ----------------> (T ′ ◦ S)(A)
         |           \                  |
 S ′(σA) |            \ (σ • τ)A        | T ′(σA)
         v             v                v
    (S ′ ◦ T )(A) ---------------> (T ′ ◦ T )(A)
                        τT (A)

The horizontal composition σ • τ : S ′ ◦ S −→ T ′ ◦ T is a natural
transformation defined as the diagonal of the above square:
(σ • τ)A = T ′ (σA ) ◦ τS(A) = τT (A) ◦ S ′ (σA ).
Natural transformations VII

Again, it is easy to show that horizontal composition is associative
and that the identity transformation is a unit for it.

We will not use horizontal composition in this course, so when we
speak about composition of natural transformations, we really mean
vertical composition.

Natural transformations VIII
Example 18.20 (Evaluation)
Let C be a category with exponentiation, and let A ∈ Obj C. Then
FA : C → C, defined as FA (B) = B^A × A for each B ∈ Obj C and
FA (f ) = 〈(f ◦ −), idA 〉 for each f ∈ HomC (B , C ), is a functor.
Thus ev : FA −→ IdC , the evaluation transformation, is a natural
transformation, since the following diagram commutes for every
g : C → B:

                          evC
    FA (C ) = C^A × A ----------> C = IdC (C )
           |                           |
    FA (g) |                           | g
           v                           v
    FA (B) = B^A × A ----------> B = IdC (B)
                          evB

where FA (g ) = 〈(g ◦ −), idA 〉 and g = IdC (g ).

Natural transformations IX

Definition 18.21 (Functor category)


Let C and D be categories, then DC , the functor category, is a
category whose objects are the functors C → D, and whose arrows are
the corresponding natural transformations, with vertical composition.

Natural transformations X

Theorem 18.22
The category Cat is Cartesian closed.

Proof.
The category 1 is a terminal object in Cat; Cat has binary products, so
it has all finite products as well; also, it has equalisers, as is easy to
verify. Exponentiation is given by the functor category.

Natural transformations XI

Example 18.23 (Evaluation functor)


The functor Ev : DC × C → D, called the evaluation functor, is defined
as Ev(G , A) = G (A), with A ∈ Obj C and G : C → D, and
Ev(α, f ) = G (f ) ◦ αA = αB ◦ F (f ), with α : F −→ G , F , G : C → D and
f ∈ HomC (A, B).

The evaluation functor is the ev arrow of Cat.

References and Hints

This lesson roughly follows [Pierce2] Chapters 2.1 and 2.3.

Most examples come from [Goldblatt], while definitions follow
[MacLane]. The omitted proof can be found in [MacLane].

Fundamentals of Functional Programming
Lecture 19 — Intermezzo
Outline

Functors and natural transformations are essential tools in the
theoretical analysis of categories. They allow us to look at a given
category from outside, studying how the category relates to other
categories.

From a computational point of view, most canonical constructions are
naturally coded as functors. Representing functors as maps between
categories in some functional language adds a level of abstraction to
programming which is useful to model complex situations in a very
compact way.

Functors I
Functors, consisting as they do of two functions, one on objects, the
other on arrows, can be represented quite simply.
datatype (oA,aA,oB,aB)Functor =
functor (oA,aA)Cat × (oA → oB) × (aA → aB) × (oB,aB)Cat;

As before, we adopt the convention that a pair of maps is promoted to
an instance of the type Functor if and only if we have actually proved
that they form a functor.
Example 19.1 (Identity functor)
The identity functor, mapping objects and arrows to themselves, is
defined as
fun I(C ) = functor(C ,λx . x,λx . x,C );
Functors II

Example 19.2 (Product functor)


The functor P : FinSet → FinSet mapping A 7→ A × A and f 7→ 〈f , f 〉,
where 〈f , f 〉(x , y ) = (f x , f y ), is defined as
let fun cartprod(A,B) =
      if A = ∅ then ∅
      else let { a } ∪ A′ = A;
           in (mapset (λb. (a, b)) B) ∪ cartprod(A′ ,B);
fun prodarrow(setarrow(A,f ,B),setarrow(C ,g ,D)) =
setarrow(cartprod(A,C ),λ(x , y ). (f x , g y ), cartprod(B,D));
in functor(FinSet,λA. cartprod(A,A),λf . prodarrow(f ,f ),FinSet);

Natural transformations I
Natural transformations are easily coded as a datatype:
datatype (oA,aA,oB,aB)NatTransform =
  nattransform (oA,aA,oB,aB)Functor × (oA → aB) × (oA,aA,oB,aB)Functor;

To simplify the basic definitions on natural transformations, we define
a couple of auxiliary functions:
fun ofo (functor(C ,fo,fa,D)) A = fo A;
fun ofa (functor(C ,fo,fa,D)) g = fa g ;

They define application of a functor to an object or an arrow.


Natural transformations II

In this way, we can define the identity natural transformation:


fun id(A,cat(-,-,i,-)) F = nattransform (F ,λx . i(ofo F x),F );

Also, we can define vertical composition:


fun vcomp (A,cat(-,-,-,c))
(nattransform(-,β,H))
(nattransform(F ,α,-)) =
nattransform(F ,λx . c(β x,α x),H);

Natural transformations III

Similarly, we define horizontal composition:


fun hcomp (cat(-,-,-,c))
(nattransform(G ,β,G 0 ))
(nattransform(F ,α,F 0 )) =
nattransform(FunComp F G ,
λx . c(β (ofo F 0 x),ofa G (α x)),
FunComp G 0 F 0 );

where FunComp is functor composition ([Exercise] Define it!).

Natural transformations IV

With these instruments, we can construct the functor category which,
starting from two given categories A and B, has as objects the
functors from A to B, and the corresponding natural transformations
as arrows. In code:
fun functorcategory(A,B) =
cat(λ nattransform(s,-,-). s,
λ nattransform(-,-,t). t,
id(A,B),
vcomp(A,B));

A Different Application I

As we did with colimits, it is possible to develop functions to code
functorial constructions of any sort and their arrows, which will be
natural transformations.

Instead, we want to use Category Theory to show that a software
engineering method for merging experts’ evaluations in a risk analysis
is impossible to achieve. More specifically, we want to use the
categorical framework, and functors in particular, to show that it is
impossible to construct a “most general” common metric to measure
risk in a system when two experts use different metrics in which some
values are identified.

A Different Application II

We will not introduce and discuss all the preliminaries of this
application, but only the core of the result, where Category Theory
plays an essential role.

This application has been chosen because it is simple, without being a
toy example of the techniques it wants to suggest.

The presentation will first introduce some algebraic definitions, which
are used to describe what we intend by a metric; then, we will state
the problem we want to solve; finally, we will develop the necessary
results to show that the proposed problem is, indeed, unsolvable.

Metrics I

A metric is a structure whose values are used to assign a measure to
the risk a system, or one of its parts, is exposed to. Since these values
are combined using maxima and minima, it is natural to organise them
as an algebraic structure based on orders.

Definition 19.3 (Lattice)


A lattice is a partial order 〈O , ≤〉 such that every pair x , y ∈ O has a
unique least upper bound (lub), denoted by x ∨ y , and a unique
greatest lower bound (glb), denoted by x ∧ y .
A lattice is finite if O is a finite set.

Metrics II
Definition 19.4 (Complete lattice, bounded lattice)
Given a lattice 〈O , ≤〉, let U ⊆ O . Then ⋁U and ⋀U are, respectively,
the lub and the glb of the elements in U, when they exist.
A lattice is complete if every subset U ⊆ O has a lub and a glb.
A lattice is bounded if there are two distinct elements ⊤ and ⊥ in O
such that ⊥ = ⋀O and ⊤ = ⋁O .

In a bounded lattice, it is immediate to see that ⋁∅ = ⊥ and ⋀∅ = ⊤,
by duality.

Lemma 19.5
A finite lattice is bounded and complete.
[Proof not required]
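The idea behind the lemma can be illustrated in Python (a sketch, not the omitted proof): in a finite lattice the lub/glb of any subset is obtained by folding the binary operations. Here the divisors of 12 under divisibility form a finite lattice with glb = gcd and lub = lcm:

```python
from math import gcd
from functools import reduce

# The divisors of 12, ordered by divisibility, form a finite lattice with
# glb = gcd and lub = lcm. Folding the binary operations over any subset
# yields its glb/lub, which is why finiteness gives boundedness and
# completeness.
L = [1, 2, 3, 4, 6, 12]
lcm = lambda x, y: x * y // gcd(x, y)

glb = lambda U: reduce(gcd, U)
lub = lambda U: reduce(lcm, U)

assert glb(L) == 1 and lub(L) == 12     # bottom and top of the whole lattice
assert glb([4, 6]) == 2 and lub([4, 6]) == 12
```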
Metrics III

Definition 19.6 (Metric)


A metric is a finite lattice.

Definition 19.7 (Met)


The category Met has all the metrics as objects, and its arrows are the
functions f preserving the order, ⊥ and ⊤, i.e., x ≤ y implies
f (x) ≤ f (y ), f (⊥) = ⊥ and f (⊤) = ⊤.

It is easy to see that Met is a category. It is important to notice that
it is not the category of finite lattices, since its arrows are not
necessarily homomorphisms of lattices: they need not preserve lubs
and glbs.

The problem

The problem we would like to solve is: given two metrics A and B
where some values are identified via e1 : E → A and e2 : E → B, we
want to find the most general metric, up to isomorphisms, containing
both A and B where the elements e1 (x) and e2 (x) are identified for
each x ∈ E .

In categorical terms, this amounts to saying that we want to find the
pushout of the diagram

    A <---e1--- E ---e2---> B

We will prove that such a pushout does not always exist, and we will
provide a way to construct it whenever it is possible.

The solution I

Lemma 19.8
Met has an initial object and binary coproducts.

Proof.
Let 0 = 〈{ ⊥, ⊤ }, ≤〉 with ⊥ ≤ ⊤. Then 0 is a metric and it is obviously
initial.
Also, let A and B be metrics and define C = 〈A ⊔⊥,⊤ B , ≤〉, where
A ⊔⊥,⊤ B is the disjoint union of A and B with tops and bottoms
identified, and the order is naturally defined as the union of the orders
on A and B. Then C is a metric, and it is immediate to show that the
embeddings jA : A → C and jB : B → C are its injections, forcing C to
be the coproduct of A and B.

The solution II
Lemma 19.9
In a category having initial objects, binary coproducts and
coequalisers, every pushout is the coequaliser of a coproduct.
Proof.
Immediate from the colimit construction.

In particular, if the following is a pushout diagram

          e1
    E ---------> A
    |            |
 e2 |            | pA
    v            v
    B ---------> P
          pB

then P is the coequaliser of the parallel pair jA ◦ e1 , jB ◦ e2 : E ⇉ A + B.
The solution III
Lemma 19.10
Let U : Met → Set be the forgetful functor. Then every pushout
B → P ← C of a diagram B ← A → C in Met yields a pushout
U(B) → U(P) ← U(C ) of the diagram U(B) ← U(A) → U(C ) in Set.

Proof.
Elementary calculation.

In simpler words, this lemma says that the forgetful functor
U : Met → Set preserves pushouts. It is extremely useful since
pushouts in Set are easy to calculate: the pushout of U(B) and U(C )
along f : U(A) → U(B) and g : U(A) → U(C ) is the object
P = U(B) ⊔ U(C )/σ, where σ is the minimal equivalence relation
such that f (x) σ g (x) for all x ∈ U(A).
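This quotient is easy to compute; a Python sketch (hypothetical helper, using a naive union-find to realise the minimal equivalence relation):

```python
# Pushout in Set: disjoint union of B and C, quotiented by the least
# equivalence identifying f(x) with g(x) for every x in A.
def pushout(A, B, C, f, g):
    parent = {("B", b): ("B", b) for b in B}
    parent.update({("C", c): ("C", c) for c in C})
    def find(x):                       # representative of x's class
        while parent[x] != x:
            x = parent[x]
        return x
    for x in A:                        # glue f(x) with g(x)
        parent[find(("B", f[x]))] = find(("C", g[x]))
    return {frozenset(y for y in parent if find(y) == r)
            for r in {find(y) for y in parent}}

P = pushout({0}, {"b0", "b1"}, {"c0", "c1"}, {0: "b0"}, {0: "c0"})
assert len(P) == 3                     # b0 ~ c0 glued; b1, c1 stay apart
```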
The solution IV

Consider the pushout diagram in Met

          e1
    E ---------> A
    |            |
 e2 |            | pA
    v            v
    B ---------> P
          pB

It is generated by coequalising A + B in Met, because of Lemma 19.9.

Hence, it suffices to show that Met does not have all coequalisers to
prove that Met does not have all pushouts, that is, that our initial
problem is unsolvable.

The solution V
Consider the diagram

         jA ◦ e1
    E ===========⇉ A + B
         jB ◦ e2

Suppose that P is the coequaliser of the diagram. By Lemma 19.10,
P must be a finite lattice whose universe is P = A ⊔ B/σ, where
σ = (ρ ∪ ρ⁻¹)* and ρ = { (e1 (x), e2 (x)) : x ∈ E }.

Also, the order of P is fully determined by knowing that P is a
coequaliser: in fact, if q : A + B → P preserves the order of A + B, it
must be the case that ≤P = ≤A+B /σ, as elementary Algebra tells us.

Notice how the calculation of P is effective: if it exists, then we are
able to uniquely determine it.

The solution VI
To construct a counterexample, take as E the lattice

         ⊤
        / \
       α   β
        \ /
         ⊥

Also, take as A and B two copies of

         ⊤
         |
         c
        / \
       a   b
        \ /
         d

The solution VII
Thus, A + B is (tops and bottoms identified)

         _______⊤_______
        /               \
      cA                 cB
     /  \               /  \
   aA    bA           aB    bB
     \     \          /    /
      `-----`---⊥---´----´

Let eY : E → Y , with Y either A or B, be

    eY (⊤) = ⊤ ,  eY (α) = aY ,  eY (β) = bY ,  eY (⊥) = ⊥ .
The solution VIII
So, the pushout is fully determined as the order

          ⊤
         / \
       cA   cB
       | \ / |
       | / \ |
       a     b
        \   /
         ⊥

which is not a lattice: the pair { a, b } has two incomparable minimal
upper bounds, cA and cB , hence no lub. Thus, there is no coequaliser
for the diagram

         jA ◦ eA
    E ===========⇉ A + B
         jB ◦ eB

and, consequently, no pushout for the diagram

    B <---eB--- E ---eA---> A
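The lattice failure can be verified mechanically; a Python sketch of the quotient order (hypothetical encoding of ≤ as a set of pairs):

```python
# The quotient order from the counterexample: a and b lie below both cA
# and cB, so {a, b} has two incomparable minimal upper bounds and no lub.
elems = ["bot", "a", "b", "cA", "cB", "top"]
le = {("bot", x) for x in elems}                 # bot below everything
le |= {(x, "top") for x in elems}                # top above everything
le |= {(x, x) for x in ["a", "b", "cA", "cB"]}   # reflexivity
le |= {("a", "cA"), ("a", "cB"), ("b", "cA"), ("b", "cB")}

ub = [z for z in elems if ("a", z) in le and ("b", z) in le]
minimal_ub = [z for z in ub
              if not any(w != z and (w, z) in le for w in ub)]
assert sorted(minimal_ub) == ["cA", "cB"]        # two minimal upper bounds
```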
References and Hints

The content of the first part of this lesson has been taken from
[Rydeheard] Chapters 3 and 5.

The content of the second part is taken from M. Benini, S. Sicari, A
Mathematical Derivation of a Risk Assessment Procedure, IAENG
Journal of Applied Mathematics 40(2), pp. 52–62 (2010). In that
paper, there is also the omitted proof.

Fundamentals of Functional Programming
Lecture 20
Outline

The notion of adjunction, which we will explore in this lesson, is a
deep and complex concept. It arises everywhere, and its recognition
leads to a deeper understanding of theories, as well as a broad
possibility of building canonical constructions.

In this second aspect, constructions, adjoints are of particular interest
in Computer Science and, specifically, in functional programming,
since they allow for the solution of difficult problems whose “inverse
problems” are easy to describe.

With adjunctions, we will finish our development of Category Theory,
except for a few specific topics strictly related to the categorical
interpretation of λ-calculi.

Adjunctions I

Definition 20.1 (Adjunction)


An adjunction consists of
■ a pair of categories C and D;
■ a pair of functors F : C → D and G : D → C;
■ a natural isomorphism between the functors
HomD (F (−), −), HomC (−, G (−)): Cop × D → Set, i.e., a family of
bijections HomD (F (A), B) ≅ HomC (A, G (B)), natural in A ∈ Obj C
and B ∈ Obj D.
The functor F is said to be left adjoint to G , while G is said to be
right adjoint to F , and we write F ⊣ G .
Adjunctions II

The definition of adjunction is somewhat cryptic. The situation it
models is

               F
    A     |--------> F (A)
    |                   |
  f |                   | g
    v                   v
    G (B) <--------|    B
               G

that is, whenever we make a move f : A → G (B) in C, we can replicate
the move in D, choosing a suitable (and unique) g : F (A) → B, and
vice versa.

Adjunctions III
The situation is formalised by the naturality of

    θA,B : HomD (F (A), B) ≅ HomC (A, G (B)) .

Naturality in A ∈ Obj C means that, fixed B ∈ Obj D, the following
diagram commutes for every f ∈ HomC (A′ , A):

                             θA,B
    HomD (F (A), B) -----------------> HomC (A, G (B))
           |                                  |
 − ◦ F(f ) |                                  | − ◦ f
           v                                  v
    HomD (F (A′ ), B) ---------------> HomC (A′ , G (B))
                             θA′,B

In informal terms, in C, whenever f : A′ → A and we make a move
A → G (B), then we can also make a move A′ → G (B) and, in both
cases, the moves can be replicated in D.
Adjunctions IV
The naturality in B ∈ Obj D is symmetric. Their combination produces
the naturality in both A and B.

It is simpler to decompose θA,B into two natural transformations.

Definition 20.2 (Unit and co-unit)
Given an adjunction 〈F , G , θ〉, its unit is

    ηA = θA,F (A) (idF (A) ) : A → G (F (A)) ,

while its co-unit is

    εB = θ⁻¹G (B),B (idG (B) ) : F (G (B)) → B .

The unit and the co-unit of an adjunction are strictly linked to each
other: their constructions from θA,B are symmetric, so we can focus
on the unit, leaving the properties of the co-unit to be derived by
symmetry.
Adjunctions V
Lemma 20.3
Given an adjunction 〈F , G , θ〉, for each f ∈ HomD (F (A), B), it holds
that θA,B (f ) = G (f ) ◦ η A .

Proof. (i)
For each f ∈ HomD (F (A), B), the following diagram commutes, by
naturality of θ:

                               θA,F (A)
    HomD (F (A), F (A)) -----------------> HomC (A, G (F (A)))
            |                                      |
      f ◦ − |                                      | G(f ) ◦ −
            v                                      v
    HomD (F (A), B) ---------------------> HomC (A, G (B))
                               θA,B

Adjunctions VI

Proof. (ii)
Specialising it to idF (A) ∈ HomD (F (A), F (A)), we obtain

                 θA,F (A)
    idF (A) |--------------> ηA
       |                      |
 f ◦ − |                      | G(f ) ◦ −
       v                      v
       f    |--------------> θA,B (f ) = G (f ) ◦ ηA
                 θA,B

Adjunctions VII

Lemma 20.4
Given an adjunction 〈F , G , θ〉, for each g ∈ HomC (A, G (B)), there is a
unique f ∈ HomD (F (A), B) such that g = G (f ) ◦ η A .

Proof.
Since θA,B is a bijection, to g there corresponds a unique arrow
f ∈ HomD (F (A), B) such that g = θA,B (f ). So g = G (f ) ◦ ηA by the
previous lemma.

For the co-unit εB the same results are phrased as:
■ for each g ∈ HomC (A, G (B)), θ⁻¹A,B (g ) = εB ◦ F (g );
■ for each f ∈ HomD (F (A), B), there is a unique g ∈ HomC (A, G (B))
such that f = εB ◦ F (g ).

Adjunctions VIII

Theorem 20.5
Let F : C → D and G : D → C be functors. If there are natural
transformations η : IdC −→ G ◦ F and ε : F ◦ G −→ IdD such that, for
every f ∈ HomD (F (A), B), there is a unique g ∈ HomC (A, G (B)) such
that f = εB ◦ F (g ) and, for each g ∈ HomC (A, G (B)), there is a unique
f ∈ HomD (F (A), B) such that g = G (f ) ◦ ηA , then 〈F , G , θ〉, with
θA,B (f ) = G (f ) ◦ ηA , is an adjunction.

Proof.
Let τA,B (g ) = εB ◦ F (g ) for every g ∈ HomC (A, G (B)). Then, it follows
that τA,B and θA,B are inverses of each other, so θA,B is an
isomorphism.
Also, it is immediate to show that it is natural in both A and B.

Adjunctions IX

The interest in adjunctions lies in their properties. For our purposes,


the main results are:
Theorem 20.6
Let C and D be categories. Then C ' D via the functor F : C → D iff F
is part of an adjunction 〈G , F , θ〉 with G a F such that its unit η and
co-unit ² are natural isomorphisms: η : IdD ∼= G ◦ F and ² : F ◦ G ∼= IdC .
[Proof not required]

Theorem 20.7
If F a G then G preserves limits, while F preserves colimits.
[Proof not required]

404 of 472
Adjunctions X

Example 20.8 (Initial and terminal objects)


Let C be a category and let !: C → 1 be the only possible functor to the
category 1, having a single object, ∗, and a single morphism, id∗ . If
F a! for some functor F then C has an initial object.
In fact, consider an arrow g ∈ Hom1 (∗, !(A)): necessarily g = id∗ , and,
by adjunction, there is a unique arrow f : F (∗) → A in C, for each
A ∈ Obj C. Thus, F (∗) is an initial object of C. The unit of the
adjunction is η ∗ = id∗ and the co-unit ²A is the unique arrow 0 → A,
with 0 = F (∗).
For similar reasons, if ! a G , then C has a terminal object.

405 of 472
Adjunctions XI
It can be shown that the limit and the colimit of any type of diagram
in a category C arise, when they exist, from right and left adjoints of a
diagonal functor C → CJ , where J is a canonical category having the
“shape” of the diagram.

The unit for the left adjoint is the universal co-cone, while the co-unit
of the right adjoint is the universal cone.

Example 20.9 (Product)


Consider the discrete category J with two objects, and define
∆ : C → CJ as ∆(A) = (A, A) and ∆(f : A → B) = 〈f , f 〉. Note that the
definition of ∆ uses the fact that CJ ∼= C × C.
Then C has binary products iff ∆ has a right adjoint; moreover, C has
binary co-products iff ∆ has a left adjoint.
406 of 472
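The poset case of Example 20.9 is easy to machine-check: viewing a poset as a category, ∆(c) ≤ (a, b) holds exactly when c is below the meet of a and b, so binary meets give a right adjoint of the diagonal. A toy illustration of ours in Python, on (Z, ≤) with min as the meet:

```python
# In a poset seen as a category, Delta(c) = (c, c), and a right adjoint of
# Delta is a binary product, i.e. a meet:  Delta(c) <= (a, b)  iff  c <= a ^ b.
meet = min  # the meet in the poset (int, <=)

xs = range(-5, 6)
# The adjunction Delta -| meet, checked pointwise on a finite range:
assert all(((c <= a and c <= b) == (c <= meet(a, b)))
           for a in xs for b in xs for c in xs)
```

Dually, max plays the role of the join, giving a left adjoint of the diagonal.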
Adjunctions XII
When there is an equivalence or an isomorphism between categories or
collections of arrows, it is often the case that there is an adjunction.

Example 20.10 (Exponentiation)


If the category C has exponentiation, then

HomC (C × A, B) ∼= HomC (C , B A ) .

Thus, the right product functor (− × A) has a right adjoint, the functor
(−A ).
The converse also holds: if the right product functor has a right
adjoint F , then the category has exponentiation and F (B) is the
exponential object, for each B ∈ Obj C.
Note that the co-unit ²B is precisely the evaluation arrow.
407 of 472
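Example 20.10 is the familiar currying bijection. The sketch below is our illustration in Python, with tuples as products; it spot-checks that the two directions of θ are mutually inverse and that the co-unit is evaluation:

```python
def curry(f):
    """Hom(C x A, B) -> Hom(C, B^A): theta in the adjunction (- x A) -| (-)^A."""
    return lambda c: lambda a: f((c, a))

def uncurry(g):
    """Hom(C, B^A) -> Hom(C x A, B): the inverse bijection."""
    return lambda ca: g(ca[0])(ca[1])

def ev(pair):
    """The co-unit: the evaluation arrow B^A x A -> B."""
    g, a = pair
    return g(a)

# A sample arrow f : int x int -> int.
f = lambda ca: ca[0] - ca[1]

# The bijection is inverse on sample inputs (equality checked extensionally).
assert uncurry(curry(f))((7, 3)) == f((7, 3)) == 4
# The co-unit is evaluation: f factors as ev after (curry(f) x id).
assert ev((curry(f)(7), 3)) == f((7, 3))
```

The names `curry`, `uncurry`, `ev` are ours; the check is pointwise on samples, not a proof.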
Adjunctions XIII

The forgetful functors offer a way to generate interesting objects: in


fact, usually they have a left adjoint, which produces free objects.

Example 20.11
Consider the forgetful functor U : Mon → Set. Its left adjoint F exists
and it is the free monoid F (A) generated by the set A of elements.

Note that we should always check that the forgetful functor admits a
left adjoint: for example, the category of fields has an obvious
forgetful functor to Set, but it does not have a left adjoint. In fact,
there is no such thing as a “free field” in Algebra.

408 of 472
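Example 20.11 can be made concrete: the free monoid on a set A is the monoid of words (lists) over A, the unit η embeds a generator as a one-letter word, and Lemma 20.4 becomes the usual universal property. A Python sketch of ours, with (int, +, 0) as a sample target monoid:

```python
def eta(a):
    """Unit of the adjunction F -| U: embeds a generator into the free monoid."""
    return [a]

def extend(g, unit, op):
    """The unique monoid homomorphism F(A) -> M with extend(g) . eta == g."""
    def f(word):
        acc = unit
        for a in word:
            acc = op(acc, g(a))
        return acc
    return f

# Target monoid: (int, +, 0); g maps each generator to its length.
g = len
f = extend(g, 0, lambda x, y: x + y)

assert f(eta("abc")) == g("abc") == 3          # g = U(f) . eta  (Lemma 20.4)
assert f(["ab", "c"]) == f(["ab"]) + f(["c"])  # f preserves the operation
assert f([]) == 0                              # f preserves the unit
```

The helper names `eta` and `extend` are our own labels for the unit and the universal extension.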
Yoneda Lemma I
When we consider functors from a locally small category to Set, there
is a nice result that allows us to characterise their natural
transformations. It is also very useful, as we will see.

Theorem 20.12 (Yoneda lemma)


Let F : C → Set be a functor from a locally small category C, and let
A ∈ Obj C. Then, there is a bijective correspondence

θF ,A : Nat(Hom(A, −), F ) ∼= F (A) .

Proof. (i)
For a given natural transformation α : Hom(A, −) −→ F , we define
θF ,A (α) = αA (idA ). Moreover, given a ∈ F (A), for each B ∈ Obj C,
τ(a)B : HomC (A, B) → F (B) is defined as τ(a)B (f ) = F (f )(a), with
f ∈ HomC (A, B). ,→
409 of 472
Yoneda Lemma II
,→ Proof. (ii)
This class of mappings defines a natural transformation
τ(a): Hom(A, −) −→ F since, for every g ∈ HomC (B , C ) and
f ∈ HomC (A, B), (F (g ) ◦ τ(a)B )(f ) = F (g )(τ(a)B (f )) =
= F (g )(F (f )(a)) = (F (g ) ◦ F (f ))(a) = F (g ◦ f )(a) = τ(a)C (g ◦ f ) =
= τ(a)C (Hom(A, g )(f )) = (τ(a)C ◦ Hom(A, g ))(f ). In a diagram:

                     τ(a)B
    HomC (A, B) -----------> F (B)
          |                    |
Hom(A, g ) = g ◦ −             | F (g )
          ↓                    ↓
    HomC (A, C ) ----------> F (C )
                     τ(a)C

,→
410 of 472
Yoneda Lemma III

,→ Proof. (iii)
But, θF ,A and τ are inverses to each other. In fact, letting a ∈ F (A),
(θF ,A ◦ τ)(a) = θF ,A (τ(a)) = τ(a)A (idA ) = F (idA )(a) = idF (A) (a) = a.
Also, if α : Hom(A, −) −→ F and f ∈ HomC (A, B), ((τ ◦ θF ,A )(α))B (f ) =
= τ(θF ,A (α))B (f ) = τ(αA (idA ))B (f ) = F (f )(αA (idA )) =
= αB (Hom(A, f )(idA )) = αB (f ◦ idA ) = αB (f ).

Corollary 20.13
The bijections θF ,A of the Yoneda lemma are natural in A. Moreover,
if C is small, they are also natural in F .
[Proof not required]

411 of 472
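The constructions θ and τ in the proof of the Yoneda lemma can be spot-checked for the one-object category given by the additive monoid of integers: an arrow is an integer, composition is addition, the identity is 0, and a functor F into Set is a monoid action. The action below is a made-up example of ours, not from the lecture:

```python
# One-object category: the monoid (int, +, 0).  A functor F to Set is an
# action; here F(n) appends n exclamation marks to a string.
def F(n):
    return lambda s: s + "!" * n   # F(m + n) = F(m) . F(n), F(0) = id

def tau(a):
    """Yoneda: the natural transformation Hom(*,-) -> F determined by a in F(*)."""
    return lambda f: F(f)(a)

def theta(alpha):
    """The inverse direction: evaluate a natural transformation at id = 0."""
    return alpha(0)

a = "yo"
alpha = tau(a)
# theta and tau are mutually inverse (spot-checked):
assert theta(tau(a)) == a
# Naturality: F(g)(alpha(f)) == alpha(g . f) for sample arrows f, g.
f_arr, g_arr = 2, 3
assert F(g_arr)(alpha(f_arr)) == alpha(g_arr + f_arr) == "yo!!!!!"
```

Only finitely many arrows are tested, so this illustrates the bijection rather than proving it.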
Yoneda Lemma IV

Definition 20.14 (Yoneda functors)


The functor Y : Cop → SetC , defined as
■ for each A, B ∈ Obj C, (Y(A))(B) = HomC (A, B);
■ for each f ∈ HomC (A, B),
Y(f ) = Hom(f , −): HomC (B , −) −→ HomC (A, −);
is called the (contravariant) Yoneda functor.
Its dual is the (covariant) Yoneda functor: it sends each
f ∈ HomC (A, B) to Y(f ) = Hom(−, f ): HomC (−, A) −→ HomC (−, B).
The Y functor goes from C to the category of presheaves on C:
Y : C → SetCop .

412 of 472
Yoneda Lemma V
Lemma 20.15
The Yoneda functors are full and faithful.

Proof.
Direct consequence of the Yoneda lemma.

The importance of the Yoneda functors lies in the fact that, given any
small category C, we can “complete” it. In fact, the image of C
through Y is an isomorphic copy of C, with no extra arrows, since Y is
full and faithful.
But the category SetCop is a topos, so it has all the finite limits and
co-limits, as well as exponentials and a subobject classifier. So, we can
think of C as a full subcategory of SetCop , and we can work in the
larger category; in this way, we are “adding” to C the finite categorical
constructions it may lack.
413 of 472
References and Hints

This lesson comes from [Pierce2], Chapter 2.4.

The definition of adjunction is slightly different, although equivalent.


It comes from [Goldblatt]. The proofs of most theorems are expanded
versions of the ones in [MacLane].

The Yoneda Lemma is taken from [MacLane], while its proof can be
found in F. Borceux, Handbook of Categorical Algebra I: Basic
Category Theory, Cambridge University Press (1994). ISBN:
978-0521061193

All the omitted proofs can be found either in [MacLane] or in the


Borceux’s book.

414 of 472
Fundamentals of Functional Programming
Lecture 21 — Intermezzo

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline

Adjunctions are extremely useful in the contemporary mathematical


practice. In Theoretical Computer Science, because of its proximity
with Mathematics, many applications have been developed.

In this lesson, we want to introduce a particular instance of


adjunctions: Galois connections. Essentially, a Galois connection is an
adjunction in the category of posets. It turns out that Galois
connections provide a simple yet elegant and powerful way to give a
formal semantics to imperative languages, which has been fruitfully
used to verify and synthesise programs from formal specifications.

416 of 472
Relational semantics I

Procedural (imperative) computer programs can be seen as operating


on a finite or infinite state space X whose elements are vectors, each
component of which has an appropriate datatype; the components are
the values which the program’s variables can take.

Having fixed a program P, it will be helpful to distinguish between a set


G ⊆ X of initial states and a set M ⊆ X of final states.

Then, a non-deterministic program P is modelled by a relation


R ⊆ G × M: if R(x , y ) holds, then the program reaches the final state
y when starting from the initial state x. A deterministic program is
just a non-deterministic program represented by a function.

417 of 472
Relational semantics II
Definition 21.1 (Predicate)
Given a set X , a predicate (over X ) is a statement taking the value
true, denoted by ⊤, or the value false, denoted by ⊥. Equivalently, a
predicate p is a function from X to { ⊤, ⊥ }.
We write P(X ) for the set of predicates on X and order it by
implication: for p , q ∈ P(X ),

p → q iff { x ∈ X : p(x) = ⊤ } ⊆ { x ∈ X : q(x) = ⊤ } .

So, 〈P(X ), →〉 is a partially ordered set.

Define a map φ : P(X ) → ℘(X ) by φ(p) = { x ∈ X : p(x) = ⊤ }. Then φ
is an order isomorphism between 〈P(X ), →〉 and 〈℘(X ), ⊆〉, the
collection of subsets of X ordered by inclusion.
418 of 472
Relational semantics III

We think of a subset Y ⊆ M, the set of final states of a program P, as


specifying a property of final states true precisely on Y . These
properties are called postconditions.

Similarly, a predicate on initial states is called a precondition.

Note how we use the previously defined φ isomorphism to move from


subsets to predicates and vice versa. Since φ is an isomorphism, we
will never make its use explicit in the future.

419 of 472
Weakest precondition I

We may stipulate what a program should do by giving conditions on


the state of the system before and after the program’s execution.

Now, fix G and M, initial and final states, and consider a relation
R ∈ ℘(G × M) modelling a program P, possibly non-deterministic but
always terminating.

For a given postcondition Y , the weakest precondition wpR (Y ) is the


set of input states x such that P is guaranteed to terminate in a state
in Y when it is started from x.
We may regard wpR as a map from P(M) to P(G ).

420 of 472
Weakest precondition II
The map wpR preserves implication: in fact, if Y ⊆ Z then
wpR (Y ) = { x ∈ G : ∀y . R(x , y ) → y ∈ Y } ⊆
⊆ { x ∈ G : ∀y . R(x , y ) → y ∈ Z } = wpR (Z ).

Any map f : P(M) → P(G ), where G , M ⊆ X , preserving implication is


called a predicate transformer.

Given a predicate transformer τ : P(M) → P(G ), we may associate a


relation Rτ to it by

Rτ (x , y ) ≡ ∀Y ∈ ℘(M). x ∈ τ(Y ) → y ∈ Y ,

for any x ∈ G and y ∈ M .

It is immediate to verify ([Exercise] do it!) that, for R ∈ ℘(G × M) and


τ : P(M) → P(G ),
R ⊆ Rτ iff τ → wpR .
421 of 472
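Both maps are easy to compute on a finite example. The sketch below is our illustration: the state spaces and the relation R are made up, wp is taken in the “guaranteed” sense of the definition of wpR , and the connection R ⊆ Rτ iff τ → wpR is spot-checked at τ = wpR :

```python
from itertools import product

G = {0, 1, 2}                          # initial states
M = {0, 1, 2}                          # final states
R = {(0, 1), (1, 2), (2, 0), (2, 2)}   # a sample non-deterministic program

def powerset(S):
    S = list(S)
    return [{s for bit, s in zip(bits, S) if bit}
            for bits in product([0, 1], repeat=len(S))]

def wp(Y):
    """Weakest precondition of R: states from which every outcome lands in Y."""
    return {x for x in G if all(y in Y for y in M if (x, y) in R)}

def R_tau(tau):
    """R_tau(x, y) iff, for every postcondition Y, x in tau(Y) implies y in Y."""
    return {(x, y) for x in G for y in M
            if all(y in Y for Y in powerset(M) if x in tau(Y))}

# wp preserves implication: Y <= Z implies wp(Y) <= wp(Z).
for Y in powerset(M):
    for Z in powerset(M):
        if Y <= Z:
            assert wp(Y) <= wp(Z)

# The connection R <= R_tau iff tau -> wp_R, spot-checked at tau = wp:
assert R <= R_tau(wp)
```

Enumerating the whole powerset of M is feasible only because the state space is tiny; the point is the shape of the definitions, not efficiency.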
Weakest precondition III

Taking these ideas one step further, the program itself may be
modelled by a predicate transformer, namely its weakest precondition.

Thus, we can move backward and forward between the relational and
the predicate transformer semantics by means of the connection
R ⊆ Rτ iff τ → wpR .

We are not interested in developing any further the relational


semantics of programs, or the predicate transformers semantics in this
course. But, we wish to show that the above connection is, indeed, an
adjunction.

422 of 472
Galois connections I

Definition 21.2 (Galois connection)


Let P and Q be ordered sets. A pair (B, C) of maps B: P → Q and
C: Q → P is a Galois connection if, for all p ∈ P and q ∈ Q

Bp ≤ q iff p ≤ Cq .

The map B is called the lower adjoint of C, and the map C is called
the upper adjoint of B.

423 of 472
Galois connections II

Any partially ordered set P can be formalised as a category P whose


objects are the elements of P and whose arrows p → q mean that
p ≤ q.

Let P and Q be two posets and let f : P → Q be order preserving, that


is, if a ≤P b then f (a) ≤Q f (b).
In the categorical setting, f induces a functor f from P to Q . In fact,
it maps objects of P into objects of Q , and arrows in HomP (a, b)
into arrows in HomQ (f (a), f (b)). Moreover, it trivially preserves
composition of arrows and identities.

424 of 472
Galois connections III

Hence, a Galois connection (B, C) between P and Q is a pair of


functors B: P → Q , C: Q → P such that

HomQ (Bp , q) ∼= HomP (p , Cq) ,

for all p ∈ P and q ∈ Q.


It is simple to verify that this bijection is natural in both p and q.

So, by definition, a Galois connection (B, C) is an adjunction B a C


between the categories P and Q .

425 of 472
Galois connections IV
Lemma 21.3
Assume (B, C) is a Galois connection between ordered sets P and Q.
Let p , p1 , p2 ∈ P and q , q1 , q2 ∈ Q. Then
1. p ≤ C ◦ Bp and B ◦ Cq ≤ q;
2. p1 ≤ p2 implies Bp1 ≤ Bp2 and q1 ≤ q2 implies Cq1 ≤ Cq2 ;
3. Bp = B ◦ C ◦ Bp and Cq = C ◦ B ◦ Cq.
Conversely, a pair of maps B: P → Q and C: Q → P satisfying (1) and
(2) for all p , p1 , p2 ∈ P and for all q , q1 , q2 ∈ Q sets up a Galois
connection between P and Q.

Proof. (i)
For p ∈ P, we have Bp ≤ Bp by reflexivity so, being (B, C) a Galois
connection, p ≤ C ◦ Bp. Dually for B ◦ Cq ≤ q. This establishes (1). ,→
426 of 472
Galois connections V

,→ Proof. (ii)
For (2), consider p1 ≤ p2 , then p1 ≤ C ◦ Bp2 by (1) and transitivity,
which is equivalent to Bp1 ≤ Bp2 , being (B, C) a Galois connection.
Dually for q1 ≤ q2 .
For (3), from (1), p ≤ C ◦ Bp, we obtain Bp ≤ B ◦ C ◦ Bp by (2). But
(B, C) is a Galois connection, so C ◦ Bp ≤ C ◦ Bp, which holds by
reflexivity, implies B ◦ C ◦ Bp ≤ Bp. Thus, Bp = B ◦ C ◦ Bp by
symmetry. Dually for q.
Lastly, assume (1) and (2) hold universally. Let Bp ≤ q. By (2),
C ◦ Bp ≤ Cq, but p ≤ C ◦ Bp by (1), so p ≤ Cq by transitivity. The
reverse implication follows in the same way.

427 of 472
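Definition 21.2 and Lemma 21.3 can be verified mechanically on a small example. The connection below, doubling against floored halving on (Z, ≤), is our choice of illustration:

```python
# A Galois connection between (Z, <=) and (Z, <=):
# lower adjoint B = doubling, upper adjoint C = floored halving.
B = lambda p: 2 * p
C = lambda q: q // 2   # Python's // is floor division, so C is monotone

xs = range(-10, 11)

# Defining property:  B(p) <= q  iff  p <= C(q).
assert all((B(p) <= q) == (p <= C(q)) for p in xs for q in xs)
# Lemma 21.3 (1):  p <= C(B(p))  and  B(C(q)) <= q.
assert all(p <= C(B(p)) for p in xs)
assert all(B(C(q)) <= q for q in xs)
# Lemma 21.3 (3):  B = B.C.B  and  C = C.B.C.
assert all(B(p) == B(C(B(p))) for p in xs)
assert all(C(q) == C(B(C(q))) for q in xs)
```

Floor division matters here: rounding towards zero instead would break monotonicity of C at negative arguments, and with it the defining property.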
Galois connections VI
Definition 21.4
Let φ : P → Q be a map between ordered sets. We say that φ
preserves existing joins if, whenever ⋁S exists in P for some S ⊆ P,
then ⋁φ(S) exists in Q and φ(⋁S) = ⋁φ(S). Preservation of existing
meets is defined dually.

Lemma 21.5
Let (B, C) be a Galois connection between P and Q. Then B
preserves existing joins and C preserves existing meets.

Proof.
Since a join is a categorical colimit, namely the coproduct of the
objects in S, and a meet is a limit, namely the product of the objects
in S, then the left adjoint B preserves colimits and the right adjoint C
preserves limits when they exist, as for Theorem 20.7.
428 of 472
Refinement I

The development of a computer program often starts from a


specification of the task the program is to perform. In practice, the
transformation from specifications to executable programs will be
carried out in stages.

The objective is to do this by applying fixed rules which guarantee


correctness at each step.

This process is known as stepwise refinement.

429 of 472
Refinement II

A suitable setting in which to work is provided by a specification space


S = 〈S , v〉 of commands, relative to some fixed state space X . This
space consists of a chosen imperative programming language
augmented by specifications, that is, descriptions of computations not
cast in an executable form.

It is worthwhile to include in S commands that are far removed from
code for feasible specifications, such as magic, which miraculously
meets every specification: magic provides a top for the order on S.

430 of 472
Refinement III

The admittance of commands for arbitrary non-deterministic choice


corresponds to the existence of arbitrary meets in S. So, S becomes a
complete lattice.

The presence of such a strong mathematical structure in the


specification space means that the full power of the theory of Galois
connections is available: in particular, maps from S to S preserving
arbitrary meets (joins) possess lower (upper) adjoints.

The existence of such adjoints guarantees the existence of commands


that may assist in program development. Furthermore, the
calculational rules obeyed by Galois connections will supply laws
governing these commands.

431 of 472
Refinement IV

The v relation denotes the idea of “is refined by”. It is clearly reflexive
and transitive, and by a suitable quotient of the specification space, it
also becomes anti-symmetric, when we identify specifications with the
same end result.

The instances in which refinement is most productive are those in


which one ordered structure 〈A, ≤〉 is refined by another 〈C , ≤〉 and
where there exists a Galois connection (B, C) between A (abstract)
and C (concrete).

432 of 472
Refinement V

We think of the orders on A and C as having the interpretation “is less


informative than”, so that x ≤ y means that y serves every purpose
that x does.

Assume that a programming command is described by c in C and


suppose we wish to show that c implements some specification a in
the more abstract level A.

We can either prove that Ba ≤ c in the concrete model, or,


equivalently, prove that a ≤ Cc in the abstract model.

433 of 472
Refinement VI

We have that B ◦ Cc ≤ c for any c ∈ C : this expresses the fact that, in


general, abstraction results in a loss of information.

In the context of relational and predicate transformer models for


imperative programs, such Galois maps are provided by the map
taking a relation to the associated weakest precondition transformer
and its upper adjoint.

This Galois connection can be extended from models of programs to


models of specifications to yield a powerful refinement calculus.

434 of 472
References and Hints

This lesson is taken from Chapter 7 of B.A. Davey, H.A. Priestly


Introduction to Lattices and Order, 2nd edition, Cambridge University
Press (2002). ISBN: 0521784514. This very readable text covers the
essential aspects of the algebra of orders and lattices in a clear and
understandable style with many examples devoted to computer
scientists.

To whom is interested, the idea of using the weakest precondition to


model programs and specifications dates back to E.W. Dijkstra, and is
explained in his book A discipline of programming, Prentice Hall
(1976). ISBN: 013215871X.

435 of 472
Fundamentals of Functional Programming
Lecture 22

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline

The purpose of this lesson is to show a correspondence between


Cartesian closed categories and a typed λ-calculus. In this way, we can
associate to each Cartesian closed category a λ-theory and, vice versa,
we can interpret any λ-theory in a suitable Cartesian closed category.

Thus, Cartesian closed categories are really the models of our


λ-calculus, and the interpretation is both natural and complete.

Similar results can be obtained for all the typed λ-calculi we


considered, but the technical aspects are more involved, so we omit
them in the present course.

437 of 472
Typed λ-calculus I

First, we define the typed λ-calculus we want to model.

Definition 22.1 (Types)


Given a set K of type constant, types are defined as
■ each k ∈ K is a type;
■ 1 is a type, informally denoting the empty product;
■ if A and B are types, then so is A × B;
■ if A and B are types, then so is A → B.

So, our λ-calculus is the simple theory of types extended with explicit
product types.

438 of 472
Typed λ-calculus II
Definition 22.2 (Terms)
Given a class of types, a set V of typed variables such that there is a
denumerable quantity of them for each type, and a set F of typed
constants, terms are defined as
■ if x : A ∈ V then x : A and FV(x) = { x };
■ if f : A → B ∈ F and t : A is a term, then f (t) : B and
FV(f (t)) = FV(t);
■ ∗ : 1 is a term and FV(∗) = ∅;
■ if s : A and t : B are terms, then 〈s , t 〉 : A × B is a term and
FV(〈s , t 〉) = FV(s) ∪ FV(t);
■ if t : A × B is a term then fst(t) : A and snd(t) : B are terms and
FV(fst(t)) = FV(snd(t)) = FV(t);
,→
439 of 472
Typed λ-calculus III

,→ (Terms)
■ if t : B is a term and x : A ∈ V then (λx : A. t) : A → B is a term and
FV(λx : A. t) = FV(t) \ { x };
■ if s : A → B and t : A are terms, then (s t) : B is a term and
FV(s t) = FV(s) ∪ FV(t).

There is a subtlety in the definition: for formal purposes, which will be


clear later, it is useful to distinguish between application of a function
symbol to a term, and application of a term to a term.

440 of 472
Typed λ-calculus IV

Definition 22.3 (Equality)


The only formulae we allow in the formation of a λ-theory, which is a
set of formulae, are equalities in a context: X . t =A s, where t : A and
s : A are terms, and the context X ⊆ V is a finite set of variables such
that FV(t) ∪ FV(s) ⊆ X .

Equalities are the axioms of a λ-theory. Often, we do not write


contexts explicitly: in those cases, it is assumed to be the canonical
one, that is, FV(t) ∪ FV(s).

441 of 472
Typed λ-calculus V
Definition 22.4 (Calculus)
The rules of inference of the λ-calculus are:

(subst1): from X . s = t infer Y . s[r /x] = t[r /x];

(refl): X . x = x;

(sym): from X . x = y infer X . y = x;

(trans): from X . x = y and X . y = z infer X . x = z;
,→
442 of 472
Typed λ-calculus VI
,→ (Calculus)

(subst2): from X . s = t infer Y . r [s/x] = r [t/x];

(unit): X . x =1 ∗;

(fst): X . fst(〈x , y 〉) = x;

(snd): X . snd(〈x , y 〉) = y;

(pair): X . 〈fst(z), snd(z)〉 = z;
,→
443 of 472
Typed λ-calculus VII

,→ (Calculus)

(β): X . (λy : A. s) t = s[t/y ], where y ∉ FV(t);

(η): X . λy : A. t y = t, where y ∉ FV(t);

(λ): from X , y : A. s =B t infer X . (λy : A. s) = (λy : A. t).

444 of 472
Interpretation I
Definition 22.5 (λ-structure)
Let C be a Cartesian closed category. A λ-structure M in C is defined
by
■ a function M : K → Obj C mapping the type constants to objects of
C;
■ a function M from F , the set of function symbols, to the arrows of
C such that M(f : A → B): M A → M B.
The function M : K → Obj C is extended to arbitrary types as
■ M 1 = 1C , the terminal object of C;
■ M(A × B) = M A ×C M B, the product of C;
■ M(A → B) = (M B)M A , the exponential object of C.
We omit subscripts, for clarity.
445 of 472
Interpretation II
Definition 22.6 (Interpretation)
If M is a λ-structure in a Cartesian closed category C, we assign to
each term in a context X . t : B an interpretation

[[X . t]]M : M A1 × · · · × M An → M B

where X = { x1 : A1 , . . . , xn : An }, and M A abbreviates M A1 × · · · × M An


in the following way:
■ if t ≡ xi then [[X . t]] = πi , the i-th projection;
■ if t ≡ f (t 0 ) then [[X . t]] = M f ◦ [[X . t 0 ]];
■ if t ≡ ∗ then [[X . t]] is the unique morphism M A → M B;
■ if t ≡ 〈t 0 , t 00 〉 and B ≡ B1 × B2 then
[[X . t]] = 〈[[X . t 0 ]], [[X . t 00 ]]〉 : M A → M B1 × M B2 ;
,→
446 of 472
Interpretation III
,→ (Interpretation)
■ if t ≡ fst(t 0 ) then [[X . t]] = π1 ◦ [[X . t 0 ]];
■ if t ≡ snd(t 0 ) then [[X . t]] = π2 ◦ [[X . t 0 ]];
■ if t ≡ λz : C . t 0 , where we assume z : C ∉ X , and B ≡ C → D, then
[[X . t]] = h, with h : M A → (M D)M C the exponential transpose of
[[X , z : C . t 0 ]] : M A × M C → M D in the evaluation diagram, that is,
the unique arrow such that ev ◦ 〈h ◦ π1 , π2 〉 = [[X , z : C . t 0 ]];

,→

447 of 472
Interpretation IV

,→ (Interpretation)
■ if t ≡ t 0 t 00 then [[X . t]] = ev ◦〈[[X . t 0 ]], [[X . t 00 ]]〉.
Moreover, [[X . t =A t 0 ]] is the equaliser of the parallel pair

[[X . t]], [[X . t 0 ]] : M A ⇉ M B .

448 of 472
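The clauses of Definition 22.6 translate almost verbatim into a little evaluator when the target category is Set: the interpretation of a term-in-context becomes a function from environments to values. The AST encoding below is ours, not the lecture's:

```python
# [[X . t]] is a Python function from environments (tuples of values, one per
# variable of the context X) to values.  Terms are tagged tuples.
def interp(term, ctx):
    """ctx lists the context variables x1..xn; returns [[X . t]]."""
    tag = term[0]
    if tag == "var":                       # [[X . xi]] = i-th projection
        i = ctx.index(term[1])
        return lambda env: env[i]
    if tag == "unit":                      # the unique morphism into 1
        return lambda env: ()
    if tag == "pair":                      # pairing <[[s]], [[t]]>
        f, g = interp(term[1], ctx), interp(term[2], ctx)
        return lambda env: (f(env), g(env))
    if tag == "fst":                       # pi1 . [[t']]
        f = interp(term[1], ctx)
        return lambda env: f(env)[0]
    if tag == "snd":                       # pi2 . [[t']]
        f = interp(term[1], ctx)
        return lambda env: f(env)[1]
    if tag == "lam":                       # exponential transpose
        body = interp(term[2], ctx + [term[1]])
        return lambda env: (lambda v: body(env + (v,)))
    if tag == "app":                       # ev . <[[s]], [[t]]>
        f, g = interp(term[1], ctx), interp(term[2], ctx)
        return lambda env: f(env)(g(env))
    raise ValueError(tag)

# swap = \z. <snd z, fst z>, interpreted in the empty context:
swap = interp(("lam", "z", ("pair", ("snd", ("var", "z")),
                                    ("fst", ("var", "z")))), [])
assert swap(())((1, 2)) == (2, 1)
# beta in the model: (\y. fst y) <*, *>  denotes the same point as *.
t = ("app", ("lam", "y", ("fst", ("var", "y"))),
            ("pair", ("unit",), ("unit",)))
assert interp(t, [])(()) == ()
```

Types are left implicit here: Python's values play the role of the objects M A, so the sketch only illustrates the clauses, not the typing discipline.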
Interpretation V

Definition 22.7 (Validity)


An equality X . s =A t is valid in a λ-structure M in a Cartesian closed
category C iff [[X . s =A t]] = idM A1 ×···×M An with X = { x1 :A1 , . . . , xn :An }.

Definition 22.8 (Model)


A λ-structure M in a Cartesian closed category C is a model for a
λ-theory T iff each axiom X . t =A s in T is valid in M.

449 of 472
Soundness
Theorem 22.9 (Soundness)
If X . s =A t is derivable in a λ-theory T , then it is valid in all models
for T in every Cartesian closed category.
Proof.
We need to check that the rules of the λ-calculus preserve validity:
■ the axioms (refl), (sym), (unit) and the rule (trans) are obvious;
■ rules (subst1) and (subst2) are proved to preserve validity by a
trivial induction which shows that [[X . t[s/y ]]] = [[Y . t]] ◦ [[X . s]];
■ rules (fst), (snd) and (pair) are straightforward from the definition
of interpretation and the properties of product;
■ rules (β) and (η) are straightforward from the definition of
interpretation and the properties of evaluation.
[Exercise] Fill the details.
450 of 472
Completeness I
Definition 22.10 (Syntactic category)
Let T be a λ-theory. We define a category CT as follows:
■ the objects of CT are the types of the language of T ;
■ the arrows of CT are equivalence classes [x : A. t] of terms in
contexts where [x . s] = [x . t] iff x . t = s is provable in T , or
[x . t] = [y .t[y /x]]. The substitution and equality rules ensure that
this definition does not depend on the choice of t;
■ the identity morphism is [x . x]: A → A;
■ composition is given by substitution: given [x . t]: A → B and
[y . s]: B → C , [y . s] ◦ [x . t] = [z .s[t/y ]].

Note that we do not need contexts with more than one variable,
having product types.
451 of 472
Completeness II
Lemma 22.11
The category CT is Cartesian closed.

Proof.
■ the terminal object is the type 1 and, for each object A, the unique
arrow A → 1 is [x : A. ∗];
■ the product of A and B is the type A × B, with projections
[z . fst(z)] and [z . snd(z)], and the morphism C → A × B induced by
[w . s]: C → A and [w . t]: C → B is [w . 〈s , t 〉];
■ the exponential B A is the type A → B, with evaluation map
(A → B) × A → B defined as [w . fst(w ) snd(w )], and, given any
[z . t]: C × A → B, its exponential transpose C → (A → B) is
[w . λx : A. t[〈w , x 〉/z]].

452 of 472
Completeness III

Theorem 22.12 (Completeness)


The Cartesian closed category CT contains a λ-structure M which
validates exactly the equalities derivable from the λ-theory T .
Moreover, for any Cartesian closed category D, there is a bijection
between natural isomorphism classes of Cartesian closed functors
CT → D and isomorphism classes of T -models in D.

Proof. (i)
The structure MT sends types to themselves and each primitive
function symbol f : A → B to [x : A. f (x)]. By an easy induction we get
that [[x . t]]MT = [x .t]. Hence, the equalities in a context valid in MT
are exactly those provable in T . ,→

453 of 472
Completeness IV

,→ Proof. (ii)
Given a model N in D, the corresponding functor FN : CT → D sends A
to N A for each type A, and [x . t] to [[x . t]]N . It is clear that FN is a
Cartesian closed functor and that FN (MT ) = N.
In the opposite direction, since any Cartesian closed functor
F : CT → D must preserve interpretations of arbitrary terms in a
context, it is easily seen to be naturally isomorphic to FN where
N = F (MT ).

454 of 472
References and Hints

This lecture has been taken from [Pierce2] Chapter 3.1.

Theorems and proofs follow the treatment as given in Chapter D4.2 of


P. Johnstone, Sketches of an Elephant: A Topos Theory Compendium
volume 2, Oxford University Press (2002). ISBN: 978-0198515982

Johnstone’s text is the last reference work on Topos Theory and it


contains almost everything which is known on this fascinating topic. It
is also a very difficult book, although very well written, in a
crystal-clear style, requiring a mature mathematical attitude before
approaching.

455 of 472
Fundamentals of Functional Programming
Lecture 23

Prof. M. Benini
[Link]@[Link]
[Link]

Laurea Magistrale in Informatica


Facoltà di Scienze [Link]. di Varese
Università degli Studi dell’Insubria

a.a. 2010/11
Outline

This lesson introduces a class of models for the λ-calculus in


its pure version.

As for typed λ-calculus, the models we will consider are of a


categorical nature, and we will prove that they are a sound and
complete semantics for the λ-calculus.

With respect to the typed λ-calculus, in the pure version we will follow
an algebraic approach instead of using a purely categorical
construction. This way is less abstract, but slightly more difficult, at
least from the technical point of view.

457 of 472
C-monoids I
The categorical structures that we will use as models, are called
C-monoids. Intuitively, they are Cartesian closed categories deprived of
the terminal object.

Definition 23.1 (C-monoid)


A C-monoid is an algebraic structure 〈M , ·, 1, π1 , π2 , ², ∗, 〈〉〉 where
〈M , ·, 1〉 is a monoid, π1 , π2 , ² ∈ M, (−)∗ is a unary operation on M,
and 〈−, −〉 is a binary operation on M. These operations are required
to satisfy the following axioms for all a, b, c, h and k in M:
■ π1 · 〈a, b 〉 = a and π2 · 〈a, b 〉 = b;
■ 〈π1 · c , π2 · c 〉 = c;
■ ² · 〈h∗ · π1 , π2 〉 = h;
■ (² · 〈k · π1 , π2 〉)∗ = k.
458 of 472
C-monoids II

The intuition behind the definition is that 〈−, −〉, π1 and π2 define the
product with its projections, while ² stands for evaluation, precisely
² = λz . (fst z) (snd z), and (−)∗ stands for currying, precisely
h∗ = λx , y . h 〈x , y 〉.

The relation with Cartesian closed categories can be recovered by


thinking to C-monoids as monoids, i.e., categories with a single object,
having some constraints on arrows: it is possible to construct a
Cartesian closed category by using the C-monoid arrows as objects and
defining an appropriate notion of arrows. We will not explore this
connection, as it does not shed light on the semantics of λ-calculus.

459 of 472
C-monoids III
Definition 23.2 (Product and exponential)
In any C-monoid, a × b ≡ 〈a · π1 , b · π2 〉 and g f ≡ (g · ² · 〈π1 , f · π2 〉)∗ .

In a monoid, the product · allows to construct sequences of elements,


called words. If we add a variable standing for elements, we get
Definition 23.3 (Polynomials)
Given a C-monoid M , we construct another C-monoid M [x]: the
elements of M [x] are polynomials, i.e., words built up from the symbol
x and the elements of M using the C-monoid operations, modulo the
smallest congruence relation which satisfies the axioms of C-monoids.

The structure of M [x] is determined by h : M → M [x] which maps


every element of M into itself, thought to as a constant polynomial of
M [x].
460 of 472
C-monoids IV
Theorem 23.4 (Functional completeness)
If φ(x) is a polynomial in the variable x over a C-monoid M , there
exists a unique f in M such that f · 〈(x · π2 )∗ , 1〉 = φ(x) in M [x].

Proof. (i)
Define ρ x · φ(x) by induction on the structure of φ(x):
■ ρ x · k ≡ k · π2 if k is an element of M ;
■ ρ x · x ≡ ²;
■ ρ x · (ξ(x) · ψ(x)) ≡ ρ x · ξ(x) · 〈π1 , ρ x · ψ(x)〉;
■ ρ x · 〈ψ(x), ξ(x)〉 ≡ 〈ρ x · ψ(x), ρ x · ξ(x)〉;
■ ρ x · ψ(x)∗ ≡ (ρ x · ψ(x) · α)∗ , where α ≡ 〈π1 · π1 , 〈π1 · π1 , π2 〉〉.
First, we show that, if φ(x) = ψ(x), then ρ x · φ(x) = ρ x · ψ(x). ,→

461 of 472
C-monoids V

,→ Proof. (ii)
It suffices to prove the fact for the C-monoid axioms since = is the
smallest congruence relation satisfying them. But, if A = B is a
C-monoid axiom, it is easy to check that ρ x · A = ρ x · B.
Also, ρ x · φ(x) · 〈(x · π2 )∗ , 1〉 = φ(x) by direct calculation.
So, ρ x · φ(x) satisfies the statement of the Theorem.
Finally, suppose f · 〈(x · π2 )∗ , 1〉 = φ(x) for some constant f of M , then
ρ x · φ(x) = ρ x · f · 〈(x · π2 )∗ , 1〉 = f · π2 · 〈(x · π2 )∗ , 1〉 = f · 1 = f .
So ρ x · φ(x) is unique, as required.

462 of 472
C-monoids VI

Definition 23.5 (Application and abstraction)


The application of g to a, elements of a C-monoid M , is defined as
g • a ≡ ² · 〈g · (a · π2 )∗ , 1〉.
The abstraction of a variable x from a polynomial φ(x) over M is
defined as λx . φ(x) ≡ (ρ x · φ(x))∗ .

Corollary 23.6
If φ(x) is a polynomial in the variable x over a C-monoid M , then
there exists a unique g in M such that g • x = φ(x).

Proof.
Take g = λx . φ(x).

463 of 472
λ-calculus I

Definition 23.7 (Terms)


Given a denumerable set V of variables, and a possibly empty set K of
constants, the set of terms is freely generated by the following rules:
■ if x ∈ V then x is a term and FV(x) = { x };
■ if k ∈ K then k is a term and FV(k) = ∅;
■ if f and t are terms, then so is (f t) and FV(f t) = FV(f ) ∪ FV(t);
■ if a and b are terms, then so is 〈a, b〉 and
FV(〈a, b〉) = FV(a) ∪ FV(b);
■ if t is a term, then so are fst(t) and snd(t), and
FV(fst(t)) = FV(snd(t)) = FV(t);
■ if t is a term and x ∈ V , then (λx . t) is a term and
FV(λx . t) = FV(t) \ { x }.
464 of 472
λ-calculus II

Definition 23.8 (Equality)


Equality between terms is defined as usual, as the congruence relation
between terms, i.e., an equivalence relation which is stable under
substitution of equals with equals, which extends α-equivalence,
β-equality, η-equality and the usual rules for product and projections.

As we have seen, the product is definable inside the λ-calculus; but,
as proved by H.P. Barendregt, no such definition can satisfy surjective
pairing, that is, 〈fst(t), snd(t)〉 = t.

Since surjective pairing is a natural property when working with
products, it is usually incorporated in the definition of terms.
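The situation can be made concrete with a small rewriter: the projection rules compute by themselves, while surjective pairing must be postulated as a rule of its own. The tagged-tuple encoding and the function names are ours, for illustration.

```python
# Projection rules  fst<a,b> -> a  and  snd<a,b> -> b, plus surjective
# pairing  <fst(t), snd(t)> -> t  as a separate, postulated rule.
# Terms are tagged tuples, e.g. ('pair', a, b), ('fst', t), ('var', 'x').

def step(t):
    if not isinstance(t, tuple):
        return t
    if t[0] == 'fst' and t[1][0] == 'pair':
        return t[1][1]
    if t[0] == 'snd' and t[1][0] == 'pair':
        return t[1][2]
    if (t[0] == 'pair' and t[1][0] == 'fst' and t[2][0] == 'snd'
            and t[1][1] == t[2][1]):
        return t[1][1]            # surjective pairing: <fst t, snd t> -> t
    return tuple(step(s) for s in t)

def normalize(t):
    """Apply step until a fixed point is reached."""
    while True:
        u = step(t)
        if u == t:
            return t
        t = u
```
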

465 of 472
λ-calculus III

Lemma 23.9
Every C-monoid M gives rise to a λ-calculus L(M ).

Proof.
The constants of L(M ) are the elements of M, variables are chosen
from a denumerable set V , and the terms of L(M ) are constructed as
follows:
■ fst(t) ≡ π1 · t and snd(t) ≡ π2 · t;
■ 〈a, b 〉 ≡ 〈λx . a, λx . b 〉 • 1, where x ∉ FV(a) ∪ FV(b);
■ application and abstraction are defined as before.
Finally, a = b holds in L(M ) iff a = b holds in M [FV(a) ∪ FV(b)].
That L(M ) is a λ-calculus follows immediately from the properties
of C-monoids.
466 of 472
λ-calculus IV
Lemma 23.10
Every λ-calculus L gives rise to a C-monoid M(L ).

Proof. (i)
The elements of M(L ) are the closed terms of L , taken modulo the
equality of L , and
■ 1 ≡ λx . x;
■ g · f ≡ λx . g (f x);
■ π1 ≡ λx . fst(x) and π2 ≡ λx . snd(x);
■ 〈f , g 〉 ≡ λx . 〈f x , g x 〉;
■ ε ≡ λz . fst(z) (snd(z));
■ h∗ ≡ λx , y . h 〈x , y 〉.
Moreover, a = b in M(L ) iff a = b holds in L . ,→
467 of 472
λ-calculus V

,→ Proof. (ii)
The axioms of a C-monoid are easily checked. For example:
ε · 〈h∗ · π1 , π2 〉 · x = ε · 〈h∗ · π1 · x , π2 · x 〉 = ε · 〈h∗ · fst(x), snd(x)〉 =
h∗ · fst(x) · snd(x) = h · 〈fst(x), snd(x)〉 = h · x, so ε · 〈h∗ · π1 , π2 〉 = h.
The other axioms are left as exercises.
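The displayed computation can be replayed pointwise by reading the translations of the previous lemma as Python functions on nested pairs, with ε read as the evaluation map. This is an informal illustration only, and the helper names are ours.

```python
# Pointwise check of  eps . <h* . pi1, pi2> = h  under a functional
# reading of the Lemma 23.10 translations (illustration only).

compose = lambda g, f: lambda x: g(f(x))           # composition
pair    = lambda f, g: lambda x: (f(x), g(x))      # <f, g>
pi1     = lambda p: p[0]
pi2     = lambda p: p[1]
eps     = lambda z: z[0](z[1])                     # fst(z) applied to snd(z)
star    = lambda h: lambda x: lambda y: h((x, y))  # h*

h   = lambda p: p[0] - p[1]        # a sample element acting on pairs
lhs = compose(eps, pair(compose(star(h), pi1), pi2))
```

On any pair, lhs computes exactly h, mirroring the chain of equalities in the proof.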

468 of 472
λ-calculus VI

Theorem 23.11
The maps M and L of the previous lemmas establish a one-to-one
correspondence between C-monoids M and λ-calculi L :
M(L(M )) = M and L(M(L )) = L .
[Proof not required]

469 of 472
λ-calculus VII
Definition 23.12 (CMon)
The category CMon has C-monoids as objects and C-homomorphisms
as arrows. A C-homomorphism is a monoid homomorphism preserving
the C-monoid structure.

Definition 23.13 (λ-Calc)


The category λ-Calc has λ-calculi as objects and translations as
arrows. A translation is a mapping sending variables to variables and
closed terms to closed terms, which preserves the term-forming
operations and the axioms.

Corollary 23.14
The category CMon is isomorphic to the category λ-Calc.
[Proof not required]
470 of 472
References and Hints

The material of this lecture is taken from [Lambek], Chapters I.15
and I.17.

In that book, the interested reader may find the omitted proof.

471 of 472
Conclusion
This lecture is the last one in this course.

Anyone interested in the subjects developed in this course may
consider doing her/his dissertation on them: the lecturer is doing
active research on some of these themes. If you just want to deepen
the subject, ask the lecturer: he can point you to books, articles and
other references that may satisfy your interests.

Anyone who would like to develop her/his dissertation on these themes
abroad, in another European university, should also ask the lecturer:
he has direct contacts with some of the researchers in the centres
where research on these subjects is pursued.

The End
472 of 472
