Functional Programming
Haskell
Haskell is a statically typed functional programming language with lazy evaluation.
Data Types
Data Type       Values
Bool            True, False
Integer         … -2 -1 0 1 2 …
Int             -9223372036854775808 … 9223372036854775807
Float, Double   0.1 0.457 2.56e4 -1.2e-6
Char            'a' 'b' … 'A' 'B' … '0' '1' …
String          "Arnold Rimmer" "Dave Lister"
Functions also have types that take the following format:
{Input Type} -> {Output Type}
Functions
A function definition has two parts: a type signature and one or more defining equations.
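A minimal sketch of a complete definition (the name square is illustrative):

```haskell
-- Type signature: one Int input, one Int output.
square :: Int -> Int
-- Defining equation.
square n = n * n
```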
Guards are a convenient notation for conditional evaluation e.g.,
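For example, a hypothetical absolute-value function defined with guards:

```haskell
absolute :: Int -> Int
absolute n
  | n >= 0    = n       -- this equation applies when the guard holds
  | otherwise = -n      -- otherwise is a catch-all guard
```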
Function definitions may be recursive e.g.,
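For example, the standard recursive factorial:

```haskell
factorial :: Integer -> Integer
factorial 0 = 1                      -- base case
factorial n = n * factorial (n - 1)  -- recursive case
```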
Functions can also take more than one argument. A two-argument function has a type of the form:
{Input Type 1} -> {Input Type 2} -> {Output Type}
i.e., it takes its arguments one at a time, with each application returning a new function. This approach to functions with multiple arguments is called Currying.
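A sketch of a curried two-argument function (the names are hypothetical):

```haskell
addMul :: Int -> Int -> Int
addMul x y = (x + y) * 2

-- Partial application: supplying one argument yields a one-argument function.
addMulFive :: Int -> Int
addMulFive = addMul 5
```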
Lists
In Haskell, a list [x,y,z,…] with elements of type a has type [a]. A String is defined as a list of characters, [Char].
We can construct a list directly (e.g., [x,y,z,…]), or indirectly (using <head>:<tail> where
head is the first element of the list, and tail is the rest of the elements) e.g.,
We can access the head or tail of a list by using the head and tail functions e.g.,
The (:) operator also decomposes a list into a head and a tail when used as a pattern, which can be useful when defining functions by cases on lists.
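A sketch tying these together: direct construction, construction with (:), and a function defined by pattern matching on (x:xs):

```haskell
-- Both lists are [1,2,3]:
direct, consed :: [Int]
direct = [1, 2, 3]
consed = 1 : 2 : 3 : []

-- head and tail access the two parts of a non-empty list:
--   head consed == 1,  tail consed == [2,3]

-- Using (x:xs) as a pattern:
total :: [Int] -> Int
total []       = 0
total (x : xs) = x + total xs
```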
Throwing Exceptions
To throw an exception in Haskell, we can use the error keyword e.g.,
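For example, a hypothetical safe head function that throws a descriptive exception on the empty list:

```haskell
safeHead :: [a] -> a
safeHead []      = error "safeHead: empty list"  -- throws an exception
safeHead (x : _) = x
```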
Wildcards
If we only need part of a pattern, we can use a wildcard (_) for the parts we do not name.
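Two sketches using wildcards (the names are hypothetical):

```haskell
-- Only the shape of the list matters, not its elements:
isEmpty :: [a] -> Bool
isEmpty [] = True
isEmpty _  = False

-- Only the middle component of a triple is needed:
secondOfThree :: (a, b, c) -> b
secondOfThree (_, y, _) = y
```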
Pairs
We can group two objects with similar or distinct types together in a pair e.g.,
We can retrieve the first and second elements in the pair using the fst and snd functions.
We can also use patterns on pairs in the same way as lists.
It is also possible to have larger tuples (in general 𝑛-tuples for 𝑛 = 2,3,4,5, … ,15).
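A sketch covering construction, fst/snd, and pattern matching on pairs:

```haskell
-- A pair grouping two distinct types:
person :: (String, Int)
person = ("Dave Lister", 25)

-- Accessing the components:  fst person == "Dave Lister",  snd person == 25

-- A pattern on a pair:
swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)
```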
Where Clauses
Where clauses can be used to define auxiliary functions or to name intermediate calculations that
are used in multiple places e.g.,
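A hypothetical example where the intermediate value bmi is named once and used in every guard:

```haskell
bmiLabel :: Double -> Double -> String
bmiLabel weight height
  | bmi < 18.5 = "underweight"
  | bmi < 25.0 = "normal"
  | otherwise  = "overweight"
  where
    bmi = weight / (height * height)  -- computed once, used in all guards
```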
Accumulators
In functional programming, the idea of using an additional argument is common, called an
accumulator e.g.,
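For example, reversing a list with an accumulator (the helper name go is an assumption):

```haskell
reverseAcc :: [a] -> [a]
reverseAcc xs = go xs []
  where
    -- acc holds the partial result built up so far
    go []       acc = acc
    go (y : ys) acc = go ys (y : acc)
```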
The map Function
The map function takes two arguments: a function and a list. It then applies the function to each
element of the list.
This function is an example of a higher-order function: a function which takes another function as an
argument.
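For example:

```haskell
-- Apply (* 2) to every element:
doubled :: [Int]
doubled = map (* 2) [1, 2, 3]   -- [2,4,6]
```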
The filter Function
Another useful higher-order function is the filter function. This function also takes a function
and a list, and returns any elements of the list for which the function is true.
The function argument (conventionally named p) is called the predicate.
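For example, keeping only the even elements:

```haskell
evens :: [Int]
evens = filter even [1 .. 10]   -- [2,4,6,8,10]
```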
The zip Function
The zip function takes two lists and combines them into a list of pairs.
A variation of this function is the zipWith function, a higher-order function which takes a function
and two lists, and combines the lists using the function.
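For example:

```haskell
pairsUp :: [(Int, Char)]
pairsUp = zip [1, 2, 3] "abc"             -- [(1,'a'),(2,'b'),(3,'c')]

sums :: [Int]
sums = zipWith (+) [1, 2, 3] [10, 20, 30] -- [11,22,33]
```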
List Comprehension
List comprehensions are another way to work with lists e.g., [3*n + 1 | n <- [0..], even n]
This is equivalent to the following set comprehension:
{3𝑛 + 1 | 𝑛 ∈ ℕ, 𝑛 is even}
It is also possible to draw elements from more than one list or use more than one filter:
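A sketch of both variations (the comprehension above restricted to a finite range so it can be printed, and a classic multi-list, multi-filter example):

```haskell
threeNPlusOne :: [Int]
threeNPlusOne = [3 * n + 1 | n <- [0 .. 10], even n]  -- [1,7,13,19,25,31]

-- Drawing from three lists with an extra filter:
pythagorean :: [(Int, Int, Int)]
pythagorean =
  [ (a, b, c)
  | c <- [1 .. 20], b <- [1 .. c], a <- [1 .. b]
  , a * a + b * b == c * c
  ]
```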
The data Keyword
The data keyword creates new types e.g., data TwoStrings = TwoStr String String
where:
• TwoStrings is the new type.
• TwoStr is a constructor.
• The pattern for the new type is TwoStr s1 s2 where s1 and s2 are Strings.
A constructor takes a fixed number of arguments of fixed types. Its name must be unique across all
types, which allows Haskell to infer the type of a term from its constructor. A type may have more
than one constructor, in which case it has one pattern per constructor.
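A hypothetical type with two constructors, giving two patterns to match against:

```haskell
data Shape = Circle Double | Rect Double Double

area :: Shape -> Double
area (Circle r) = 3.14159 * r * r  -- first pattern
area (Rect w h) = w * h            -- second pattern
```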
Data types may also contain variables e.g.,
They may also be recursive e.g.,
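Sketches of both (the names are assumptions):

```haskell
-- A type containing a variable a:
data Pair a = MkPair a a

firstOf :: Pair a -> a
firstOf (MkPair x _) = x

-- A recursive type: a hand-rolled list of elements of type a:
data List a = Nil | Cons a (List a)

len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs
```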
Maybe
Haskell has a built-in type Maybe a, defined as follows: data Maybe a = Nothing | Just a
Either
Haskell has another built-in type Either a b, defined as follows: data Either a b = Left a | Right b
The flip Function
In Haskell, many functions are not symmetric. Sometimes, we may want to give only the second
argument. There are three ways we could do this:
1. Writing the function as infix (e.g., (`mod` 3))
2. Writing a lambda function (e.g., \x -> mod x 3)
3. Using the built-in flip function (e.g., flip mod 3)
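All three forms define the same function:

```haskell
modThreeA, modThreeB, modThreeC :: Int -> Int
modThreeA = (`mod` 3)       -- infix section
modThreeB = \x -> mod x 3   -- lambda
modThreeC = flip mod 3      -- flip swaps mod's two arguments
```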
Function Composition
In Haskell, we can compose functions by using the . operator:
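For example, (f . g) x means f (g x) — g is applied first, then f:

```haskell
doubleThenInc :: Int -> Int
doubleThenInc = (+ 1) . (* 2)   -- doubles, then adds one
```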
Trees in Haskell
A binary tree is a data structure in which each node holds a data value and has two children which
are themselves binary trees.
We can implement a tree of integers recursively in Haskell as follows:
Note: If we want Haskell to display values of this type on the command line, we must add
deriving Show below the definition.
We can implement treesort in Haskell using an ordered tree and the following 3 functions:
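A sketch of the tree type and the three functions (the names insert and flatten are assumptions; this version drops duplicate elements):

```haskell
-- A binary tree of integers; deriving Show lets GHCi print it.
data Tree = Leaf | Node Tree Int Tree
  deriving Show

-- Insert a value, keeping the tree ordered.
insert :: Int -> Tree -> Tree
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r
  | x > v     = Node l v (insert x r)
  | otherwise = t                      -- duplicate: leave unchanged

-- In-order traversal of an ordered tree yields a sorted list.
flatten :: Tree -> [Int]
flatten Leaf         = []
flatten (Node l v r) = flatten l ++ [v] ++ flatten r

-- Build the tree from the input, then flatten it.
treesort :: [Int] -> [Int]
treesort = flatten . foldr insert Leaf
```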
Input/Output
Some IO functions within Haskell are as follows:
• putStr s – prints s :: String to standard output.
• putStrLn s – same, then starts new line.
• writeFile f s – opens a file f :: String and writes s :: String to it.
• s <- getLine – reads s :: String from standard input.
• s <- readFile f – opens a file f :: String and reads contents to s :: String.
Haskell input/output is done in a do-block:
Lines are executed in order, where each line is either:
• An IO expression whose return value is thrown away (e.g., putStr "hello")
• An IO expression whose return value is bound to a variable (e.g., x <- readFile "[Link]")
• A let-expression that binds an expression to a variable (e.g., let x = fibs !! 10)
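A sketch of a do-block combining the three kinds of line (the names greet and greeting are assumptions):

```haskell
greeting :: String -> String
greeting n = "Hello, " ++ n

greet :: IO ()
greet = do
  putStr "What is your name? "   -- return value thrown away
  n <- getLine                   -- return value bound to a variable
  let msg = greeting n           -- let-expression binding a pure value
  putStrLn msg
```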
Since a do-block is itself an IO expression, it can also call itself e.g.,
To end a loop (neatly, without errors), we can use a conditional if-then-else expression:
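For example, a sketch of an echo loop that calls itself until the user types "quit" (names and messages are assumptions):

```haskell
echoLoop :: IO ()
echoLoop = do
  line <- getLine
  if line == "quit"          -- the condition ends the loop neatly
    then putStrLn "Bye!"
    else do
      putStrLn line
      echoLoop               -- the do-block calls itself
```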
IO Types
In Haskell, the type IO a represents:
• A sequence of IO actions.
• A return value of type a.
The IO functions have the following types:
• putStr, putStrLn :: String -> IO ()
• writeFile :: FilePath -> String -> IO ()
• getLine :: IO String
• readFile :: FilePath -> IO String
(FilePath is a type synonym for String.)
Foldr and Foldl
A function of the form:
f []       = u
f (x : xs) = g x (f xs)
can be written in Haskell as:
f = foldr g u
We can perform map f followed by foldr g u as foldr g u . map f. Drawn as a tree, applying g
after f at every element is the same as applying g . f once at each element, which is the result of
foldr (g . f) u, meaning we can collapse a map-then-fold into just a fold:
foldr g u . map f = foldr (g . f) u
This is called a map-fold fusion, and increases the efficiency of the code as it avoids the intermediate
result.
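Checking the fusion law on a small example (summing the squares of a list, unfused and fused):

```haskell
sumSquares, sumSquaresFused :: [Int] -> Int
sumSquares      = foldr (+) 0 . map (^ 2)  -- builds an intermediate list
sumSquaresFused = foldr ((+) . (^ 2)) 0    -- fused: no intermediate list
```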
The function foldl is similar to foldr, but it is tail-recursive instead (meaning the recursion
happens in the other direction).
To remember it, use the following mnemonic:
a) I put my thing down.
b) Flip it.
c) And reverse it.
For folds operating on non-empty lists, use foldl1 and foldr1 instead (they use an element of the list in place of the unit value).
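The difference in direction shows up with a non-associative operator like subtraction:

```haskell
r, l :: Int
r = foldr (-) 0 [1, 2, 3]   -- 1 - (2 - (3 - 0))  =  2
l = foldl (-) 0 [1, 2, 3]   -- ((0 - 1) - 2) - 3  = -6

-- foldr1 needs no unit value; it uses the list's own elements:
m :: Int
m = foldr1 max [3, 1, 2]
```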
Lambda (𝜆) Calculus
The 𝝀-calculus (or lambda-calculus) is a theory of functions first proposed by Alonzo Church in the
1930s.
The 𝜆-calculus can be thought of as the mathematical foundation of functional programming.
The 𝜆-calculus is a formal system in which:
• Every term denotes a function.
• Any term (function) can be applied to any other term, so functions are inherently higher-order.
• The system is fully expressive (i.e., Turing-complete).
Syntax of the 𝜆-Calculus
𝜆-terms are defined by the following Backus-Naur Form grammar:
𝑀 ∷= 𝑥 | 𝜆𝑥. 𝑀 | 𝑀𝑀
where 𝑥 ranges over an infinite collection of variables. Terms are always finite.
The meaning of these 𝜆-terms are as follows:
• 𝜆𝑥. 𝑀 is a function. Roughly, it is the function which, when given argument 𝑥, behaves like
𝑀. When 𝜆𝑥. 𝑀 is applied to some argument, 𝑀 will be evaluated with 𝑥 bound to the
argument. These terms are called 𝝀-abstractions.
• 𝑀𝑁 is the application of function 𝑀 to argument 𝑁.
Abstract Syntax
The BNF grammar described the abstract syntax of the language, which can be thought of as certain
trees. Every term is one of:
• A variable 𝑥
• An abstraction 𝜆𝑥. 𝑀, with two further pieces of information:
o The variable being abstracted – the 𝑥
o The body of the abstraction – the 𝑀
• An application 𝑀𝑁, with two further pieces of information:
o The function part of the application – the 𝑀
o The argument part of the application – the 𝑁
Informally we can draw this using a syntax tree e.g., the term (𝜆𝑥. 𝑥)(𝜆𝑦. 𝜆𝑧. 𝑥) can be drawn:
To prevent confusion, there are some conventions for writing 𝜆-terms:
• Application associates left (e.g., 𝑀𝑁𝑃 means the same as (𝑀𝑁)𝑃)
• The scope of 𝜆 extends as far to the right as possible (e.g., 𝜆𝑥. 𝑀𝑁 means 𝜆𝑥. (𝑀𝑁), not
(𝜆𝑥. 𝑀)𝑁)
• Sometimes 𝜆𝑥𝑦. 𝑀 is written instead of 𝜆𝑥. 𝜆𝑦. 𝑀
Computation Steps
The expression 𝑀 → 𝑀′ means that the term 𝑀 can become the term 𝑀′ in one step of
computation.
Some terms have no computation steps one can perform on them. These are called normal forms
and are the final answer of our computation.
Suppose 𝑀 and 𝑁 are terms where, for all subterms 𝜆𝑧 … in 𝑀:
• 𝑧 is not 𝑥
• 𝑧 does not appear in 𝑁
Define 𝑀[𝑁⁄𝑥 ] to be the term 𝑀, with every occurrence of 𝑥 replaced by 𝑁.
We say that the application (𝜆𝑥. 𝑀)𝑁 reduces to 𝑀[𝑁⁄𝑥 ].
Some example evaluations are as follows:
• (𝜆𝑥. 𝑥)𝑀 reduces to 𝑥[𝑀⁄𝑥 ] which is 𝑀. The term 𝜆𝑥. 𝑥 is called the identity function.
• (𝜆𝑥. 𝜆𝑦. 𝑥)𝑀 reduces to (𝜆𝑦. 𝑥)[𝑀⁄𝑥 ], which is 𝜆𝑦. 𝑀.
• If 𝑀 contains no occurrences of 𝑦, the substitution 𝑀[𝑁⁄𝑦] just gives 𝑀, so (𝜆𝑦. 𝑀)𝑁
reduces to 𝑀.
• (𝜆𝑥. 𝑥𝑥)(𝜆𝑥. 𝑥𝑥) reduces to itself. There is therefore an infinite sequence of reduction steps,
meaning this term has no normal form. (This term has the special name Ω.)
Redexes
When computing in 𝜆-calculus, we are allowed to reduce any redex (reducible expression) within a
term.
A redex in a 𝜆-term is a subterm of the form:
(𝜆𝑥. 𝑀)𝑁
In terms of a syntax tree, it is any subtree whose root is an application node whose function part is a 𝜆-abstraction.
Lambda Calculus in Haskell
We can define lambda calculus in Haskell as follows:
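A sketch mirroring the grammar 𝑀 ∷= 𝑥 | 𝜆𝑥. 𝑀 | 𝑀𝑀 (constructor names are assumptions):

```haskell
data Term
  = Var String        -- a variable x
  | Lam String Term   -- an abstraction λx. M
  | App Term Term     -- an application M N
  deriving Show

-- The example term (λx. x)(λy. λz. x):
example :: Term
example = App (Lam "x" (Var "x")) (Lam "y" (Lam "z" (Var "x")))
```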
Free and Bound Variables
Take the following term: 𝜆𝑥. 𝑦𝑥
• Renaming 𝑥 is OK (the term keeps the same meaning). 𝑥 is called a bound variable.
• Renaming 𝑦 is not OK (the term's meaning changes). 𝑦 is called a free variable.
The set of free variables FV(𝑀) of a term 𝑀 is defined as:
• FV(𝑥) = {𝑥}
• FV(𝜆𝑥. 𝑀) = FV(𝑀) ∖ {𝑥}
• FV(𝑀𝑁) = FV(𝑀) ∪ FV(𝑁)
The set of bound variables BV(𝑀) of a term 𝑀 is defined as:
• BV(𝑥) = ∅
• BV(𝜆𝑥. 𝑀) = BV(𝑀) ∪ {𝑥}
• BV(𝑀𝑁) = BV(𝑀) ∪ BV(𝑁)
A variable can have both free occurrences and bound ones. In a term 𝜆𝑥. 𝑀, the 𝜆 binds all
occurrences of 𝑥 which are free in 𝑀.
The 𝑥 in 𝜆𝑥 is called the binding occurrence, and the occurrences of 𝑥 which are free in 𝑀 are the
bound occurrences of 𝑥 for that 𝜆.
𝛼-conversion
𝜶-conversion is a method of renaming variables. We will define an equivalence relation =𝛼 on
lambda-terms, called alpha-equivalence, which says that two terms differ only in the names of their
bound variables.
The renaming operation is defined as follows:
𝑀[𝑦⁄𝑥 ] is “𝑀 with 𝑥 renamed to 𝑦”
which works as follows:
• 𝑥[𝑦⁄𝑥 ] = 𝑦
• 𝑧[𝑦⁄𝑥 ] = 𝑧 if 𝑧 ≠ 𝑥
• (𝜆𝑥. 𝑀)[𝑦⁄𝑥 ] = 𝜆𝑥. 𝑀
• (𝜆𝑧. 𝑀)[𝑦⁄𝑥 ] = 𝜆𝑧. (𝑀[𝑦⁄𝑥 ]) if 𝑧 ≠ 𝑥
• (𝑀𝑁)[𝑦⁄𝑥 ] = (𝑀[𝑦⁄𝑥 ])(𝑁[𝑦⁄𝑥 ])
If 𝑥 is not free in 𝑀, then 𝑀[𝑁⁄𝑥 ] =𝛼 𝑀
𝛽-reduction
The basic reduction step, called a 𝜷-reduction, takes a term (𝝀𝒙. 𝑴)𝑵 and reduces it to 𝑀[𝑁⁄𝑥 ]
However, we need to avoid variable capture: e.g., naively reducing the expression
(𝜆𝑥. 𝜆𝑦. 𝑥)(𝜆𝑧. 𝑦𝑧) gives 𝜆𝑦. (𝜆𝑧. 𝑦𝑧), in which the variable 𝑦 has gone from being free to
being bound. This is called variable capture, and we avoid it through 𝜶-conversion.
Capture-avoiding substitution is defined as follows:
• 𝑥[𝑁⁄𝑥 ] = 𝑁
• 𝑦[𝑁⁄𝑥 ] = 𝑦 if 𝑦 ≠ 𝑥
• (𝜆𝑥. 𝑀)[𝑁⁄𝑥 ] = 𝜆𝑥. 𝑀
• (𝜆𝑦. 𝑀)[𝑁⁄𝑥 ] = 𝜆𝑧. 𝑀[𝑧⁄𝑦][𝑁⁄𝑥 ] where 𝑧 is fresh (not used in 𝑀 or 𝑁, and 𝑧 ≠ 𝑥, 𝑦)
• (𝑀1 𝑀2 )[𝑁⁄𝑥 ] = (𝑀1 [𝑁⁄𝑥 ])(𝑀2 [𝑁⁄𝑥 ])
A term in the form (𝜆𝑥. 𝑀)𝑁 is called a 𝜷-redex (reducible expression), and it reduces to 𝑀[𝑁⁄𝑥 ] as
we expect.
We write:
𝑀 →𝛽 𝑁
if 𝑁 results from 𝛽-reducing any subterm of 𝑀, and:
𝑀 →𝛽∗ 𝑁
if 𝑁 results from a (possibly zero-length) sequence of →𝛽 steps starting from 𝑀
Church-Turing Thesis
Church’s Thesis: a function of positive integers is computable if and only if it is lambda-definable.
Turing’s Thesis: something is computable if and only if a Turing Machine can compute it.
Turing then showed that something is lambda-definable if and only if a Turing Machine can compute
it.
Church Booleans
To encode Booleans, Church came up with a clever idea: start with a term-shape that can be offered
two options, and then write down two distinct terms:
true ≜ 𝜆𝑥. 𝜆𝑦. 𝑥
false ≜ 𝜆𝑥. 𝜆𝑦. 𝑦
To use Booleans, we need a conditional operator (like if-then-else). The Church Booleans do the
choosing for us:
true 𝑀 𝑁 →∗β 𝑀
false 𝑀 𝑁 →∗β 𝑁
We can therefore define a term ifthen as follows:
ifthen ≜ 𝜆𝑏. 𝜆𝑥. 𝜆𝑦. 𝑏𝑥𝑦
Now we have ifthen, we can define operations on Booleans as follows:
Conjunction: 𝜆𝑏1 . 𝜆𝑏2 . ifthen 𝑏1 𝑏2 false
Disjunction: 𝜆𝑏1 . 𝜆𝑏2 . ifthen 𝑏1 true 𝑏2
Numbers
There are lots of ways to encode natural numbers in lambda calculus, however the standard one is
Church’s encoding, which represents the value 𝑛 by:
𝑛 ≜ 𝜆𝑓. 𝜆𝑥. 𝑓(𝑓(… (𝑓𝑥) … ))   (𝑛 applications of 𝑓)
These are called the Church numerals.
Adding
We can define successor and addition operations as follows:
succ ≜ 𝜆𝑛. 𝜆𝑓. 𝜆𝑥. 𝑓(𝑛 𝑓 𝑥)
add ≜ 𝜆𝑚. 𝜆𝑛. 𝜆𝑓. 𝜆𝑥. 𝑚 𝑓(𝑛 𝑓 𝑥)
With these definitions, the expected computations can be made:
succ 𝑛 →𝛽∗ (𝑛 + 1)
add 𝑚 𝑛 →𝛽∗ (𝑚 + 𝑛)
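As an illustration, the Church numerals and these operations can be transcribed into Haskell (a sketch using a monomorphic numeral type to avoid rank-2 types; all names are assumptions):

```haskell
-- A Church numeral applies a function f some number of times to x.
type Church = (Int -> Int) -> Int -> Int

churchZero :: Church
churchZero _ x = x

churchSucc :: Church -> Church
churchSucc n f x = f (n f x)          -- one extra application of f

churchAdd :: Church -> Church -> Church
churchAdd m n f x = m f (n f x)       -- m applications on top of n

churchMul :: Church -> Church -> Church
churchMul m n f = m (n f)             -- m blocks of n applications

-- Convert an Int to a Church numeral and back.
church :: Int -> Church
church 0 = churchZero
church k = churchSucc (church (k - 1))

toInt :: Church -> Int
toInt n = n (+ 1) 0
```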
Multiplication and Exponentiation
mul ≜ 𝜆𝑚. 𝜆𝑛. 𝜆𝑓. 𝜆𝑥. 𝑚 (𝑛 𝑓) 𝑥
exp ≜ 𝜆𝑚. 𝜆𝑛. 𝜆𝑓. 𝜆𝑥. 𝑛 𝑚 𝑓 𝑥
With these definitions, we get:
mul 𝑚 𝑛 →𝛽∗ (𝑚 × 𝑛)
exp 𝑚 𝑛 →𝛽∗ 𝑚𝑛
Test For Zero
A common operation is to check whether a number is zero or not. With Church encoding, this is
done as follows:
• Applying 𝜆𝑥.false to any argument any non-zero number of times gives false.
• Applying it zero times to true gives true.
Therefore, we can test for zero with the following term:
𝜆𝑛. 𝑛 (𝜆𝑥.false) true
Predecessor
The last operation we need on the Church numerals is predecessor (i.e., subtracting one). The
encoding for this is as follows:
pred ≜ 𝜆𝑛. 𝜆𝑓. 𝜆𝑥. 𝑛(𝜆𝑔. 𝜆ℎ. ℎ(𝑔𝑓)) (𝜆𝑢. 𝑥) (𝜆𝑢. 𝑢)
This definition works as follows:
pred (𝑛 + 1) →∗𝛽 𝑛
pred 0 →𝛽∗ 0
Fixed Points
Consider a function 𝑓 on the integers. A fixed point of 𝑓 is an integer 𝑥 such that:
𝑓(𝑥) = 𝑥
In 𝜆-calculus, a fixed point for a 𝜆-term 𝐹 is a 𝜆-term 𝑀 such that:
𝑀 =𝛽 𝐹 𝑀
(beta-equivalence [=𝛽 ] means that two terms only differ in beta-reduction steps)
Curry’s Fixed Point Combinator
Curry’s Y-combinator is a very clever 𝜆-term that exploits self-application to deliver fixed points. It is
defined as follows:
𝑌 ≜ 𝜆𝑓. (𝜆𝑥. 𝑓(𝑥𝑥))(𝜆𝑥. 𝑓(𝑥𝑥))
For any term 𝐹, we have:
𝑌𝐹 →𝛽 (𝜆𝑥. 𝐹(𝑥𝑥))(𝜆𝑥. 𝐹(𝑥𝑥)) →𝛽 𝐹 ((𝜆𝑥. 𝐹(𝑥𝑥))(𝜆𝑥. 𝐹(𝑥𝑥)))
But also:
𝐹(𝑌𝐹) →𝛽 𝐹 ((𝜆𝑥. 𝐹(𝑥𝑥))(𝜆𝑥. 𝐹(𝑥𝑥)))
Every term in the 𝜆-calculus has a fixed point. This is because for any term, 𝐹(𝑌𝐹) =𝛽 𝑌𝐹, therefore
𝑌𝐹 is a fixed point of 𝐹.
In general, the Y combinator gives you the least fixed point of a term (i.e., the least defined, which
will loop infinitely whenever it can get away with it).
Recursion in Lambda Calculus
Let’s take an example of a recursive program in 𝜆-calculus, in this case the factorial program:
fact 𝑛 = ifthen (iszero 𝑛) 1 (mul 𝑛 (fact(pred 𝑛)))
This can also be written as follows:
fact = (𝜆𝐹. 𝜆𝑛. ifthen (iszero 𝑛) 1 (mul 𝑛 (F(pred 𝑛)))) fact
A function that satisfies the required equation will be a fixed point of the function, so we can use the
Y combinator to find this:
fact ≜ 𝑌 (𝜆𝐹. 𝜆𝑛. ifthen (iszero 𝑛) 1 (mul 𝑛 (F(pred 𝑛))))
There is a general recipe for recursion in 𝜆-calculus:
1) Write down the recursive definition as an equation:
myfunction = 𝜆𝑥. 𝑀
where 𝑀 is a term that “recursively” calls myfunction.
2) 𝜆-abstract the function you are trying to define:
myfunction = (𝜆𝐹. 𝜆𝑥. 𝑀′ ) myfunction
where 𝑀′ is 𝑀[𝐹 ⁄myfunction].
3) Define myfunction as a fixed point:
myfunction ≜ Y(𝜆𝐹. 𝜆𝑥. 𝑀′ )
Simple Types
Nowadays, types are common in most programming languages.
Types in 𝜆-calculus are similar to those in Haskell: because every 𝜆-term is a function, its type
tells us what kind of input it takes and what kind of output it returns.
In its simplest form, the grammar of types is given by:
𝜏 ∷= 𝑜 | 𝜏 → 𝜏
Where 𝑜 is a fixed type called the base type (in Haskell e.g., Int, Bool, Char, etc.)
In simply-typed lambda-calculus, we label abstracted variables with their type:
𝑀 ∷= 𝑥 | 𝜆𝑥^𝜏. 𝑀 | 𝑀𝑀
Some rules for typing are:
• A term, with a given context, has at most one type.
• Substitution preserves typing.
• 𝛽-reduction preserves typing (called the subject reduction property)
Typing Judgement
We write 𝑴: 𝝉 to say that term 𝑀 has type 𝜏.
A typing context Γ is a finite function from variables to types, which we write as a list:
Γ = 𝑥1 : 𝜏1 , 𝑥2 : 𝜏2 , … , 𝑥𝑛 : 𝜏𝑛
A typing judgement Γ ⊢ 𝑀: 𝜏 says that 𝑀 has type 𝜏 in context Γ. If every variable 𝑥𝑖 has type 𝜏𝑖
then 𝑀 has type 𝜏. A typing judgement is a proposition and can be true or false.
The true typing judgements are generated inductively by the following rules (these should be read
from the bottom up):

(var)  ─────────────────
       Γ, 𝑥: 𝜏 ⊢ 𝑥: 𝜏

(abs)  Γ, 𝑥: 𝜏 ⊢ 𝑀: 𝜎
       ───────────────────────
       Γ ⊢ 𝜆𝑥^𝜏. 𝑀: 𝜏 → 𝜎

(app)  Γ ⊢ 𝑀: 𝜏 → 𝜎    Γ ⊢ 𝑁: 𝜏
       ─────────────────────────────
       Γ ⊢ 𝑀𝑁: 𝜎
A term 𝑀 is well-typed if there is a true judgement Γ ⊢ 𝑀: 𝜏. If 𝑀 is well-typed, so are its subterms.
An example typing derivation is as follows:
Typed Church Numerals
Remember the Church Numerals:
These can be given types: if 𝑥 has some type 𝜏, then 𝑓 must have type 𝜏 → 𝜏. We then have:
⊢ 𝜆𝑓 𝜏→𝜏 . 𝜆𝑥 𝜏 . 𝑓 (𝑓(… (𝑓𝑥))) ∶ (𝜏 → 𝜏) → 𝜏 → 𝜏
Typed Booleans
Remember the encodings of Booleans:
true ≜ 𝜆𝑥. 𝜆𝑦. 𝑥
false ≜ 𝜆𝑥. 𝜆𝑦. 𝑦
ifthen ≜ 𝜆𝑏. 𝜆𝑥. 𝜆𝑦. 𝑏 𝑥 𝑦
Assuming both true and false have the same type, we can write ifthen as:
⊢ 𝜆𝑏 𝜏→𝜏→𝜏 . 𝜆𝑥 𝜏 . 𝜆𝑦 𝜏 . 𝑏𝑥𝑦 ∶ (𝜏 → 𝜏 → 𝜏) → 𝜏 → 𝜏 → 𝜏
Equivalence
An equivalence (relation), or equational theory, is a relation that is reflexive, symmetric, and
transitive. We can think of these being “the same, modulo something” e.g.,
• 𝜶-equivalence 𝑀 =𝛼 𝑁: terms equivalent “modulo” variable names.
• 𝜷-equivalence 𝑀 =𝛽 𝑁: terms equivalent “modulo” computation.
Given a relation →𝑅 , we can build its:
• Transitive closure →⁺𝑅:
  𝑥 →⁺𝑅 𝑦 ≜ 𝑥 →𝑅 … →𝑅 𝑦 (in 𝑛 ≥ 1 steps)
• Reflexive-transitive closure →∗𝑅:
  𝑥 →∗𝑅 𝑦 ≜ 𝑥 →𝑅 … →𝑅 𝑦 (in 𝑛 ≥ 0 steps)
• Reflexive-transitive-symmetric closure =𝑅 (in this case, =𝑅 is an equivalence relation):
  𝑥 =𝑅 𝑦 ≜ 𝑥 →∗𝑅 ←∗𝑅 … →∗𝑅 ←∗𝑅 𝑦 (a zig-zag of →∗𝑅 steps taken in either direction)
Therefore, this means that:
• 𝜶-equivalence is the reflexive-transitive-symmetric closure of →𝜶
• 𝜷-equivalence is the reflexive-transitive-symmetric closure of →𝜷
Normal Form
A term is a normal form if it does not contain a redex. A term 𝑁 is a normal form of 𝑴 if 𝑁 is a
normal form and 𝑀 →∗β 𝑁.
Some terms, such as Ω ≡ 𝜔𝜔 ≡ (𝜆𝑥. 𝑥𝑥)(𝜆𝑥. 𝑥𝑥) which reduces to itself, have no normal form.
Church-Rosser Theorem
The Church-Rosser theorem states:
𝑴 =𝛃 𝑵 if and only if there is a term 𝑷 such that 𝑴 →∗𝛃 𝑷 and 𝑵 →∗𝛃 𝑷
This tells us that:
• If 𝑀 =β 𝑁 and 𝑁 is normal form, then 𝑀 →∗β 𝑁.
• If 𝑀 =β 𝑁 and both 𝑀 and 𝑁 are normal forms, then 𝑀 ≡ 𝑁 (they are the same term).
Confluence
Confluence, a property of Lambda terms, says that whenever 𝑀 →∗β 𝑁1 and 𝑀 →∗β 𝑁2 , there is some
𝑃 such that 𝑁1 →∗β 𝑃 and 𝑁2 →∗β 𝑃.
Weak (or Local) Confluence says that whenever 𝑀 →𝛽 𝑁1 and 𝑀 →𝛽 𝑁2 , there is some 𝑃 such that
𝑁1 →∗β 𝑃 and 𝑁2 →∗β 𝑃.
Proving local confluence does not prove confluence!
The Church-Rosser theorem clearly implies confluence (in fact, confluence is equivalent to Church-
Rosser)
Type Variables
In typed lambda-calculus, the following two functions are distinct:
𝜆𝑥^Int. 𝑥 : Int → Int and 𝜆𝑥^Bool. 𝑥 : Bool → Bool
This means that for every type at which we want to use such a function, we would need a separate
definition. To solve this, we can use type variables, or polymorphism e.g., in Haskell the identity
function has the single polymorphic type id :: a -> a
A basic type system with type variables is the following:
𝜌, 𝜎, 𝜏 ∷= 𝛼 ∈ 𝑉 | 𝑡 ∈ 𝑇 | 𝜏 → 𝜏
where:
• 𝑇 is a set of base types (𝑇 = ∅ is possible).
• 𝑉 is a set of type variables {𝛼, 𝛽, 𝛾, … }.
This is similar to simple types; however, types may now contain variables as well as constants.
In type 𝜏, we can substitute a type 𝜎 for a type variable 𝛼:
𝜏[𝜎⁄𝛼 ]
Type Inference
Given a lambda-term, we can work out a type for it. This is called type inference. An informal
algorithm for type inference is as follows:
• Build the typing derivation without filling in the types.
• Fill in the types as follows:
o Give your term a variable as type, ⊢ 𝑡: 𝛼.
o If you have a type variable 𝛼, but you need an arrow type, replace 𝛼 with 𝛽 → 𝛾
(where 𝛽 and 𝛾 are new variables).
o If you have a type variable 𝛼, but you need a type 𝜏 that doesn’t contain 𝜶, replace
𝛼 with 𝜏.
o If you have a type 𝜎1 → 𝜎2 but you need 𝜏1 → 𝜏2 , then you have 𝜎𝑖 but need 𝜏𝑖 for
𝑖 = 1,2.
More formally, this is the Hindley-Milner algorithm, described as follows:
• Input: a context Γ and a lambda-term 𝑁
• Output: a type 𝜏 and a series of substitutions 𝑆
Γ ⊢ 𝑁 ⟹ (𝜏, 𝑆)
1) Γ, 𝑥: 𝜏 ⊢ 𝑥 ⟹ (𝜏, 𝜖)
2) Γ ⊢ 𝜆𝑥. 𝑀 ⟹ (𝜎 → 𝜏, 𝑆) where:
a. 𝛼 is a new type variable.
b. Γ, 𝑥: 𝛼 ⊢ 𝑀 ⟹ (𝜏, 𝑆)
c. 𝜎 = 𝛼[𝑆]
3) Γ ⊢ 𝑀𝑁 ⟹ (𝛼[𝑈], 𝑆𝑇𝑈) where:
a. Γ ⊢ 𝑀 ⟹ (𝜎, 𝑆)
b. Γ[𝑆] ⊢ 𝑁 ⟹ (𝜏, 𝑇)
c. 𝛼 is a new type variable.
d. 𝑈 is the MGU (detailed below) of 𝜎[𝑇] and (𝜏 → 𝛼) (FAIL if no such 𝑈 exists).
Unifiers
For types 𝜏 and 𝜎,
𝜏 ≤ 𝜎 if 𝜎 = 𝜏[𝜌1 ⁄𝛼1 ] … [𝜌𝑛 ⁄𝛼𝑛 ] for certain 𝜌𝑖 and 𝛼𝑖 (for 𝑖 ≤ 𝑛)
We say that 𝜏 is more general than 𝜎, and 𝜎 is a specialisation of 𝜏.
We have a pre-order (types, ≤) on types:
• (≤) is reflexive and transitive.
• If 𝜎 ≤ 𝜏 and 𝜏 ≤ 𝜎 then 𝜎 is a renaming of 𝜏.
If 𝑀 is typed by 𝜏, it can also be typed by any specialisation of 𝜏 (if Γ ⊢ 𝑀: 𝜏, then also
Γ[𝜎⁄𝛼 ] ⊢ 𝑀: 𝜏[𝜎⁄𝛼 ]).
A typeable term has a most general type, unique up to renaming (if 𝑀: 𝜏 then there is a type 𝜎
with 𝑀: 𝜎 such that 𝜎 ≤ 𝜌 for any 𝜌 with 𝑀: 𝜌).
The most general unifier (MGU) of two types 𝜎 and 𝜏 (if it exists) is a series of substitutions 𝑆 such
that:
𝜎[𝑆] = 𝜏[𝑆]
For a series of substitutions 𝑆 = [𝜎1 ⁄𝛼1 ] … [𝜎𝑛 ⁄𝛼𝑛 ] we will use the notation:
• 𝜏[𝑆] = 𝜏[𝜎1 ⁄𝛼1 ] … [𝜎𝑛 ⁄𝛼𝑛 ] to apply 𝑆 to 𝜏.
• 𝜖 for the empty sequence, so that 𝜏[𝜖] = 𝜏 (this is the identity substitution).
• 𝑆𝑇 for appending 𝑆 to 𝑇, so that 𝜏[𝑆𝑇] = 𝜏[𝑆][𝑇] (this is the composition of 𝑆 and 𝑇).
Unification is an algorithm to find the MGU of two types, operating on:
• A solution in progress 𝑆, initially 𝜖.
• A series of equations 𝐸 = {𝜎1 = 𝜏1 , … , 𝜎𝑛 = 𝜏𝑛 }, initially {𝜎 = 𝜏}.
The algorithm proceeds as follows:
1) On input (𝑆, ∅), return 𝑆.
2) On input (𝑆, {𝑒} ∪ 𝐸), if the equation 𝑒 is of the form:
a. 𝛼 = 𝛼: continue with (𝑆, 𝐸)
b. 𝛼 = 𝜏 or 𝜏 = 𝛼: if 𝛼 occurs in 𝜏, FAIL; otherwise, continue with (𝑆[𝜏⁄𝛼 ], 𝐸[𝜏⁄𝛼 ]).
c. 𝜎1 → 𝜎2 = 𝜏1 → 𝜏2 : continue with (𝑆, 𝐸 ∪ {𝜎1 = 𝜏1 , 𝜎2 = 𝜏2 })
Evaluation Strategies
Thanks to the Church-Rosser theorem, we know that if a term 𝑀 has a normal form, we can always
reach it regardless of the path; however, some paths waste work through duplication.
Detecting unneeded work in advance is undecidable, but we can mitigate its impact by choosing an
evaluation strategy and sticking with it, regardless of consequences. In 𝜆-calculus, there are two
important evaluation strategies:
• Normal-order reduction (or leftmost outermost reduction/standard reduction/call-by-
name): This method is less efficient (can produce duplicates), but will always produce an
answer if there is one (If a term 𝑀 has a normal form 𝑁, then reducing 𝑀 with the normal-
order strategy will eventually reach 𝑁). It involves reducing the redex that is leftmost
outermost.
• Applicative-order reduction (or leftmost innermost reduction/call-by-value): This method is
efficient (doesn't duplicate work), however it may not find a normal form, even if one exists. It
involves reducing the redex that is leftmost innermost.
Haskell uses a different method called lazy evaluation (or call-by-need), which is efficient (will only
evaluate when needed), is correct (will always find a normal form if it exists), however
implementation is far more complicated and creates overhead. It is a version of call-by-name that
avoids duplication by using graph reduction.
Graph Reduction
In graph reduction, we represent the term being reduced as a graph, so that shared subterms
appear only once. For example, in (𝜆𝑥. 𝑥)(𝜆𝑥. 𝑥) the two copies of 𝜆𝑥. 𝑥 can share one node.
Another example: take 3𝑁 = (𝜆𝑓. 𝜆𝑥. 𝑓 (𝑓(𝑓(𝑥)))) 𝑁 where 𝑁 →𝛽 𝑁′. Normally, 𝛽-reduction
copies 𝑁 three times, and each copy must be reduced to 𝑁′ separately. With graph reduction, the
three occurrences of 𝑁 point to a single shared node, which is reduced to 𝑁′ only once.
Normalisation
A term 𝑀 is called:
• Weakly normalising (WN) if it has a normal form.
• Strongly normalising (SN) if there is no infinite sequence of reductions
𝑀 →𝛽 𝑀1 →𝛽 𝑀2 →𝛽 … (whatever sequence of reductions is chosen, the normal form will
always be reached).
In typed-lambda calculus, self-application is no longer possible (due to type constraints). Therefore:
• Weak normalisation theorem: If Γ ⊢ 𝑀: 𝜏 is a well-typed term, then 𝑀 has a normal form.
• Strong normalisation theorem: If Γ ⊢ 𝑀: 𝜏 is a well-typed term, then 𝑀 is strongly
normalising.