Elementary Algorithms
Xinyu LIU
Version: 0.6180339887498949
Email: liuxinyu95@[Link]
Contents
1 List 21
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2.1 Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3 Basic operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Last . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3.3 Reverse index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3.4 Mutate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Append . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Set value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.5 sum and product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Recursive sum and product . . . . . . . . . . . . . . . . . . . . . . 30
Tail call recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.6 maximum and minimum . . . . . . . . . . . . . . . . . . . . . . . . 33
1.4 Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.4.1 map and for-each . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
For each . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4.2 reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.5 Sub-list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.5.1 take, drop, and split-at . . . . . . . . . . . . . . . . . . . . . . . . 40
conditional take and drop . . . . . . . . . . . . . . . . . . . . . . . 41
1.5.2 break and group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
break and span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.6 Fold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3 Insertion sort 69
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3 Binary search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4 List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5 Binary search tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4 Red-black tree 75
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.1.2 Tree rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.5 Imperative red-black tree algorithm ⋆ . . . . . . . . . . . . . . . . . . . . 86
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.7 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 88
5 AVL tree 91
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Imperative AVL tree algorithm ⋆ . . . . . . . . . . . . . . . . . . . . . . 96
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.6 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 98
7 B-Trees 127
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2.1 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Split before insertion . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Insert then fixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3.1 Merge before delete method . . . . . . . . . . . . . . . . . . . . . . 135
7.3.2 Delete and fix method . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.4 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9 From grape to the world cup, the evolution of selection sort 181
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.2 Finding the minimum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.1 Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.2 Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
9.2.3 performance of the basic selection sorting . . . . . . . . . . . . . . 186
9.3 Minor Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.3.1 Parameterize the comparator . . . . . . . . . . . . . . . . . . . . . 186
9.3.2 Trivial fine tune . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3.3 Cock-tail sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.4 Major improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.4.1 Tournament knock out . . . . . . . . . . . . . . . . . . . . . . . . . 192
Refine the tournament knock out . . . . . . . . . . . . . . . . . . . 196
9.4.2 Final improvement by using heap sort . . . . . . . . . . . . . . . . 199
9.5 Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
14 Searching 367
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
14.2 Sequence search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
14.2.1 Divide and conquer search . . . . . . . . . . . . . . . . . . . . . . . 367
k-selection problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
binary search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
2 dimensions search . . . . . . . . . . . . . . . . . . . . . . . . . . 374
Brute-force 2D search . . . . . . . . . . . . . . . . . . . . . 375
Appendices
8. TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
9. TERMINATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
10. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . . . . . . . . 502
11. RELICENSING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
ADDENDUM: How to use this License for your documents . . . . . . . . . . . 503
How can we find the smallest free number, which is 10, from the list? It seems quite
easy to figure out the solution.
1: function Min-Free(A)
2: x←0
3: loop
4: if x ∉ A then
5: return x
6: else
7: x←x+1
Where the ∉ test is realized as below.
1: function '∉'(x, X)
2: for i ← 1 to |X| do
3: if x = X[i] then
4: return False
5: return True
Some environments have built-in implementation to test if an element is in a list.
Below is an example program.
def minfree(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1
However, when there are millions of numbers in use, this solution performs poorly: the time is quadratic in the length of the list. On a computer with a 2-core 2.10 GHz CPU and 2GB RAM, the C implementation takes 5.4s to find the minimum free number among 100,000 numbers, and more than 8 minutes to handle a million numbers.
0.1.1 Improvement
The key idea to improve the solution is based on the fact that, for n numbers x1, x2, ..., xn, if there exists a free number, some xi must be outside the range [0, n); otherwise the list is exactly some permutation of 0, 1, ..., n − 1, hence n should be returned as the minimum free number. In summary:

minfree(x1, x2, ..., xn) ≤ n
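One way to exploit this bound is to mark, with n + 1 flags, which numbers in [0, n] appear in the list, then scan for the first unmarked one. The concrete program was elided from this extract; below is a minimal Haskell sketch of that flag-array idea (the name minFreeFlags is illustrative).

import Data.Array

-- Mark every list element that falls in [0, n], then return the first index
-- that is not marked. Duplicates are absorbed by (||).
minFreeFlags :: [Int] -> Int
minFreeFlags xs = length (takeWhile id flags)
  where
    n     = length xs
    flags = elems (accumArray (||) False (0, n)
                     [ (x, True) | x <- xs, 0 <= x, x <= n ])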
Such a program can handle 1 million numbers in 0.023s on the same computer.
0.1.2 Divide and conquer

Alternatively, we can use divide and conquer: split the range [l, u] at the middle m. If the left part is full (it contains exactly m − l + 1 numbers), the minimum free number must be in the right part; otherwise it is in the left part. The answer is minfree(A) = search(A, 0, |A| − 1), where:

search(∅, l, u) = l
search(A, l, u) = { |A′| = m − l + 1 : search(A′′, m + 1, u)
                  { otherwise : search(A′, l, m)

where
  m = ⌊(l + u)/2⌋
  A′ = [x | x ∈ A, x ≤ m]
  A′′ = [x | x ∈ A, x > m]
This algorithm doesn't need additional space1. Each recursive call performs O(|A|) comparisons to build A′ and A′′, after which the problem size halves. Therefore the time is bound to T(n) = T(n/2) + O(n), which reduces to O(n) according to the master theorem. Alternatively, observe that the first call takes O(n) to build A′ and A′′, the second call takes O(n/2), the third O(n/4), and so on. The total time is O(n + n/2 + n/4 + ...) = O(2n) = O(n). We use [a | a ∈ A, p(a)] for a list; it is different from {a | a ∈ A, p(a)}, which is a set.
The below example Haskell program implements this algorithm.

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m + 1) u
               | otherwise = bsearch as l m
  where
    m = (l + u) `div` 2
    (as, bs) = partition (≤ m) xs
1 The recursion takes O(lg n) stack spaces, but it can be eliminated through tail recursion optimization
Figure 1: Divide the array: all A[i] ≤ m where 0 ≤ i < left, while all A[i] > m where left ≤ i < right. The remaining elements haven't been processed yet.
This solution is fast and needs no extra stack space. However, compared to the previous recursive one, it loses some expressiveness. Depending on individual taste, one may prefer one over the other.
The second puzzle is to find the n-th 'regular number': a number whose only prime factors are 2, 3, and 5. A brute-force solution tests the candidates one by one:
1: function Regular-Number(n)
2: x←1
3: while n > 0 do
4: x←x+1
5: if Valid?(x) then
6: n←n−1
7: return x
8: function Valid?(x)
9: while x mod 2 = 0 do
10: x ← bx/2c
11: while x mod 3 = 0 do
12: x ← bx/3c
13: while x mod 5 = 0 do
14: x ← bx/5c
15: return x = 1
This 'brute-force' algorithm works for small n. However, to find the 1500-th regular number (which is 860934420), its C implementation takes 40.39s on the above computer. When n increases to 15,000, it does not terminate after 10 minutes.
0.2.2 Improvement
Modular and division calculations are very expensive [2], and they are executed many times in the loops. Instead of checking whether a number contains only 2, 3, or 5 as factors, we can construct the regular numbers from these three factors. We start from 1 and multiply it by 2, 3, or 5 to generate the rest. The problem becomes how to generate the regular numbers in order. One method is to use the queue data structure.
A queue allows adding an element at one end (called enqueue) and removing one from the other end (called dequeue). The element enqueued first is dequeued first; this nature is called FIFO (First In First Out). The idea is to add 1 as the first number to the queue. We repeatedly dequeue a number and multiply it by 2, 3, and 5 to generate 3 new numbers, then add them back to the queue in order. A newly generated number may already exist in the queue; in that case, we drop the duplicate. Because a new number may be smaller than others in the queue, we must put it at the correct position. Figure 2 shows this idea.
Figure 2: (a) Initialize the queue with 1; (b) add 2, 3, 5 back; (c) add 4, 6, and 10 back.
Here the symbol x : X means to link x before the list X, such that x becomes the first element; it is called 'cons' in Lisp. We link 1 before the rest, as it is the first regular number. To merge infinite lists, we define ∪ to recursively compare the elements of two sorted lists. Let X = [x1, x2, x3, ...] and Y = [y1, y2, y3, ...] be two such lists, and let X′ = [x2, x3, ...] and Y′ = [y2, y3, ...] contain the remaining elements without the heads x1 and y1. We define the merge as below:
X ∪ Y = { x1 < y1 : x1 : (X′ ∪ Y)
        { x1 = y1 : x1 : (X′ ∪ Y′)
        { y1 < x1 : y1 : (X ∪ Y′)
We need not worry about X or Y being empty, because both are infinite lists. In functional settings that support lazy evaluation, this algorithm can be implemented as the following example program:
ns = 1 : (map (*2) ns) `merge` (map (*3) ns) `merge` (map (*5) ns)
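The merge function itself is not shown above; a Haskell sketch that follows the ∪ definition directly (both arguments are infinite sorted lists, so no empty cases are needed):

merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | x == y    = x : merge xs ys
  | otherwise = y : merge (x:xs) ys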
0.2.3 Queues
Although the improved solution is much faster than the original brute-force one, it generates duplicate numbers, which are eventually dropped. Moreover, to keep the numbers ordered, it needs a linear-time scan and insertion, which degrades the enqueue operation from constant time to O(|Q|). To avoid duplication, we can separate all regular numbers into 3 disjoint buckets: Q2 = {2^i | i > 0}, Q23 = {2^i 3^j | i ≥ 0, j > 0}, and Q235 = {2^i 3^j 5^k | i, j ≥ 0, k > 0}. The constraints j ≠ 0 in Q23 and k ≠ 0 in Q235 ensure there is no overlap. Each bucket is realized as a queue. They are initialized as
Q2 = {2}, Q23 = {3}, and Q235 = {5}. Starting from 1, each time we extract the smallest
number x from the three queues as the next regular number. Then do the following:
• If x comes from Q2 , we enqueue 2x, 3x, and 5x back to Q2 , Q23 , and Q235 respec-
tively;
• If x comes from Q23, we only enqueue 3x to Q23 and 5x to Q235. We should not add 2x to Q2, because Q2 cannot hold any number divisible by 3.
• If x comes from Q235, we only need to enqueue 5x to Q235. We should not add 2x to Q2 or 3x to Q23, because they can't hold numbers divisible by 5.
We reach the answer after repeatedly extracting the smallest number n times. The following algorithm implements this idea:
1: function Regular-Number(n)
2: x←1
3: Q2 ← {2}, Q23 ← {3}, Q235 ← {5}
4: while n > 0 do
5: x ← min(Head(Q2 ), Head(Q23 ), Head(Q235 ))
6: if x = Head(Q2 ) then
7: Dequeue(Q2 )
8: Enqueue(Q2 , 2x)
Figure 4: The first 4 steps with Q2, Q23, and Q235, initialized with 2, 3, 5; the extracted minimums are 2, 3, 4, 5.
9: Enqueue(Q23 , 3x)
10: Enqueue(Q235 , 5x)
11: else if x = Head(Q23 ) then
12: Dequeue(Q23 )
13: Enqueue(Q23 , 3x)
14: Enqueue(Q235 , 5x)
15: else
16: Dequeue(Q235 )
17: Enqueue(Q235 , 5x)
18: n←n−1
19: return x
This algorithm loops n times. Each iteration extracts the minimum number from the heads of the three queues, which takes constant time, and then adds at most 3 new numbers to the queues, which also takes constant time. Therefore the algorithm is bound to O(n).
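As an illustration, the queue-based algorithm can be sketched in Haskell as below (plain lists stand in for the queues, so enqueue here is not constant time; the name regularNumber is illustrative). It mirrors the pseudocode: starting from x = 1, it extracts the smallest head n times.

regularNumber :: Int -> Integer
regularNumber n = go n 1 [2] [3] [5]
  where
    go 0 x _ _ _ = x
    go k _ q2 q23 q235
      | x == head q2  = go (k - 1) x (tail q2 ++ [2*x]) (q23 ++ [3*x]) (q235 ++ [5*x])
      | x == head q23 = go (k - 1) x q2 (tail q23 ++ [3*x]) (q235 ++ [5*x])
      | otherwise     = go (k - 1) x q2 q23 (tail q235 ++ [5*x])
      where x = minimum [head q2, head q23, head q235]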
0.3 Summary
One might think the brute-force solutions were sufficient to solve both programming puzzles. However, as the problem scales up, we have to seek better solutions. There are many interesting problems which were hard before, but which we can now solve through computer programming. This book aims to provide both functional and imperative definitions for the commonly used elementary algorithms and data structures. We reference many results from Okasaki's work[3] and classic textbooks (for example [4]). We try to avoid relying on a specific programming language, because it may or may not be familiar to the reader, and programming languages keep changing. Instead, we use pseudo-code or mathematical notation to make the algorithm definitions generic. When giving code examples, the functional ones look more like Haskell, and the imperative ones look like a mix of C, Java, and Python. They are only for illustration purposes, and are not guaranteed to strictly follow any language specification.
Exercise 1
1. For the free number puzzle, since all numbers are non-negative, we can leverage the sign as a flag to indicate that a number exists. We scan the number list; for every number |x| < n (where n is the length), negate the number at position |x|. Then we run another round of scanning to find the first positive number; its position is the answer. Write a program to realize this method.
2. There are n numbers 1, 2, ..., n. After some processing, they are shuffled, and a
number x is altered to y. Suppose 1 ≤ y ≤ n, design a solution to find x and y in
linear time with constant space.
3. Below example program is a solution for the regular number puzzle. Is it equivalent
to the queue based solution?
Int regularNum(Int m) {
    nums = Int[m + 1]
    n = 0, i = 0, j = 0, k = 0
    nums[0] = 1
    x2 = 2 * nums[i]
    x3 = 3 * nums[j]
    x5 = 5 * nums[k]
    while (n < m) {
        n = n + 1
        nums[n] = min(x2, x3, x5)
        if (x2 == nums[n]) {
            i = i + 1
            x2 = 2 * nums[i]
        }
        if (x3 == nums[n]) {
            j = j + 1
            x3 = 3 * nums[j]
        }
        if (x5 == nums[n]) {
            k = k + 1
            x5 = 5 * nums[k]
        }
    }
    return nums[m];
}
Chapter 1
List
1.1 Introduction
List and array are the elementary building blocks for creating complex data structures. Both can hold multiple elements as a container. An array is trivially implemented as a range of consecutive cells indexed by a number, called the address or position. An array is typically bounded: its size needs to be determined before use, while a list grows on demand to hold additional elements. One can traverse a list element by element from head to tail. Particularly in functional settings, list related algorithms play critical roles in controlling the computation and logic structure1. Readers already familiar with the map, filter, and fold algorithms can safely skip this chapter and start directly from chapter 2.
1.2 Definition
A list, also known as a singly linked-list, is a data structure recursively defined as below: a list is either empty, or contains an element and a reference to the rest of the list. Figure 1.1 shows a list of nodes. Each node contains two parts: an element called key, and a reference to the sub-list called next. The sub-list reference in the last node is empty, marked as 'NIL'.
Every node links to the next one or NIL. Linked-list is often defined through a compound structure2, for example:

struct List<A> {
    A key
    List<A> next
}
1 At a low level, lambda calculus plays the most critical role as one of the computation models equivalent to the Turing machine.
The empty list needs more clarification. Many traditional environments support the null concept. There are two different ways to represent the empty list: one is to use null (or NIL) directly; the other is to construct a list that contains nothing, written []. From the implementation perspective, null need not allocate any memory, while [] does. In this book, we use ∅ to represent the generic empty list, set, or container.
1.2.1 Access
Given a non-empty list L, we need to define two functions to access its first element and the rest sub-list. They are often called first(L), rest(L) or head(L), tail(L)3. On the other hand, we can construct a list from an element x and another list xs (which can be empty), denoted as x : xs. This is also called the cons operation. The following equations hold:
head(x : xs) = x
tail(x : xs) = xs    (1.1)
For a none empty list X, we will also use x1 for the first element, and use X ′ for the
rest sub-list. For example, when X = [x1 , x2 , x3 , ...], then X ′ = [x2 , x3 , ...].
Exercise 1.2
1. For list of type A, suppose we can test if any two elements x, y ∈ A are equal,
define an algorithm to test if two lists are identical.
1.3.1 index
Different from an array, which supports random access to the element at position i in constant time, we need to traverse the list i steps to access the target element.
getAt(i, x : xs) = { i = 0 : x
                    { i ≠ 0 : getAt(i − 1, xs)    (1.3)
3 They are named as car and cdr in Lisp due to the design of machine registers[63].
We intentionally leave the empty list unhandled; the behavior when passing ∅ is undefined. As such, the out of bound case also leads to undefined behavior: if i > |L| exceeds the length, we end up at the edge case of accessing the (i − |L|)-th element of the empty list. On the other hand, if i < 0, decreasing it by one moves it even farther away from 0, and we finally end up in the same situation: a negative index on an empty list.
This algorithm is bound to O(i) time as it advances the list i steps. Below is the
corresponding imperative implementation:
1: function Get-At(i, L)
2: while i ≠ 0 do
3: L ← Next(L) ▷ Raise error when L = NIL
4: i←i−1
5: return First(L)
Exercise 1.3
1. In the iterative Get-At(i, L) algorithm, what is the behavior when L is empty?
what is the behavior when i is out of the bound or negative?
1.3.2 Last
There is a pair of operations symmetric to 'first/rest', called 'last/init'. For a non-empty list X = [x1, x2, ..., xn], the function last returns the last element xn, while init returns the sub-list [x1, x2, ..., xn−1]. Although they are the left-to-right mirror of 'first/rest', 'last/init' need linear time, because we must traverse the whole list to the tail.
To access the last element of list X:
• If X contains only one element, as [x1], then x1 is the last one;
• Otherwise, the last element of X is the last element of the rest sub-list.
last([x]) = x
(1.4)
last(x : xs) = last(xs)
Similarly, to extract the sub-list of X containing all elements except the last one:

init([x]) = ∅
init(x : xs) = x : init(xs)

We leave the empty list unhandled for both operations; the behavior when passing ∅ is undefined. Below are the iterative implementations:
1: function Last(L)
2: x ← NIL
3: while L ≠ NIL do
4: x ← First(L)
24 CHAPTER 1. LIST
5: L ← Rest(L)
6: return x
7: function Init(L)
8: L′ ← NIL
9: while Rest(L) ≠ NIL do ▷ Raise error when L is NIL
10: L′ ← Cons(First(L), L′ )
11: L ← Rest(L)
12: return Reverse(L′ )
While advancing towards the tail, this algorithm accumulates the 'init' result through 'cons'. However, such a result is in reversed order, so we need to apply reverse (defined in section 1.4.2) to return the correct result. The exercise asks whether we can use 'append' instead of 'cons'.
There actually exists a better solution. The idea is to keep two pointers p1, p2 with distance i between them, so that rest^i(p2) = p1 holds, where rest^i(p2) means repeatedly applying the rest() function i times; advancing p2 by i steps gives p1. We start by pointing p2 to the list head, and advance both pointers in parallel till p1 arrives at the tail. At that point, p2 exactly points to the i-th element from the right. Figure 1.2 shows this idea. As p1, p2 form a window, this method is also called the 'sliding window' solution.
Figure 1.2: The sliding window: when p1 reaches the tail, p2 points to the i-th element from the right.
1: function Last-At(i, L)
1.3. BASIC OPERATIONS 25
2: p←L
3: while i > 0 do
4: L ← Rest(L) ▷ Raise error if out of bound
5: i←i−1
6: while Rest(L) ≠ NIL do
7: L ← Rest(L)
8: p ← Rest(p)
9: return First(p)
The functional implementation needs special consideration, as we cannot update pointers directly. Instead, we advance two lists X = [x1, x2, ..., xn] and Y = [xi, xi+1, ..., xn] simultaneously, where Y is the sub-list without the first i − 1 elements.
• If Y is a singleton list, i.e. [xn ], then the last i-th element is the head of X;
• Otherwise, we drop the first element from both X and Y , then recursively check X ′
and Y ′ .
Function drop(m, X) discards the first m elements from list X. It can be implemented
by advancing X by m steps:
drop(0, X) = X
drop(m, ∅) = ∅ (1.9)
drop(m, x : xs) = drop(m − 1, xs)
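A Haskell sketch of the sliding-window idea (the name lastAt is illustrative): advance a second list i steps ahead with drop, then move both lists together until the leading one becomes a singleton.

lastAt :: Int -> [a] -> a
lastAt i xs = slide xs (drop i xs)
  where
    slide (x:_)   [_]    = x
    slide (_:xs') (_:ys) = slide xs' ys
    slide _       _      = error "out of bound"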
Exercise 1.4
1. In the Init algorithm, can we use Append(L′ , First(L)) instead of ‘cons’?
2. How to handle empty list or out of bound index error in Last-At algorithm?
1.3.4 Mutate
Mutate operations include append, insert, update, and delete. Some functional environ-
ments actually implement mutate by creating a new list, while the original one is persisted
for later reuse, or released at sometime (chapter 2 in [3]).
Append
Append is the symmetric operation of cons: it adds an element at the tail instead of the head. Because of this, it is also called 'snoc'. For a linked-list, it means we need to traverse to the tail, hence it takes O(n) time, where n is the length. To avoid repeated traversal, we can record the tail reference in a variable, and keep updating it upon changes.
append(∅, x) = [x]
(1.10)
append(y : ys, x) = y : append(ys, x)
• If the list is empty, the result is the singleton list [x];
• Otherwise, we first recursively append x to the rest sub-list, then prepend the original head to form the result.
Exercise 1.5
1. Add a ‘tail’ field in list definition, optimize the append algorithm to constant time.
2. With the additional ‘tail’ field, when need we update the tail variable? How does
it affect the performance?
Set value
Similar to getAt, we need to advance to the target position, then change the element there. To define the function setAt(i, x, L):
• If i is 0, replace the head with x and keep the rest;
• Otherwise, we recursively set the value at position i − 1 in the sub-list L′.
setAt(0, x, y : ys) = x : ys
(1.11)
setAt(i, x, y : ys) = y : setAt(i − 1, x, ys)
Exercise 1.6
1. Handle the empty list and out of bound error for setAt.
insert
There are two different cases of insertion. One is to insert an element at a given position: insert(i, x, L); the algorithm is similar to setAt. The other is to insert an element into a sorted list, keeping the order sorted.
To insert x at position i, we first advance i steps, then construct a new sub-list with x as the head, and concatenate it after the first i elements4.
insert(0, x, L) = x : L
insert(i, x, y : ys) = y : insert(i − 1, x, ys)    (1.12)
When i exceeds the list length, we can treat it as to append x. We leave this as an
exercise. The following is the corresponding iterative implementation:
1: function Insert(i, x, L)
2: if i = 0 then
3: return Cons(x, L)
4: H←L
5: p←L
6: while i > 0 and L ≠ NIL do
7: p←L
8: L ← Rest(L)
9: i←i−1
10: Rest(p) ← Cons(x, L)
11: return H
A list L = [x1, x2, ..., xn] is sorted if xi ≤ xj holds for any positions 1 ≤ i ≤ j ≤ n. Here ≤ is an abstract ordering: it can actually mean ≥ for descending order, a subset relationship, etc. We can design the insert algorithm to maintain the sorted order. To insert an element x into a sorted list L:
insert(x, ∅) = [x]
insert(x, y : ys) = { x ≤ y : x : y : ys
                    { otherwise : y : insert(x, ys)    (1.13)

Since the algorithm needs to compare the elements one by one, it is bound to O(n) time, where n is the length. Below is the corresponding iterative implementation:
1: function Insert(x, L)
2: if L = NIL or x < First(L) then
3: return Cons(x, L)
4: H←L
5: while Rest(L) ≠ NIL and First(Rest(L)) < x do
6: L ← Rest(L)
7: Rest(L) ← Cons(x, Rest(L))
8: return H
4i starts from 0.
sort(∅) = ∅
(1.14)
sort(x : xs) = insert(x, sort(xs))
This is a recursive algorithm: it first sorts the rest sub-list, then inserts the first element into it. We can eliminate the recursion to develop an iterative implementation. The idea is to scan the list and insert the elements one by one:
1: function Sort(L)
2: S ← NIL
3: while L ≠ NIL do
4: S ← Insert(First(L), S)
5: L ← Rest(L)
6: return S
At any time during the loop, the result is sorted. There is a major difference between
the recursive and the iterative implementations. The recursive one processes the list
from right, while the iterative one is from left. We’ll introduce ‘tail-recursion’ in section
1.3.5 to eliminate this difference. Chapter 3 introduces insertion sort in detail, including
performance analysis and optimization.
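As a quick illustration, the two equations translate directly into Haskell (a sketch; chapter 3 discusses insertion sort in detail):

insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) | x <= y    = x : y : ys
                | otherwise = y : insert x ys

sort :: Ord a => [a] -> [a]
sort []     = []
sort (x:xs) = insert x (sort xs)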
Exercise 1.7
1. Handle the out-of-bound case in insertion, and treat it as append.
2. Design the insertion algorithm for array. When insert at position i, all elements
after i need shift to the end by one.
3. Implement the insertion sort only with less than (<) defined.
delete
Symmetric to insert, delete also has two cases. One is to delete the element at a position;
the other is to look up, then delete the element of a given value. The first case is defined
as delAt(i, L), the second case is defined as delete(x, L).
To delete the element at position i, we advance i steps to the target position, then bypass the element and link the rest sub-list:
• If i is 0, drop the head and return the rest;
• Otherwise, recursively delete the (i − 1)-th element from L′, then prepend the original head to the result.
delAt(i, ∅) = ∅
delAt(0, x : xs) = xs (1.15)
delAt(i, x : xs) = x : delAt(i − 1, xs)
This algorithm is bound to O(i) as we need advance i steps to perform deleting. Below
is the iterative implementation:
1: function Del-At(i, L)
Exercise 1.8
1. Design the algorithm to find and delete all occurrences of a given value.
2. Design the delete algorithm for array, all elements after the delete position need
shift to front by one.
concatenate
Append is a special case for concatenation. Append only adds one element, while concate-
nation adds multiple ones. However, the performance would be quadratic if repeatedly
appending as below:
X ++ ∅ = X
X ++ (y : ys) = append(X, y) ++ ys    (1.17)
We can further improve it a bit: when Y is empty, we needn’t traverse, but directly
return X:
∅ ++ Y = Y
X ++ ∅ = X
(x : xs) ++ Y = x : (xs ++ Y)    (1.18)
The modified algorithm only traverses list X and then links its tail to Y, hence it is bound to O(|X|) time. In imperative settings, concatenation can be realized in constant time with an additional tail variable; we leave its implementation as an exercise. Below is the iterative implementation without using the tail variable:
1: function Concat(X, Y )
2: if X = NIL then
3: return Y
4: if Y = NIL then
5: return X
6: H←X
7: while Rest(X) ≠ NIL do
8: X ← Rest(X)
9: Rest(X) ← Y
10: return H
1.3.5 sum and product

sum(∅) = 0
sum(x : xs) = x + sum(xs)    (1.19)
product(∅) = 1
product(x : xs) = x · product(xs)    (1.20)
Both algorithms traverse the list, hence are bound to O(n) time, where n is the length.
The Curried form was introduced by Schönfinkel (1889–1942) in 1924, and then widely used by Haskell Curry from 1958; it is known as Currying[73]. For a function of 2 parameters f(x, y), when we pass one argument x, it becomes a function of y: g(y) = f(x, y), or g = f x. We can further extend this to multiple variables: f(x, y, ..., z) can be Curried into a series of functions f, f x, f x y, ... No matter how many variables there are, we can treat the function as a series of Curried functions, each taking only one parameter: f(x, y, ..., z) = f(x)(y)...(z) = f x y ... z.
The accumulated sum not only calculates the result from left to right, it also needn't keep any context, state, or intermediate result for the recursion. All such state is either passed as an argument (i.e. A) or can be dropped (the previous element in the list). Such recursive calls are often optimized into pure loops in practice. We call this kind of function tail recursive (a 'tail call'), and the optimization that eliminates the recursion is called 'tail recursion optimization'[61], because the recursion happens at the tail position of the function. The performance of a tail call can be greatly improved after optimization, and we can avoid the issue of stack overflow in deep recursion.
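The accumulated sum itself is not shown in this extract; a minimal Haskell sketch of it (the name sum' is illustrative):

sum' :: Num a => a -> [a] -> a
sum' acc []     = acc
sum' acc (x:xs) = sum' (acc + x) xs

-- In Curried form, the sum starts from 0:  sum = sum' 0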
In section 1.3.4 about insertion sort, we mentioned that the recursive algorithm sorts the elements from the right. We can also optimize it to a tail call:
sort′ (A, ∅) = A
(1.23)
sort′ (A, x : xs) = sort′ (insert(x, A), xs)
And sort is defined in Curried form with ∅ as the start value: sort = sort′(∅).
As a typical tail call problem, let's consider how to compute b^n efficiently (refer to problem 1.16 in [63]). A brute-force solution is to repeatedly multiply by b, n times, starting from 1. This algorithm is bound to O(n):
1: function Pow(b, n)
2: x←1
3: loop n times
4: x←x·b
5: return x
Actually, the solution can be greatly improved. When computing b^8, after the first 2 loops we get x = b^2. At this stage, we needn't multiply x by b to get b^3; instead we can directly compute x^2, which gives b^4. Doing this once more gives (b^4)^2 = b^8. Thus we only need to loop 3 times, not 8 times.
Based on this idea, if n = 2^m for some non-negative integer m, we can design the below algorithm to compute b^n:

b^1 = b
b^n = (b^(n/2))^2
We next extend this divide and conquer method to any non-negative integer n:

• If n = 0, define b^0 = 1;
• If n is even, we halve n to compute b^(n/2), then square it;
• Otherwise n is odd, we compute b^(n−1), then multiply by b.

b^0 = 1
b^n = { 2|n : (b^(n/2))^2
      { otherwise : b · b^(n−1)    (1.25)
However, the 2nd clause prevents us from making it tail recursive. Alternatively, we can square the base number and halve the exponent.

b^0 = 1
b^n = { 2|n : (b^2)^(n/2)
      { otherwise : b · b^(n−1)    (1.26)
With this change, we can develop a tail recursive algorithm to compute b^n = pow(b, n, 1).

pow(b, 0, A) = A
pow(b, n, A) = { 2|n : pow(b^2, n/2, A)
              { otherwise : pow(b, n − 1, b · A)    (1.27)
Compared to the brute-force implementation, this one improves to O(lg n) time. Actually, we can improve it further. If we represent n in binary format as n = (am am−1 ... a1 a0)2, then the computation of b^(2^i) is necessary only when ai = 1. This is quite similar to the idea of the Binomial heap (section 10.2). We multiply all such terms for the bits that are 1. For example, to compute b^11, since 11 = (1011)2 = 2^3 + 2 + 1, we have b^11 = b^(2^3) × b^2 × b.
We get the result by these steps:

1. calculate b^1, which is b;
2. square it to get b^2;
3. square it again to get b^4;
4. square it again to get b^8.

Finally, we multiply the results of steps 1, 2, and 4 to get b^11. Summarizing this idea, we improve the algorithm as below.
pow(b, 0, A) = A
pow(b, n, A) = { 2|n : pow(b^2, n/2, A)
              { otherwise : pow(b^2, ⌊n/2⌋, b · A)    (1.28)
This algorithm essentially shifts n to the right by 1 bit each time (divides n by 2). If the LSB (Least Significant Bit, the lowest bit) is 0, n is even; it squares the base and keeps the accumulator A unchanged. If the LSB is 1, n is odd; it multiplies the current base into A and then squares the base. When n is zero, we have exhausted all bits, and A is the final result. At any time, the updated base number b′, the shifted exponent number n′, and the accumulator A satisfy the invariant b^n = A · (b′)^(n′).
Compared to the previous implementation, which subtracts one for odd n, this algorithm halves n every time. It runs exactly m rounds, where m is the number of bits. We leave the imperative implementation as an exercise.
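The tail recursive definition (1.28) translates directly into Haskell; a minimal sketch (the imperative version is left as the exercise above):

pow :: Integer -> Integer -> Integer -> Integer
pow _ 0 a = a
pow b n a | even n    = pow (b * b) (n `div` 2) a
          | otherwise = pow (b * b) (n `div` 2) (b * a)

-- b^n = pow b n 1; the invariant b^n = a * (b')^(n') holds at every step.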
Back to the sum and product. The iterative implementation applies plus and multiply
while traversing:
1: function Sum(L)
2: s←0
3: while L ≠ NIL do
4: s ← s+ First(L)
5: L ← Rest(L)
6: return s
7: function Product(L)
8: p←1
9: while L ≠ NIL do
10: p ← p · First(L)
11: L ← Rest(L)
12: return p
One interesting usage of product is to calculate the factorial of n as n! = product([1..n]).

1.3.6 maximum and minimum

For a non-empty list, we can define the minimum and maximum recursively:

min([x]) = x
min(x : xs) = { x < min(xs) : x
              { otherwise : min(xs)

and

max([x]) = x
max(x : xs) = { x > max(xs) : x
              { otherwise : max(xs)    (1.30)
Both process the list from right to left. We can modify them to be tail recursive; this also brings the 'on-line' feature that, at any time, the accumulator holds the min/max of the elements processed so far. Take min for example:

min′(a, ∅) = a
min′(a, x : xs) = { x < a : min′(x, xs)
                  { otherwise : min′(a, xs)    (1.31)
Different from sum′/prod′, we can't pass a fixed starting value to the tail recursive min′/max′, unless we use ±∞ in the Curried form min = min′(+∞), max = max′(−∞). Alternatively, we can pass the first element as the accumulator, given that min/max only take non-empty lists:

min(x : xs) = min′(x, xs)    max(x : xs) = max′(x, xs)    (1.32)
The optimized tail recursive algorithm can be further changed to purely iterative
implementation. We give the Min example, and skip Max.
1: function Min(L)
2: m ← First(L)
3: L ← Rest(L)
4: while L ≠ NIL do
5: if First(L) < m then
6: m ← First(L)
7: L ← Rest(L)
8: return m
There is a way to realize the tail recursive algorithm without using accumulator explic-
itly. The idea is to re-use the first element as the accumulator. Every time, we compare
the head with the next element; then drop the greater one for min, and drop the less one
for max.
min([x]) = x
min(x1 : x2 : xs) = { x1 < x2 : min(x1 : xs)
                    { otherwise : min(x2 : xs)    (1.33)
Exercise 1.9
1. Change the length to tail call.
2. Change the insertion sort to tail call.
3. Implement the O(lg n) algorithm to calculate bn by represent n in binary.
1.4 Transform
From an algebraic perspective, there are two types of transform: one keeps the list structure and only changes the elements; the other alters the list structure, so the result is not isomorphic to the original list. We call the former map.
toStr(∅) = ∅
(1.34)
toStr(x : xs) = str(x) : toStr(xs)
For the second example, consider a dictionary, which is a list of words grouped by
initial letter. Like:
Next we process a text (Hamlet, for example), and augment each word with its number of occurrences, like:
Now, for every initial letter, we want to figure out which word occurs most often. How do we write a program for this? The output is a list of words, each being the most frequent one in its group, something like [a, but, can, ...]. We need to develop a program that transforms a list of groups of word-number pairs into a list of words.
First, we need to define a function that takes a list of word-number pairs and finds the word paired with the biggest number. Sorting is overkill; what we need is a special max function, maxBy(cmp, L), where cmp compares two elements abstractly.
maxBy(cmp, [x]) = x
maxBy(cmp, x1 : x2 : xs) = { cmp(x1, x2) : maxBy(cmp, x2 : xs)
                           { otherwise : maxBy(cmp, x1 : xs)    (1.35)
Instead of embedded parentheses, as in fst((a, b)) = a, we omit one layer and use a space; generally, we treat f x = f(x) when the context is clear. We then define a special compare function less for word-count pairs, which compares the counts of two pairs, and pass it to maxBy to finalize our definition (in Curried form): max′′ = maxBy(less).
With max′′ () defined, we can develop the solution to process the whole list.
solve(∅) = ∅
solve(x : xs) = fst(max′′(x)) : solve(xs)    (1.39)
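Putting the pieces together in Haskell (a sketch; the pair layout (word, count) and the function names are assumptions, since the sample data was elided):

maxBy :: (a -> a -> Bool) -> [a] -> a
maxBy _ [x] = x
maxBy cmp (x1:x2:xs) | cmp x1 x2 = maxBy cmp (x2:xs)
                     | otherwise = maxBy cmp (x1:xs)

less :: (String, Int) -> (String, Int) -> Bool
less (_, m) (_, n) = m < n

-- For every (non-empty) group of word-count pairs, pick the word with the
-- biggest count.
solve :: [[(String, Int)]] -> [String]
solve = map (fst . maxBy less)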
Map
The solve() and toStr() functions reveal the same structure, although they are developed
for different problems. We can abstract this common structure as map:
map(f, ∅) = ∅
(1.40)
map(f, x : xs) = f (x) : map(f, xs)
map takes the function f as an argument and applies it to every element to form a new list. A function that computes with other functions is called a higher-order function. If the type of f is A → B, meaning it sends an element of A to a result of B, then the type of map is:

map : (A → B) → [A] → [B]

We read it as: map takes a function of A → B, and converts a list [A] to another list [B]. The two examples in the previous section can be defined with map as (in Curried form):
Y = {f (x)|x ∈ X} (1.42)
For the iterative Map implementation, below algorithm uses a sentinel node to simplify
the logic to handle head reference.
1: function Map(f, L)
2: L′ ← Cons(⊥, NIL) ▷ Sentinel node
3: p ← L′
4: while L ≠ NIL do
5: x ← First(L)
6: L ← Rest(L)
7: Rest(p) ← Cons(f (x), NIL)
8: p ← Rest(p)
9: return Rest(L′ ) ▷ Drop the sentinel
For each
Sometimes we only need to traverse the list, repeatedly process the elements one by one
without building the new list. Here is an example that print every element out:
1: function Print(L)
2: while L ≠ NIL do
3: print First(L)
4: L ← Rest(L)
More generally, we can pass a procedure P , then traverse the list and apply P to each
element.
1: function For-Each(P, L)
2: while L ≠ NIL do
3: P(First(L))
4: L ← Rest(L)
Examples
As an example, let's look at the 'n-lights puzzle'[96]. There are n lights in a room, all initially off. We execute the following n rounds:
1. Switch all the lights in the room (all on);
2. Switch the lights numbered 2, 4, 6, ..., i.e., every other light; if a light is on, it turns off, and vice versa;
3. Switch every third light, numbered 3, 6, 9, ...;
4. ...
And at the last round, only the last light (the n-th light) is switched. The question is
how many lights are on in the end?
Let's start with a brute-force solution, then improve it step by step. We represent the states of the n lights as a list of 0/1 numbers: 0 is off, 1 is on. The initial state is all zeros: [0, 0, ..., 0]. We label the lights from 1 to n, then map them to (i, on/off) pairs:

lights = map(i ↦ (i, 0), [1, 2, 3, ..., n])

It binds each number to zero; the result is a list of pairs: L = [(1, 0), (2, 0), ..., (n, 0)].
Next we operate on this list of pairs for n rounds. In the i-th round, we toggle the second value of a pair if its label is divisible by i. Since 1 − 0 = 1 and 1 − 1 = 0, we can toggle a 0/1 value x with 1 − x. For a light (j, x), if i|j (i.e. j mod i = 0), then toggle it; otherwise leave the light untouched.
switch(i, (j, x)) = { j mod i = 0 : (j, 1 − x)
                    { otherwise : (j, x)    (1.44)
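The complete brute-force program was elided from this extract; a minimal Haskell sketch of it (the names nLights and run are illustrative):

switch :: Int -> (Int, Int) -> (Int, Int)
switch i (j, x) | j `mod` i == 0 = (j, 1 - x)
                | otherwise      = (j, x)

-- Label the lights, run the n rounds of switching, then count the lights on.
nLights :: Int -> Int
nLights n = sum [ x | (_, x) <- run n (map (\i -> (i, 0)) [1 .. n]) ]
  where
    run 0 ls = ls
    run i ls = run (i - 1) (map (switch i) ls)

-- the answers for 1 to 100 lights:  map nLights [1..100]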
Running this program for 1 light up to 100 lights, the answers are (line breaks added):
[1,1,1,
2,2,2,2,2,
3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
This result is interesting:
• the first 3 answers are 1;
• the 4-th to the 8-th answers are 2;
• the 9-th to the 15-th answers are 3;
• ...
It seems that the i^2-th to the ((i + 1)^2 − 1)-th answers are all i. Actually, we can prove it:
Proof. Given n lights labeled from 1 to n, consider which lights are on at the end. Since the initial state of every light is off, a light is on at the end if and only if it is switched an odd number of times. Light i is switched in round j exactly when j divides i (denoted j|i), so only the lights whose labels have an odd number of factors are on in the end.
The key to the puzzle is therefore to find all numbers which have an odd number of factors. For any positive integer n, let S be the set of all factors of n, initialized to ∅. If p is a factor of n, there must exist a positive integer q such that n = pq, which means q is also a factor of n. We add 2 different factors to S if and only if p ≠ q, which keeps |S| even, unless p = q. In that case n is a square number, and we add only 1 factor to S, which leads to an odd number of factors.
At this stage, we can design a fast solution by counting the square numbers that are no greater than n.

solve(n) = ⌊√n⌋    (1.48)
Below Haskell example program outputs the answer for 1, 2, ..., 100 lights:
map (floor ◦ sqrt) [1..100]
Map is a generic concept that is not limited to lists. It can be applied to many complex algebraic structures. The next chapter, about the binary search tree, explains how to map over trees. As long as we can traverse the structure and the empty case is defined, we can use the same mapping idea.
1.4.2 reverse
It’s a classic exercise to reverse a singly linked-list with minimum space. One must
carefully manipulate the node reference, however, there exists easy method to implement
reverse:
reverse(∅) = ∅
(1.49)
reverse(x : xs) = append(reverse(xs), x)
However, the performance is poor. As it need traverse to the end to append, this
algorithm is bound to quadratic time. We can optimize it with tail call, use an ac-
cumulator to store the reversed part so far. We initialize the accumulator as empty:
reverse = reverse′ (∅).
reverse′ (A, ∅) = A
(1.50)
reverse′ (A, x : xs) = reverse′ (x : A, xs)
Different from appending, cons (:) is a constant time operation. The idea is to re-
peatedly take the elements from the head, and prepend them to the accumulator. It
essentially likes to store elements in a stack, then pop them out. The overall performance
is O(n), where n is the length. Since tail call need not keep the context, we can optimize
it to purely iterative loops:
1: function Reverse(L)
2: A ← NIL
3: while L ≠ NIL do
4: A ← Cons(First(L), A)
5: L ← Rest(L)
6: return A
However, this algorithm creates a new reversed list rather than mutating the original one. We can further change it to reverse L in place by updating the node references directly.
Exercise 1.10
1. Given a number from 0 to 1 billion, write a program to give its English represen-
tation. For e.g. 123 is ‘one hundred and twenty three’. What if there is decimal
part?
2. Implement the algorithm to find the maximum value in a list of pairs [(k, v)] in
tail call.
1.5 Sub-list
Different from an array, which can slice out a continuous segment quickly, a list typically needs linear time to traverse and extract a sub-list.
take(0, L) = ∅
take(n, ∅) = ∅ (1.51)
take(n, x : xs) = x : take(n − 1, xs)
This algorithm handles the out of bound case like this: if n > |L| or n is negative, it
ends up to the edge case that L becomes empty, hence returns the whole list as the result.
Drop, on the other hand, discards the first n elements and returns the rest. It is
equivalent to slice the sub-list from right: sublist(n + 1, |L|, L), where |L| is the length.
Its implementation is symmetric:
drop(0, L) = L
drop(n, ∅) = ∅ (1.52)
drop(n, x : xs) = drop(n − 1, xs)
We leave the imperative implementation of take/drop as an exercise. As the next step, we can develop an algorithm to extract the sub-list at any position with a given length. The boundary of a slice is given as [from, to], including both ends. We can also split a list at a given position, as in the sketch below.
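The formal definitions were elided here; a Haskell sketch of the three operations built from take and drop (the names, the 1-based positions, and the inclusive boundary are assumptions):

-- cnt elements starting from position 'from' (1-based)
sublist :: Int -> Int -> [a] -> [a]
sublist from cnt = take cnt . drop (from - 1)

-- the elements at positions from..to, both ends included
slice :: Int -> Int -> [a] -> [a]
slice from to = drop (from - 1) . take to

-- split a list at position n
splitAt' :: Int -> [a] -> ([a], [a])
splitAt' n xs = (take n xs, drop n xs)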
Exercise 1.11
1. Define sublist and slice in Curried Form without L as parameter.
We can define break from span by negating the predicate (in Curried form), as in the sketch below.
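The formal definitions of span and break were elided from this extract; a Haskell sketch consistent with the surrounding description (span takes the longest prefix whose elements satisfy p, and break negates the predicate):

span' :: (a -> Bool) -> [a] -> ([a], [a])
span' _ [] = ([], [])
span' p (x:xs) | p x       = let (as, bs) = span' p xs in (x:as, bs)
               | otherwise = ([], x:xs)

break' :: (a -> Bool) -> [a] -> ([a], [a])
break' p = span' (not . p)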
Both span and break find the longest prefix: they stop immediately when the condition fails and ignore the rest. Below is the iterative implementation for span:
1: function Span(p, L)
2: A ← NIL
3: while L ≠ NIL and p(First(L)) do
4: A ← Cons(First(L), A)
5: L ← Rest(L)
6: return (A, L)
This algorithm creates a new list to hold the longest prefix, another option is to reuse
the original list and break it in-place:
1: function Span(p, L)
2: A←L
3: tail ← NIL
4: while L ≠ NIL and p(First(L)) do
5: tail ← L
6: L ← Rest(L)
7: if tail = NIL then
8: return (NIL, L)
9: Rest(tail) ← NIL
10: return (A, L)
group
span breaks a list into two parts; group divides a list into multiple sub-lists. For example, we can use group to break a long word into small units, each containing consecutive identical characters:
group "Mississippi" = ["M", "i", "ss", "i", "ss", "i", "pp", "i"]
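The recursive definition of group was elided from this extract; a Haskell sketch of one way to write it (a reconstruction, with eq as the equivalence test):

group' :: (a -> a -> Bool) -> [a] -> [[a]]
group' _  []  = [[]]
group' _  [x] = [[x]]
group' eq (x:xs)
  | eq x (head ys) = (x : ys) : yss
  | otherwise      = [x] : ys : yss
  where (ys:yss) = group' eq xs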
Here (ys : yss) = group(∼, xs) is the grouping of the rest of the list. This algorithm is bound to O(n) time, where n is the length. We can also implement an iterative group algorithm: for a non-empty list L, we initialize the result groups as [[x1]], where x1 is the first element. We scan the list from the second element onward, appending an element to the last group if the two consecutive elements are 'equivalent'; otherwise we start a new group.
1: function Group(∼, L)
2: if L = NIL then
3: return [NIL]
4: x ← First(L)
5: L ← Rest(L)
6: g ← [x]
7: G ← [g]
8: while L ≠ NIL do
9: y ← First(L)
10: if x ∼ y then
11: g ← Append(g, y)
12: else
13: g ← [y]
14: G ← Append(G, g)
15: x←y
16: L ← Next(L)
17: return G
However, this program runs in quadratic time if append isn't optimized with a tail reference. If we don't care about the order, we can alternatively change append to cons.
With the group algorithm defined, we can realize the above 2 cases as below:
group(=, [M, i, s, s, i, s, s, i, p, p, i]) = [[M], [i], [ss], [i], [ss], [i], [pp], [i]]
and
group(≥, [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2])
= [[15, 9, 0], [12, 11, 7], [10, 5], [6], [13, 1], [4], [8, 3], [14, 2]]
Another method to implement group is to use the span function. Given a predicate, span breaks the list into two parts: the longest prefix satisfying the condition, and the rest. We can repeatedly apply span to the rest until it becomes empty. However, the predicate passed to span is a unary function: it takes an element and tests it, while in group the predicate is a binary function that compares two elements. We can use Currying: pass and fix the first element in the binary predicate, then use the Curried function to test the others.
group(∼, ∅) = [∅]
group(∼, x : xs) = (x : A) : group(∼, B)    (1.61)

where (A, B) = span(y ↦ x ∼ y, xs) is the span result applied to the rest sub-list.
Although this new group function generates the correct result for the string case:

group (==) "Mississippi"
["M", "i", "ss", "i", "ss", "i", "pp", "i"]

it does not behave the same for the number example with the Curried (≥) predicate.
When the first number 15 is used as the left hand side of ≥, it is the maximum value, hence span puts all elements into A and leaves B empty. This is not a defect, but the correct behavior, because group is defined to put equivalent elements together. To be accurate, the equivalence relation (∼) needs to satisfy three properties: reflexivity, transitivity, and symmetry.
When grouping 'Mississippi', we use the equality (=) operator. It conforms to the three rules and generates the correct result. However, when we pass the Curried (≥) predicate for numbers, it violates the symmetry rule, hence generates an unexpected result. The second algorithm, using span, limits its use case to strict equivalence, while the first algorithm does not: it only tests that the predicate holds for every two adjacent elements, which is weaker than an equivalence relation.
Exercise 1.12
1. Change the take/drop algorithm, such that when n is negative, returns ∅ for take,
and the whole list for drop.
2. Implement the in-place imperative take/drop algorithms.
3. Implement the iterative ‘take while’ and ‘drop while’ algorithms.
4. Consider the below span implementation:
span(p, ∅) = (∅, ∅)
span(p, x : xs) = { p(x) : (x : A, B)
                  { otherwise : (A, x : B)
where (A, B) = span(p, xs). What is the difference between this one and the
algorithm we defined previously?
1.6 Fold
We’ve seen most list algorithms share some common structure. This is not by chance.
Such commonality is rooted from the recursive nature of list. We can abstract the list
algorithms to a higher level concept, fold5 , which is essentially the initial algebra of all
list related computation[99].
h(∅) = z
(1.62)
h(x : xs) = x ⊕ h(xs)
• The result for empty list. It is 0 for sum, 1 for product, and ∅ for sort.
• The binary operation applies to the head and the recursive result. It is plus for
sum, multiply for product, and ordered-insertion for sort.
5 also known as reduce
We abstract the result for empty list as the initial value, denoted as z to mimic the
generic zero concept. The binary operation as ⊕. The above definition can be then
parameterized as:
h(⊕, z, ∅) = z
(1.63)
h(⊕, z, x : xs) = x ⊕ h(⊕, z, xs)
Let’s feed it a list L = [x1 , x2 , ..., xn ], and expand to see how it behaves like:
h(⊕, z, [x1 , x2 , ..., xn ])
= x1 ⊕ h(⊕, z, [x2 , x3 , ..., xn ])
= x1 ⊕ (x2 ⊕ h(⊕, z, [x3 , ..., xn ]))
...
= x1 ⊕ (x2 ⊕ (...(xn ⊕ h(⊕, z, ∅))...))
= x1 ⊕ (x2 ⊕ (...(xn ⊕ z)...))
We need add the parentheses, because the computation starts from the right-most
(xn ⊕ z). It repeatedly folds to left towards x1 . This is quite similar to a fold-fan in figure
1.3. Fold-fan is made of bamboo and paper. Multiple frames stack together with an axis
at one end. The arc shape paper is fully expanded by these frames; We can close the fan
by folding the paper. It ends up as a stick.
We can consider the fold-fan as a list of bamboo frames. The binary operation is to
fold a frame to the top of the stack. The initial stack is empty. To fold the fan, we start
from one end, repeatedly apply the binary operation, till all the frames are stacked. The
sum and product algorithms do the same thing like folding fan.
sum([1, 2, 3, 4, 5]) = 1 + (2 + (3 + (4 + 5)))
= 1 + (2 + (3 + 9))
= 1 + (2 + 12)
= 1 + 14
= 15
product([1, 2, 3, 4, 5]) = 1 × (2 × (3 × (4 × 5)))
= 1 × (2 × (3 × 20))
= 1 × (2 × 60)
= 1 × 120
= 120
We name this kind of process fold. Particularly, since the computation starts from the right end, we denote it foldr:

foldr(f, z, ∅) = z
foldr(f, z, x : xs) = f(x, foldr(f, z, xs))    (1.64)
Or in Curried form: sum = foldr(+, 0), product = foldr(×, 1). We can also define the insertion sort with foldr as: sort = foldr(insert, ∅).

Symmetrically, we can fold from the left:

foldl(f, z, ∅) = z
foldl(f, z, x : xs) = foldl(f, f(z, x), xs)    (1.68)
Using sum as an example, we can see how the computation expands from left to right:

foldl(+, 0, [1, 2, 3, 4, 5])
= foldl(+, 0 + 1, [2, 3, 4, 5])
= foldl(+, (0 + 1) + 2, [3, 4, 5])
= ...
= ((((0 + 1) + 2) + 3) + 4) + 5

Here we delay the evaluation of f(z, x) in every step; this is the behavior of lazy evaluation. Otherwise, they would be evaluated in sequence as [1, 3, 6, 10, 15] in each call. Generally, we can expand foldl as:

foldl(f, z, [x1, x2, ..., xn]) = f(f(...f(f(z, x1), x2)..., ), xn)

Or expressed as infix:

foldl(⊕, z, [x1, x2, ..., xn]) = ((...((z ⊕ x1) ⊕ x2)...) ⊕ xn)

foldl is tail recursive, so we can implement it with a loop: initialize the result as z, then apply the binary operation to it with every element. It is typically called Reduce in most imperative environments.
1: function Reduce(f, z, L)
2: while L ≠ NIL do
3: z ← f (z, First(L) )
4: L ← Rest(L)
5: return z
Both foldr and foldl have their own suitable use cases; they are not always interchangeable. For example, some containers only allow adding an element at one end (like a stack). We can define a function fromList to build such a container from a list (in Curried form), where empty is the empty container. The singly linked-list is such a container: it performs well when adding an element at the head, but poorly when appending at the tail. foldr is the natural choice to duplicate a list while keeping the order, but foldl would generate a reversed list. As a workaround, to implement iterative reduction from the right, we can first reverse the list, then reduce it:
1: function Reduce-Right(f, z, L)
2: return Reduce(f, z, Reverse(L))
One may think foldl should be preferred, as it is optimized as a tail call, hence fits both functional and imperative settings. It is also an online algorithm that always holds the result so far. However, foldr plays a critical role when handling infinite lists (modeled as streams) with lazy evaluation. For example, the below program wraps every natural number in a singleton list and returns the first 10:
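The example program itself was elided here; a sketch of what it describes, relying on the laziness of foldr:

firstTen :: [[Integer]]
firstTen = take 10 (foldr (\x xs -> [x] : xs) [] [1 ..])
-- [[1],[2],[3],[4],[5],[6],[7],[8],[9],[10]]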
It does not work with foldl because the outermost evaluation never ends. We use the unified notation fold when either direction works; in this book, we also use foldl and foldr to emphasize the folding direction. Although this chapter is about lists, the fold concept is generic. It can be applied to other algebraic structures: we can fold a tree (2.6 in [99]), a queue, and many other things, as long as they satisfy the following 2 criteria:
• We can decompose the recursive structure (like decomposing a tree into sub-trees and a key);
• There is a result defined for the empty structure (playing the role of z).
People abstract these further with concepts like foldable, monoid, and traversable.
Exercise 1.13
1. To define insertion-sort with foldr, we design the insert function as insert(x, L), such that the sort can be expressed as sort = foldr(insert, ∅). The type of foldr is:
foldr :: (A → B → B) → B → [A] → B
where its first parameter f has the type A → B → B, and the initial value z has the type B. It folds over a list of A, and builds a result of B. How can we define insertion-sort with foldl? What is the type signature of foldl?
1.6.3 example
As an example, let's see how to implement the n-lights puzzle with fold and map. In the brute-force solution, we create a list of pairs; each pair (i, s) has a number i and an on/off state s. In every round j, we scan the lights and toggle the i-th switch when j divides i. We can define this process with fold:

foldr(step, [(1, 0), (2, 0), ..., (n, 0)], [1, 2, ..., n])

As the initial state, all lights are off. We fold over the list of round numbers from 1 to n. The function step takes two parameters: the round number i, and the list of pairs. It performs the switching through map:

foldr((i, L) ↦ map(switch(i), L), [(1, 0), (2, 0), ..., (n, 0)], [1, 2, ..., n])
The foldr result is the list of pairs with the final on/off states. We next extract the state from each pair through map, and count the lights that are on with sum:

sum(map((i, s) ↦ s, foldr((i, L) ↦ map(switch(i), L), [(1, 0), (2, 0), ..., (n, 0)], [1, 2, ..., n])))
concatenate
What if we apply fold with ++ (section 1.3.4) to a list of lists? It concatenates them into one long list, just as sum does for numbers.

concat = foldr(++, ∅)    (1.72)
Exercise 1.14
1.7.1 Exist
Given some a of type A and a list of A, how do we test whether a is in the list? The idea is to compare every element in the list with a, until either they are equal or we reach the end:
a ∈ ∅ = False
a ∈ (b : bs) = { b = a : True
               { b ≠ a : a ∈ bs    (1.73)
This algorithm is also called elem. It is bound to O(n), where n is the length. If the list is ordered (ascending, for example), one may want to improve the algorithm to logarithmic time with the idea of divide and conquer. However, a list does not support random access, so we can't apply binary search. See chapter 3 for details.
1.7.2 Look up
Let's extend elem a bit. In the n-lights puzzle, we use a list of pairs [(k, v)]; every pair contains a key and a value. This kind of list is called an 'association list' (or assoc list). If we want to look up a given value in such a list, we need to extract some part (the value) for comparison.
lookup(x, ∅) = Nothing
lookup(x, (k, v) : kvs) = { v = x : Just (k, v)
                          { v ≠ x : lookup(x, kvs)    (1.74)
Different from elem, we do not return true/false. Instead, we want to return the key-value
pair when found. However, it is not guaranteed that the value always exists. We use
an algebraic type called 'Maybe'. A value of type Maybe A is either some a in A or nothing,
denoted as Just a or Nothing. This is a way to deal
with null reference issues (4.2.2 in [99]).
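A possible Haskell sketch of (1.74), using Maybe (named lookupBy here to avoid clashing with the lookup in the standard Prelude):

lookupBy _ [] = Nothing
lookupBy x ((k, v):kvs) | v == x    = Just (k, v)
                        | otherwise = lookupBy x kvs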
find(p, ∅) = Nothing
find(p, x:xs) = { p(x) : Just x
                  otherwise : find(p, xs)    (1.75)
Although there can be multiple matching elements, the find algorithm picks the first.
We can expand it to find all matching elements. This is often called filter, as demonstrated in figure
1.4.
Figure 1.4: Input: [x1, x2, ..., xn], output: [x'1, x'2, ..., x'm], where p(x'i) holds for every x'i.
Different from find, when no element satisfies the predicate, filter returns
the empty list. It scans and examines the elements one by one:
filter(p, ∅) = ∅
filter(p, x:xs) = { p(x) : x : filter(p, xs)
                    otherwise : filter(p, xs)    (1.77)
This definition builds the result from right to left. For an iterative implementation, if
we build the result with append, it degrades to O(n²):
1: function Filter(p, L)
2: L′ ← NIL
3: while L ≠ NIL do
4:   if p(First(L)) then
5:     L′ ← Append(L′, First(L)) ▷ Linear time
6:   L ← Rest(L)
7: return L′
The right way is to use cons instead, however, it builds the result in the reversed
order. We can further reverse it within linear time (see the exercise). The nature to build
result from right indicates that we can define filter in f oldr. We need define a function f
to test an element against the predicate, if OK, prepend to the result:
f(x, A) = { p(x) : x : A
            otherwise : A    (1.78)
We also need to pass the predicate p to f. There are actually 3 parameters, as f(p, x, A).
Filter is defined with foldr over a Curried form of f:
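One way to write it, sketched in Haskell with a lambda playing the role of the Curried f(p):

filter' p = foldr (\x acc -> if p x then x : acc else acc) []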
Filter is also a generic concept, not limited to lists. We can apply a predicate to any
traversable structure to extract the result.
1.7.4 Match
Match is to find a pattern in some structure. Even if we limit ourselves to lists and strings, there
are still too many things to cover. We have dedicated chapters about string matching.
This section deals with the problem: given a list A, test whether it exists in another list
B. There are two special cases: to test if A is a prefix or a suffix of B. The span algorithm
in (1.58) actually finds a prefix under a certain condition. We can do a similar thing:
compare each element between A and B from the left till we meet any different one or reach the
end of either list. Define A ⊆ B if A is a prefix of B:
∅ ⊆ B = True
(a:as) ⊆ ∅ = False
(a:as) ⊆ (b:bs) = { a ≠ b : False
                    a = b : as ⊆ bs    (1.81)
Prefix testing takes linear time as it scans the lists. However, we cannot do suffix
testing the same way, because it is hard to start from the aligned right ends and scan
backwards through lists. This is different from arrays. Alternatively, we can reverse both lists
in linear time, hence turning the problem into prefix testing:
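For example, a sketch in Haskell (isPrefixOf implements the ⊆ testing, as in Data.List):

isSuffixOf a b = reverse a `isPrefixOf` reverse b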
With ⊆ defined, we can test if a list is a sub-list of another one. We call it infix
testing. The idea is to scan the target list, repeatedly applying the prefix testing:
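A recursive sketch of this idea in Haskell (the name infixTest is ours; an equivalent definition is given by (1.84) below):

infixTest a b = a `isPrefixOf` b || (not (null b) && infixTest a (tail b))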
For the edge case that A is empty, we define empty is infix of any list. Because ∅ ⊆ B
is always true, it gives the right result. It also evaluates inf ix?(∅, ∅) correctly. Below is
the corresponding iterative implementation:
1: function Is-Infix(A, B)
2: if A = NIL then
3: return TRUE
4: n ← |A|
5: while B ≠ NIL and n ≤ |B| do
6: if A ⊆ B then
7: return TRUE
8: B ← Rest(B)
9: return FALSE
Because prefix testing runs in linear time and is called inside the scanning loop, this
algorithm is bound to O(nm), where m and n are the lengths of the two lists respectively. It
is an interesting problem to improve this 'position by position' scan to linear
time, even when applied to arrays. Chapter 13 introduces some smart methods,
like the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore algorithm. Appendix C
introduces another method called suffix-tree.
In a symmetric way, we can enumerate all suffixes of B, and check if A is a prefix of any
of them:

infix?(A, B) = ∃S ∈ suffixes(B), A ⊆ S    (1.84)
This can be implemented with list comprehension as below example Haskell program:
isInfixOf a b = (not ◦ null) [ s | s ← tails(b), a `isPrefixOf` s]
Where the function isPrefixOf does the prefix testing, and tails generates all suffixes
of a given list. We leave their implementations as exercises.
Exercise 1.15
1. Implement the linear time existence testing algorithm.
2. Implement the iterative look up algorithm.
3. Implement the linear time filter algorithm through reverse.
4. Implement the iterative prefix testing algorithm.
5. Implement the algorithm to enumerate all suffixes of a list.
1.8 Zip and unzip
zip builds the result from the right. We can also define it with foldr. It is bound to O(m)
time, where m is the length of the shorter list. When implementing the iterative zip, the
performance drops to quadratic if using append, unless we keep a reference to the tail
position.
1: function Zip(A, B)
2: C ← NIL
3: while A ≠ NIL and B ≠ NIL do
4: C ← Append(C, (First(A), First(B))) ▷ Linear time
5: A ← Rest(A)
6: B ← Rest(B)
7: return C
To avoid append, we can use ’cons’ then reverse the result. However, it can not deal
with two infinite lists. In imperative settings, we can also re-use A to store the result
(treat it as transform a list of elements to a list of pairs).
We can extend zip to combine multiple lists. Some programming libraries provide zip,
zip3, zip4, ..., up to zip7. Sometimes, we don't want to build a list of pairs, but to apply
a combinator function. For example, given a list of unit prices [1.00, 0.80, 10.05, ...] for
fruits: apple, orange, banana, ... When a customer has a list of quantities, like [3, 1, 0, ...],
it means this customer buys 3 apples, 1 orange, 0 bananas, ... The program below generates a
payment list:
pays(U, ∅) = ∅
pays(∅, Q) = ∅
pays(u : us, q : qs) = (u · q) : pays(us, qs)
It is the same as the zip function except that it uses multiplication instead of 'cons' to combine the elements.
We can abstract the combinator as a function f, and pass it to zip to build a generic
algorithm:
zipWith(f, A, ∅) = ∅
zipWith(f, ∅, B) = ∅    (1.86)
zipWith(f, a:as, b:bs) = f(a, b) : zipWith(f, as, bs)
Here is an example that defines the inner-product (or dot-product)[98] through zipWith:
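For instance, a sketch:

innerProduct xs ys = sum (zipWith (*) xs ys)
-- innerProduct [1,2,3] [4,5,6] gives 32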
unzip is the inverse operation of zip. It converts a list of pairs into two separate lists.
Below is its definition with foldr in Curried form:
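A Haskell sketch of such a definition:

unzip' = foldr (\(a, b) (as, bs) -> (a : as, b : bs)) ([], [])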
We fold from a pair of empty lists, break a and b from each pair, and prepend them to the
two intermediate lists respectively. We can also write it with fst and snd explicitly.
For the fruits example, suppose the unit prices are stored in an assoc list U = [(apple, 1.00), (orange, 0.80), (banana, 10.05), ...],
which we can query with lookup, for example lookup(melon, U). The purchase quantities form another assoc list: Q =
[(apple, 3), (orange, 1), (banana, 0), ...]. How to calculate the total payment? The straightforward
way is to extract the unit price list and the quantity list, then compute their inner-product:
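As a sketch, assuming both assoc lists contain the same fruits in the same order:

total u q = sum (zipWith (*) (map snd u) (map snd q))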
As an example, let's see how to use zipWith to define the infinite list of Fibonacci numbers with
lazy evaluation. Let F be the infinite list of Fibonacci numbers, starting from 0 and 1, and let F′ be the rest of the
Fibonacci numbers without the first one. From the third on, every Fibonacci number is the
sum of the numbers from F and F′ at the same position. The example program below lists the
first 15 Fibonacci numbers:
fib = 0 : 1 : zipWith (+) fib (tail fib)
take 15 fib
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377]
zip and unzip are generic. We can expand to zip two trees, where the nodes contain
paired elements from both. When traverse a collection of elements, we can also use the
generic zip and unzip to track the path, this is a method to mimic the ‘parent’ reference
in imperative implementation (last chapter of [10]).
Exercise 1.16
1. Design the iota (I) algorithm for below usages:
• iota(..., n) = [1, 2, 3, ..., n];
• iota(m, n) = [m, m + 1, m + 2, ..., n], where m ≤ n;
• iota(m, m + a, ..., n) = [m, m + a, m + 2a, ..., n];
• iota(m, m, ...) = repeat(m) = [m, m, m, ...];
• iota(m, ...) = [m, m + 1, m + 2, ...].
The last two cases are about infinite list. One possible implementation is through
streaming and lazy evaluation ([63] and [10]).
2. Implement the linear time imperative zip algorithm
3. Define zip with f oldr.
4. For the fruits example, suppose the quantity assoc list only contains the items with
non-zero quantity, i.e. instead of Q = [(apple, 3), (orange, 1), (banana, 0), ...], it is
Q = [(apple, 3), (orange, 1), ...]
because the customer does not buy bananas. Design a program to calculate the total
payment.
5. Implement lastAt with zip.
Exercise 1.17
1. Design algorithm to remove the duplicated elements in a list. For imperative
implementation, the elements should be removed in-place. The original element
order should be maintained. What is the complexity of this algorithm? How to
simplify it with additional data structure?
2. List can represent decimal non-negative integer. For example 1024 as list is 4 →
2 → 0 → 1. Generally, n = dm ...d2 d1 can be represented as d1 → d2 → ... → dm .
Given two numbers a, b in list form. Realize arithmetic operations such as add
and subtraction.
3. In imperative settings, a linked-list may be corrupted such that some node points
back to a previous one, forming a circle as shown in figure 1.5. A traversal then falls into an infinite
loop. Design an algorithm to detect whether a list is circular. On top of that, improve
it to find the node where the loop starts (the node pointed to by two predecessors).
Chapter 2
Binary search tree
2.1 Introduction
Arrays and lists are typically considered the basic data structures. However, we'll see that they
are not necessarily easy to implement in chapter 12. In imperative settings, the array is
the most elementary data structure. It is possible to implement a linked-list using arrays
(Equation 3.4). While in functional settings, the linked-list acts as the building block to
create arrays and other data structures.
We start from Binary Search Trees as the first data structure. Let us see an interesting
programming problem given by Bentley in Programming Pearls[2]. It is about to count
the number of words in text. Here is an example solution:
void wordcount(Input in) {
bst<string, int> map;
while string w = read(in) {
map[w] = if map[w] == null then 1 else map[w] + 1
}
for var (w, c) in map {
print(w, ":", c)
}
}
The map is a binary search tree. Here we use the word as the key, and its number of occurrences
as the value. This program runs fast, which reflects the power of the binary search
tree. Before diving into it, let us first look at the more generic binary tree. A binary
tree can be defined recursively. It is
• either empty;
• or contains 3 parts: an element, and two sub-trees called the left and right children.
A binary search tree is a binary tree whose keys are comparable¹, and which satisfies the following:
• For any node, all the keys in its left sub-tree are less than the key in this node;
¹ The ordering is abstract, not limited to magnitude; the 'less than' (<) can be precedence, subset-of, etc.
• the key in this node is less than any key in its right sub-tree.
Figure 2.2 shows an example of a binary search tree. Comparing it with figure 2.1, we
can see the difference in key ordering. To highlight that the elements in a binary search tree
are comparable, we call them keys, and name the augmented satellite data values.
2.3 Insertion
When inserting a key k (optionally along with a value) into a binary search tree T, we need to ensure the
key ordering property always holds: if the tree is empty, create a leaf of k; if k is less than the key of the root, insert it into the left sub-tree; otherwise, insert it into the right sub-tree.
There is an exceptional case that k is equal to the key of root. It means k already
exists in the tree. We can overwrite it, or append data, or do nothing. We’ll skip such case
handling. This algorithm is simple and straightforward. We can define it as a recursive
function:
insert(∅, k) = Node(∅, k, ∅)
insert(Node(Tl, k′, Tr), k) = { k < k′ : Node(insert(Tl, k), k′, Tr)
                                otherwise : Node(Tl, k′, insert(Tr, k))    (2.1)
For the none empty node, Tl denotes the left sub-tree, Tr denotes the right sub-tree,
and k ′ is the key. The function N ode(l, k, r) creates a node from two sub-trees and a key.
∅ means empty (also known as NIL. This symbol was invented by mathematician André
Weil for null set. It came from the Norwegian alphabet). Below is the corresponding
example program in Haskell for insertion.
insert Empty k = Node Empty k Empty
insert (Node l x r) k | k < x = Node (insert l k) x r
| otherwise = Node l x (insert r k)
This example program utilizes the pattern matching features. The appendix of this
chapter provides another example without using this feature. Insertion can also be im-
plemented without recursion. Here is a pure iterative algorithm:
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: parent ← NIL
5: while T ≠ NIL do
6: parent ← T
7: if k < Key(T ) then
8: T ← Left(T )
9: else
10: T ← Right(T )
11: Parent(x) ← parent
12: if parent = NIL then ▷ tree T is empty
13: return x
14: else if k < Key(parent) then
15: Left(parent) ← x
16: else
17: Right(parent) ← x
18: return root
2.4 Traverse
Traverse is to visit every element one by one. There are 3 different ways to walk through
a binary tree: (1) pre-order tree walk, (2) in-order tree walk, (3) and post-order tree walk.
They are named to highlight the order of visiting key before/after sub-trees.
Each ‘visit’ operation is recursive, for example in pre-order traverse, when visit the
left sub-tree, we recursively traverse it if it is not empty. For the tree shown in figure 2.2,
the corresponding visiting orders are as below:
It is not by accident that the in-order traverse lists the elements in increasing
order. The definition of the binary search tree ensures this is always true. We leave the
proof as an exercise. Specifically, the in-order traverse visits the left sub-tree, then the key, then the right sub-tree, recursively.
We can further define a generic map to apply any given function f to every element
in the tree along the in-order traverse. The result is a new tree mapped by f .
map(f, ∅) = ∅
map(f, Node(Tl, k, Tr)) = Node(map(f, Tl), f(k), map(f, Tr))    (2.2)
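A Haskell sketch of this map, following the Node constructor used in the insert example above:

mapTree f Empty = Empty
mapTree f (Node l k r) = Node (mapTree f l) (f k) (mapTree f r)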
If we only need manipulate keys but not to transform the tree, we can implement this
algorithm imperatively.
1: function In-Order-Traverse(T, f)
2: if T ≠ NIL then
3:   In-Order-Traverse(Left(T), f)
4:   f(Key(T))
5:   In-Order-Traverse(Right(T), f)
Leverage in-order traverse, we can change the map function to convert a binary search
tree to a sorted list. Instead building the tree in recursive case, we concatenate the result
to a list:
toList(∅) = [ ]
toList(Node(Tl, k, Tr)) = toList(Tl) ++ [k] ++ toList(Tr)    (2.3)
This gives a method to sort a list of elements: first build a binary search tree
from the list, then turn it back into a list through in-order traversing. This method is called
'tree sort'. For a given list X = [x1, x2, x3, ..., xn], we have sort(X) = toList(fromList(X)).
Where the function fromList repeatedly inserts the elements of a list into a tree. It can be
defined recursively over the list:

fromList([ ]) = ∅
fromList(X) = insert(fromList(X′), x1)

When the list is empty, the result is an empty tree; otherwise, it inserts the first
element x1 into the tree built by recursively inserting the rest X′ = [x2, x3, ..., xn]. By using
list folding[7] (see appendix A.6), we can also define fromList with a fold, and by rewriting it
in Curried form[9] (also known as partial application) we can omit the parameter X, as sketched below:
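A Haskell sketch of both forms, based on the insert defined earlier in this chapter (the fold direction is a choice; foldl gives the Curried point-free form directly):

fromList xs = foldl insert Empty xs
-- or, in Curried (point-free) form:
fromList' = foldl insert Empty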
Exercise 2.1
1. Given the in-order and pre-order traverse results, re-construct the tree, and output
the post-order traverse result. For example:
• Pre-order: 1, 2, 4, 3, 5, 6;
• In-order: 4, 2, 1, 5, 3, 6;
• Post-order: ?
2. Write a program to re-construct the binary tree from the pre-order and in-order
traverse lists.
3. For the binary search tree, prove that the in-order traverse always visits elements in
increasing order.
4. Consider the performance of the tree sort algorithm: what is its complexity for n
elements?
2.5 Query
Because the elements stored in a binary search tree are well ordered and organized recursively,
it supports various kinds of search efficiently. This is one of the reasons people name it the binary
search tree. There are mainly three types of query: (1) look up a key; (2) find the
minimum or maximum element; (3) given a node, find its predecessor or successor.
2.5.1 Look up
Because binary search tree is recursive and all elements satisfy the ordering property, we
can look up a key k top-down from the root as the following:
• Compare k with the key of root, if equal, we are done. The key is stored in the
root;
• If k is less than the key of root, then recursively look up the left sub-tree;
• Otherwise, look up the right sub-tree.
We can define the recursive lookup function for this algorithm as below.
lookup(∅, x) = ∅
lookup(T, x) = { k = x : T
                 x < k : lookup(Tl, x)
                 otherwise : lookup(Tr, x)    (2.6)
where T = Node(Tl, k, Tr)
This function returns the tree node where the key is located, or empty if not found. One may
instead return the value bound to the key. However, in that case we
need to consider using the Maybe type (also known as Optional<T>) to handle the not-found
case, for example:
lookup Empty _ = Nothing
lookup t@(Node l k r) x | k == x = Just k
| x < k = lookup l x
| otherwise = lookup r x
The binary search tree is well balanced if almost all branch nodes have
two non-empty sub-trees. This is not the formal definition of balance;
we'll define it in chapter 4. For a balanced tree of n elements, the algorithm takes O(lg n)
time to look up a key. If the tree is poorly balanced, the worst case is bound to O(n) time.
If we denote the height of the tree by h, the lookup performance is O(h).
We can also implement looking up purely iterative without recursion:
1: function Search(T, x)
2: while T ≠ NIL and Key(T) ≠ x do
3: if x < Key(T ) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: return T
Such use cases demand algorithms to find the successor or predecessor of
a node. The successor of an element x is defined as the smallest element y that satisfies
x < y. If the node of x has a non-empty right sub-tree, then the minimum element of the right
sub-tree is the successor. As shown in figure 2.4, to find the successor of 8, we search for the
minimum element in its right sub-tree, which is 9. If the right sub-tree of node x is empty,
we need to back-track along the parent field till the closest ancestor whose left sub-tree is
also an ancestor of x. In figure 2.4, since node 2 does not have a right sub-tree, we go up to
its parent, node 1. But node 2 is the right child of node 1, so we go up again and
reach node 3. As node 1, the left child of node 3, is an ancestor of node 2, node
3 is the successor of node 2.
Figure 2.4: The successor of 8 is the minimum element of its right sub-tree, 9. To find the successor of 2, we go up to its parent 1, then 3.
If we finally reach to the root when back-track along the parent, but still can not find
an ancestor on the right, then the node does not have a successor. Below algorithm finds
the successor of a given node x:
1: function Succ(x)
2: if Right(x) ≠ NIL then
3: return Min(Right(x))
4: else
5: p ← Parent(x)
6: while p ≠ NIL and x = Right(p) do
7: x←p
8: p ← Parent(p)
9: return p
This algorithm returns NIL when x does not have a successor. The predecessor finding
algorithm is symmetric:
1: function Pred(x)
2: if Left(x) ≠ NIL then
3: return Max(Left(x))
4: else
5: p ← Parent(x)
6: while p ≠ NIL and x = Left(p) do
7: x ← p
8: p ← Parent(p)
9: return p
It seems hard to find a purely functional solution, because there is no pointer-like
field linking to the parent node². One solution is to leave 'breadcrumbs' when we visit the
tree, and use this information to back-track or even re-construct the whole tree. Such a
data structure, which contains both the tree and the 'breadcrumbs', is called a zipper[?].
Our original purpose in developing the succ and pred functions is 'to traverse all the elements'
as a generic container. However, in functional settings, we typically traverse the tree in-order
through map. We'll meet similar situations in the rest of this book: a problem
valid in imperative settings may not necessarily be meaningful in functional settings. For
example, deleting an element from a red-black tree[5].
Exercise 2.2
1. Use Pred and Succ to write an iterator to traverse the binary search tree as a
generic container. What’s the time complexity to traverse a tree of n elements?
2. One can traverse elements inside a range [a, b] for example:
for_each (m.lower_bound(12), m.upper_bound(26), f);
Write an equivalent functional program for binary search tree.
2.6 Deletion
We need special consideration when delete an element from the binary search tree. This
is because we must keep the ordering property, that for any node, all keys in left sub-tree
are less than the key of this node, and they are all less than any keys in right sub tree.
Blindly deleting a node may break this constraint.
To delete a node x from a binary search tree[6]:
• If x is a leaf (both sub-trees are empty), simply remove it;
• If x has only one non-empty sub-tree, splice x out and replace it with that sub-tree;
• Otherwise (x has two sub-trees), use the minimum element y of its right sub-tree
to replace x, and splice the original y out.
The simplicity comes from the fact that, for the node to be deleted, if the right sub-tree
is not empty, then its minimum element is some node inside it. That node cannot have two
non-empty children, so it falls into the trivial case and can be directly spliced
out of the tree.
Figure 2.5, 2.6, and 2.7 illustrate different cases for deletion.
Based on this idea, we can define the delete algorithm as below:
delete(∅, x) = ∅
delete(Node(Tl, k, Tr), x) = { x < k : Node(delete(Tl, x), k, Tr)
                               x > k : Node(Tl, k, delete(Tr, x))
                               x = k : del(Tl, Tr)    (2.9)
2 There is ref in ML and OCaml, we limit to the purely functional settings.
Figure 2.5: Delete a leaf node.
Figure 2.6: Delete a node with only one non-empty sub-tree.
Figure 2.7: Delete a node with two non-empty sub-trees: replace its key with min(R), then delete min(R) from R.
The function del performs the splicing, and mutually calls delete recursively to cut the minimum
off the right sub-tree.
del(∅, Tr) = Tr
del(Tl, ∅) = Tl    (2.10)
del(Tl, Tr) = Node(Tl, y, delete(Tr, y))
Where y = min(Tr ) is the minimum element in the right sub-tree. Here is the corre-
sponding example program:
delete Empty _ = Empty
delete (Node l k r) x | x < k = Node (delete l x) k r
| x > k = Node l k (delete r x)
| otherwise = del l r
where
del Empty r = r
del l Empty = l
del l r = let k' = min r in Node l k' (delete r k')
This algorithm firstly looks up the node to be deleted, then executes the deletion. It
takes O(h) time where h is the height of the tree.
The imperative deletion algorithm needs set the parent properly in addition. The
following one returns the root of the result tree.
1: function Delete(T, x)
2: r←T
3: x′ ← x ▷ save x
4: p ← Parent(x)
5: if Left(x) = NIL then
6: x ← Right(x)
7: else if Right(x) = NIL then
8: x ← Left(x)
9: else ▷ neither children is empty
10: y ← Min(Right(x))
Exercise 2.3
1. There is a symmetric deletion algorithm. When neither sub-tree is empty, we
can replace the key by splicing the maximum node off the left sub-tree. Write a
program to implement this solution.
Exercise 2.4
2.8 Map
We can use the binary search tree to realize the map data structure (also known as an associative
data structure or dictionary). A finite map is a collection of key-value pairs. The keys
are unique: every key is mapped to exactly one value. For keys of type K and values of type V,
the map is Map K V or Map<K, V>. A non-empty map contains n mappings
k1 ↦ v1, k2 ↦ v2, ..., kn ↦ vn. When using the binary search tree to implement a map, we
constrain K to be an ordered set. Every node stores both a key and a value. We use the tree
insert/update operation to bind a key to a value. Given a key k, we use the tree lookup
to find the mapped value, or return nothing when k does not exist. The red-black tree
and AVL tree introduced in later chapters can also be used to implement the map.
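A minimal Haskell sketch of the idea (the names Map, bind, and lookupKey are ours; each node stores a key-value pair, and binding an existing key updates its value):

data Map k v = Empty | Node (Map k v) (k, v) (Map k v)

bind Empty k v = Node Empty (k, v) Empty
bind (Node l (k', v') r) k v
  | k < k'    = Node (bind l k v) (k', v') r
  | k > k'    = Node l (k', v') (bind r k v)
  | otherwise = Node l (k, v) r

lookupKey _ Empty = Nothing
lookupKey k (Node l (k', v) r)
  | k == k'   = Just v
  | k < k'    = lookupKey k l
  | otherwise = lookupKey k r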
Node(Node<T> l, T k, Node<T> r) {
    left = l, key = k, right = r
    if (left ≠ null) then left.parent = this
    if (right ≠ null) then right.parent = this
  }
}
Chapter 3
Insertion sort
3.1 Introduction
Insertion sort is a straightforward sorting algorithm¹. We gave its preliminary definition
for lists in chapter 1. For a collection of comparable elements, we repeatedly pick one and
insert it into an ordered list, maintaining the ordering. As every insertion takes linear time, the
performance is bound to O(n²), where n is the number of elements. This is
not as good as the divide and conquer sort algorithms, like quick sort and merge sort.
However, it still finds applications today. For example, a well-tuned quick sort
implementation falls back to insertion sort for small data sets. The idea of insertion sort is
similar to sorting a deck of poker cards ([4] pp.15). The cards are shuffled, and a player takes
them one by one. At any time, all cards on hand are sorted. When drawing a new card, the
player inserts it into the proper position according to the order of points, as shown in figure 3.1.
1: function Sort(A)
2: for i ← 2 to |A| do
3:   ordered insert A[i] into A[1...(i − 1)]
Where the index i ranges over the array and n = |A|. We start from 2, because the singleton
sub-array [A[1]] is already ordered. When processing the i-th element, all elements before i are
sorted. We continuously insert elements till all the unsorted ones are consumed, as shown in
figure 3.2.
3.2 Insertion
In chapter 1, we give the ordered insertion algorithm for list. For array, we also scan it
to locate the insert position either from left or right. Below algorithm is from right:
1: function Sort(A)
2: for i ← 2 to |A| do ▷ Insert A[i] to A[1...(i − 1)]
3: x ← A[i] ▷ Save A[i] to x
4: j ←i−1
5: while j > 0 and x < A[j] do
6: A[j + 1] ← A[j]
7: j ←j−1
8: A[j + 1] ← x
It’s expensive to insert at arbitrary position, as array stores elements continuously.
When insert x at position i, we need shift all elements after i (i.e. A[i + 1], A[i + 2], ...)
one cell to right. After free up the cell at i, we put x in, as shown in figure 3.3.
insert
A[1] A[2] ... A[i-1] A[i] A[i+1] A[i+2] ... A[n-1] A[n] empty
For the array of length n, suppose after comparing x to the first i elements, we located
the position to insert. Then we shift the rest n − i + 1 elements, and put x in the i-th
cell. Overall, we need traverse the whole array if scan from left. On the other hand, if
scan from right to left, we examine n − i + 1 elements, and perform the same amount of
shifts. We can also define a separated Insert() function, and call it inside the loop. The
insertion takes linear time no matter scans from left or right, hence the sort algorithm is
bound to O(n2 ), where n is the number of elements.
Exercise 3.1
1. Implement the insert to scan from left to right.
2. Define the insert function, and call it from the sort algorithm.
Exercise 3.2
1. Implement the recursive binary search.
3.4 List
With binary search, the number of comparisons improves to O(n lg n). However, as we need to shift
array cells when inserting, the overall time is still bound to O(n²). On the other hand, when
using a list, the insert operation takes constant time at a given node reference, as we saw in chapter 1.
Instead of using node reference, we can also realize list through an additional index
array. For every element A[i], N ext[i] stores the index to the next element follows A[i],
i.e. A[N ext[i]] is the next element of A[i]. There are two special indexes: for the tail node
A[m], we define N ext[m] = −1, indicating it points to NIL; we also define N ext[0] to
index the head element. With the index array, we can implement the insertion algorithm
as below:
1: function Insert(A, N ext, i)
2: j←0 ▷ N ext[0] for head
3: while Next[j] ≠ −1 and A[Next[j]] < A[i] do
4: j ← N ext[j]
5: N ext[i] ← N ext[j]
6: N ext[j] ← i
7: function Sort(A)
8: n ← |A|
9: N ext = [1, 2, ..., n, −1] ▷ n + 1 indexes
10: for i ← 1 to n do
11: Insert(A, N ext, i)
12: return N ext
With a list, although the insert operation becomes constant time, we still need to traverse
the list to locate the position, so the algorithm is still bound to O(n²) comparisons. Unlike an array,
a list does not support random access, hence we cannot use binary search to speed this up.
Exercise 3.3
1. For the index array based list, we return the re-arranged index as result. Design
an algorithm to re-order the original array A from the index N ext.
3.5 Binary search tree
1: function Sort(A)
2: T ←∅
3: for each x ∈ A do
4: T ← Insert-Tree(T, x)
5: return To-List(T )
Where Insert-Tree() and To-List() are defined in chapter 2. In average case, the
performance of tree sort is bound to O(n lg n), where n is the number of elements. This
is the lower limit of comparison based sort([?] pp.180-193). However, in the worst case,
if the tree is poor balanced the performance drops to O(n2 ).
3.6 Summary
Insertion sort is often used as the first example of sorting. It is straightforward and easy to
implement. However its performance is quadratic. Insertion sort does not only appear in
textbooks, it has practical use case in the quick sort implementation. It is an engineering
practice to fallback to insertion sort when the number of elements is small.
Chapter 4
Red-black tree
4.1 Introduction
As in the example of chapter 2, we use the binary search tree as a dictionary to count the
word occurrences in a text. One may want to feed an address book into a binary search tree,
and use it to look up contacts in the same way.
Unlike the word counter program, this one performs poorly, especially when searching
names like Zara, Zed, Zulu, etc. This is because the address entries are typically listed in
lexicographic order, i.e. the names are input in ascending order. If we insert the numbers 1, 2,
3, ..., n into a binary search tree, it ends up like figure 4.1: an extremely unbalanced
binary search tree. The lookup is bound to O(h) time for a tree of height h. When
the tree is well balanced, the performance is O(lg n), where n is the number of elements in
the tree. But in this extreme case, the performance degrades to O(n). It is equivalent
to a list scan.
Exercise 4.1
1. For a big address entry list in lexicographic order, one may want to speed up
building the address book with two concurrent tasks: one reads from the head;
while the other reads from the tail, till they meet at some middle point. What
does the binary search tree look like? What if split the list into multiple sections
to scale the concurrency?
2. Find more cases to exploit a binary search tree, for example in figure 4.2.
4.1.1 Balance
To avoid extremely unbalanced case, we can shuffle the input(12.4 in [4]), however, when
the input is entered by user interactively, we can not randomize the sequence. People de-
veloped solutions to make the tree balanced. They mostly rely on the rotation operation.
Rotation changes the tree structure while maintain the elements ordering. This chapter
introduces the red-black tree, the widely used self-adjusting balanced binary search tree.
Next chapter is about AVL tree, another self-balanced tree. Chapter 8 introduce the
splay tree. It adjusts the tree in steps to make it balanced.
4.1.2 Tree rotation
Tree rotation transforms the tree structure while keeping the element ordering (the in-order traverse result) unchanged.
The second row in each rotation equation keeps the tree unchanged if the pattern does not
match (for example, when both sub-trees are empty). We can also implement tree rotation
imperatively. We need to re-assign the sub-trees and the parent node reference. When rotating, we
pass both the root T and the node x as parameters:
1: function Left-Rotate(T, x)
2: p ← Parent(x)
3: y ← Right(x) ▷ assume y ≠ NIL
4: a ← Left(x)
5: b ← Left(y)
6: c ← Right(y)
7: Replace(x, y) ▷ replace node x with y
Figure 4.2: More cases that exploit a binary search tree: (a), (b), and (c).
4: function Set-Right(x, y)
5: Right(x) ← y
6: if y ≠ NIL then Parent(y) ← x
We can see how pattern matching simplifies the tree rotation. Based on this idea,
Okasaki developed the purely functional algorithm for red-black tree in 1995[13].
Exercise 4.2
1. Implement the Right-Rotate.
4.2 Definition
A red-black tree is a self-balancing binary search tree[14]. It is essentially equivalent to
the 2-3-4 tree¹. By coloring the nodes red or black and performing rotations, the red-black tree
provides an efficient way to keep the tree balanced. On top of the binary search tree
definition, we label every node with a color. We say it is a red-black tree if the coloring
satisfies the following 5 rules ([4] pp273):
1. Every node is either red or black;
2. The root is black;
3. Every leaf (NIL) is black;
4. If a node is red, then both its children are black;
5. For every node, all paths from it to descendant leaves contain the same number of black nodes.
Why do these rules keep the red-black tree balanced? The key point is that the longest
path from the root to a leaf cannot be more than twice as long as the shortest path. Consider
rule 4: there cannot be two adjacent red nodes, therefore the shortest path contains
only black nodes, and any longer path must contain red ones in addition. Rule 5 ensures all
paths from the root have the same number of black nodes. Together these rules ensure that no
path is more than twice as long as any other[14]. Figure 4.4 gives an example of a red-black tree.
As all NIL nodes are black, we can hide them as shown in figure 4.5. All operations
including lookup, min/max, are same as the binary search tree. However, the insert and
delete are special, as we need maintain the coloring rules.
Below example program adds the color field on top of the binary search tree definition:
data Color = R | B
data RBTree a = Empty
              | Node Color (RBTree a) a (RBTree a)
Exercise 4.3
1. Prove the height h of a red-black tree of n nodes is at most 2 lg(n + 1)
4.3 Insert
The insert algorithm for the red-black tree has two steps. The first step is the same as for the
binary search tree. The tree may become unbalanced after that, and we need to fix it in the
second step to restore the red-black coloring. When inserting a new element, we always
make it red. Unless the new node is the root, we won't break any coloring rule except for
the 4th, because the insertion may introduce two adjacent red nodes. Okasaki finds that there are 4 cases
violating rule 4, all with two adjacent red nodes. They share a uniform structure after
fixing[13], as shown in figure 4.6.
All 4 transformations move the redness one level up. When performing the bottom-up recursive
fixing, we may color the root red, while rule 2 requires the root to always be black.
We need to revert the root back to black at the end. With pattern matching, we can define a
balance function to fix the tree. Denote the color as C with values black B and red R.
A non-empty node has the form T = (C, l, k, r), where l, r are the left and right
sub-trees, and k is the key.
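The four fixing patterns can be sketched in Haskell, following the RBTree definition above (the tuple (C, l, k, r) corresponds to Node C l k r; this mirrors Okasaki's balance function[13]):

balance B (Node R (Node R a x b) y c) z d = Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) = Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r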
The last row says that if the tree does not match any of the 4 patterns, we leave it unchanged. We
define the insert algorithm for the red-black tree as below:

insert T k = makeBlack (ins T k)

where

ins ∅ k = (R, ∅, k, ∅)
ins (C, l, k′, r) k = { k < k′ : balance C (ins l k) k′ r
                        k > k′ : balance C l k′ (ins r k)    (4.5)
If the tree is empty, we create a red leaf of k; otherwise, let the sub-trees and the key
be l, r, k ′ , we compare k and k ′ , then recursively insert k to a sub-tree. After that, we
call balance to fix the coloring, then force the root to be black finally.
We skip to handle the duplicated keys. If the key already exists, we can overwrite,
drop, or store the values in a list ([4], pp269). Figure 4.7 shows two red-black trees built
from sequence 11, 2, 14, 1, 7, 15, 5, 8, 4 and 1, 2, ..., 8. The second example demonstrates
the tree is well balanced even for ordered input.
The algorithm performs top-down recursive insertion and fixing. It is bound to O(h)
time, where h is the height of the tree. As the red-black tree coloring rules are maintained,
h is the logarithm to the number of nodes n. The overall performance is O(lg n).
Exercise 4.4
1. Implement the insert algorithm without using pattern matching, but test the 4
cases separately.
4.4 Delete
Delete is more complex than insert. We can also use pattern matching and recursion to
simplify the delete algorithm for the red-black tree². There are alternatives that mimic delete.
Sometimes, we build a read-only tree, then use it for frequent lookups[5]. To
delete, we mark the node with a flag, and later rebuild the tree when such nodes
exceed 50%. Delete may also violate the red-black coloring rules. We use the same
idea of applying fixing after the delete. By rule 5, the coloring violation only happens when we delete a black
node: the number of black nodes along the path decreases by one, hence not all
paths contain the same number of black nodes any more.
To resume the blackness, we introduce a special ‘doubly-black’ node([4], pp290). One
such node is counted as 2 black nodes. When delete a black node x, we can move the
blackness either up to its parent or down to one sub-tree. Let this node be y that accepts
the blackness. If y was red, we turn it black; if y was already black, we make it ‘doubly-
black’, denoted as B 2 . Below example program adds the ‘doubly-black’ support:
data Color = R | B | BB
data RBTree a = Empty | BBEmpty
| Node Color (RBTree a) a (RBTree a)
Because all empty leaves are black, when push the blackness down to a leaf, it becomes
‘doubly-black’ empty (BBEmpty, or bold ∅ ). The first step is to perform the normal
binary search tree delete; then if the cut off node is black, we shift the blackness, and fix
the tree coloring.
This definition is in Curried form. When delete the only element, the tree becomes
empty. To cover this case, we modify makeBlack as below:
makeBlack ∅ = ∅
makeBlack (C, l, k, r) = (B, l, k, r)    (4.8)
del ∅ k = ∅
del (C, l, k′, r) k = { k < k′ : fixB² (C, del l k, k′, r)
                        k > k′ : fixB² (C, l, k′, del r k)
                        k = k′ : { l = ∅ : (C = B ↦ shiftB r, r)
                                   r = ∅ : (C = B ↦ shiftB l, l)
                                   else : fixB² (C, l, k″, del r k″)    (4.9)
where k″ = min(r)
When the tree is empty, the result is ∅; otherwise, we compare the key k ′ in the tree
with k. If k < k ′ , we recursively delete k from the left sub-tree; if k > k ′ then delete from
the right. Because the recursive result may contain doubly-black node, we need apply
f ixB 2 to fix it. When k = k ′ , we need splice it out. If either sub-tree is empty, we replace
it with the other, then shift the blackness if the spliced node is black. This is represented
with McCarthy form (p 7→ a, b), which is equivalent to ‘(if p then a else b)’. If neither
sub-tree is empty, we cut the minimum element k ′′ = min(r), and use k ′′ to replace k.
2 Actually, the tree is rebuilt in purely functional setting, although the common part is reused. This
To preserve the blackness, shiftB makes a black node doubly-black, and forces it black
in the other cases. It flips a doubly-black node back to normal black when applied twice.
shiftB (B, l, k, r) = (B², l, k, r)
shiftB (C, l, k, r) = (B, l, k, r)    (4.10)
shiftB ∅ = ∅   (the doubly-black empty)
shiftB ∅ = ∅   (back to normal empty)
The f ixB 2 function eliminates the doubly-black node by rotation and re-coloring.
The doubly-black node can be branch node or empty ∅ . There are three cases:
Case 1. The sibling of the doubly-black node is black, and it has a red sub-tree. We
can fix this case with a rotation. There are 4 sub-cases, all can be transformed to a
uniformed pattern, as shown in figure A.1.
The fixing for these 4 sub-cases can be realized with pattern matching.
fixB² C aB² x (B, (R, b, y, c), z, d) = (C, (B, shiftB(a), x, b), y, (B, c, z, d))
fixB² C aB² x (B, b, y, (R, c, z, d)) = (C, (B, shiftB(a), x, b), y, (B, c, z, d))
fixB² C (B, a, x, (R, b, y, c)) z dB² = (C, (B, a, x, b), y, (B, c, z, shiftB(d)))
fixB² C (B, (R, a, x, b), y, c) z dB² = (C, (B, a, x, b), y, (B, c, z, shiftB(d)))    (4.11)
Where aB² means node a is doubly-black; it can be a branch or ∅.
Case 2. The sibling of the doubly-black is red. We can rotate the tree to turn it into
case 1 or 3, as shown in figure A.2.
fixB² B aB² x (R, b, y, c) = fixB² B (fixB² R a x b) y c    (4.12)
fixB² B (R, a, x, b) y cB² = fixB² B a x (fixB² R b y c)
Case 3. The sibling of the doubly-black node, and its two sub-trees are all black. In
this case, we change the sibling to red, flip the doubly-black node to black, and propagate
the doubly-blackness a level up to parent as shown in figure A.3.
There are two symmetric sub-cases. For the upper case, x was either red or black. x
changes to black if it was red, otherwise changes to doubly-black; Same coloring changes
to y in the lower case. We add this fixing to equation (4.12):
fixB² C aB² x (B, b, y, c) = shiftB (C, shiftB a, x, (R, b, y, c))
fixB² C (B, a, x, b) y cB² = shiftB (C, (R, a, x, b), y, shiftB c)    (4.13)
fixB² C l k r = (C, l, k, r)
If none of the patterns matches, the last row keeps the node unchanged. The doubly-black
fixing is recursive. It terminates in two ways: either it reaches case 1, where the doubly-black
node is eliminated, or the blackness moves up to the root. Finally, we
force the root to be black. Below example program puts all three cases together:
−− the sibling is black, and has a red sub-tree
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d)
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)
The delete algorithm is bound to O(h) time, where h is the height of the tree. As
red-black tree maintains the balance, h = O(lg n) for n nodes.
Exercise 4.5
1. Implement the alternative delete algorithm: mark the node as deleted without
actually removing it. When the marked nodes exceed 50%, re-build the tree.
4.6 Summary
Self setLeft(l) {
    left = l
    if l ≠ null then l.parent = this
}
Self setRight(r) {
    right = r
    if r ≠ null then r.parent = this
}
[Link] = [Link]
[Link]().color = [Link]
[Link]().color = [Link]
x = [Link]()
} else {
if ([Link] == [Link]().left) {
if (x == [Link]) {
// case 2: ((a x:R b:R) y:B c) =⇒ case 3
x = [Link]
t = leftRotate(t, x)
}
// case 3: ((a:R x:R b) y:B c) =⇒ (a:R x:B (b y:R c))
[Link] = [Link]
[Link]().color = [Link]
t = rightRotate(t, [Link]())
} else {
if (x == [Link]) {
// case 2': (a x:B (b:R y:R c)) =⇒ case 3'
x = [Link]
t = rightRotate(t, x)
}
// case 3': (a x:B (b y:R c:R)) =⇒ ((a x:R b) y:B c:R)
[Link] = [Link]
[Link]().color = [Link]
t = leftRotate(t, [Link]())
}
}
}
[Link] = [Link]
return t
}
Chapter 5
AVL tree
5.1 Introduction
The idea of the red-black tree is to limit the number of nodes along a path within a range. The AVL
tree takes a direct approach: quantify the difference between branches. For a node T,
define:

δ(T) = |r| − |l|    (5.1)

Where |T| is the height of tree T, and l and r are its left and right sub-trees. Define
δ(∅) = 0 for the empty tree. If δ(T) = 0 for every node T, the tree is definitely balanced.
For example, a complete binary tree of height h has n = 2^h − 1 nodes, and there are no
empty branches except at the leaves. The smaller the absolute value of δ(T), the more balanced
the sub-trees are. We call δ(T) the balance factor of a binary tree.
5.2 Definition
A binary search tree is an AVL tree if, for every node T:

|δ(T)| ≤ 1    (5.2)

There are three valid values of δ(T): ±1 and 0. Figure 5.1 shows an AVL tree. This
definition ensures the tree height h = O(lg n), where n is the number of nodes in the tree.
Let's prove it. For an AVL tree of height h, the number of nodes varies: there are at
most 2^h − 1 nodes (the complete binary tree case). We are interested in the minimum number of nodes.
Let this number be N(h). We have the following results:
• Empty tree ∅: h = 0, N (0) = 0;
• Singleton tree: h = 1, N (1) = 1;
Figure 5.2 shows an AVL tree T of height h. It contains three parts: the key k, and
two sub-trees l, r. We have the following equation:

N(h) = 1 + N(h − 1) + N(h − 2)
Figure 5.2: An AVL tree of height h. The height of one sub-tree is h − 1, the other is no
less than h − 2.
For AVL tree, lookup, max, min are as same as the binary search tree. We focus on
insert and delete algorithms.
5.3 Insert
When insert a new element, |δ(T )| may exceed 1. We can use pattern matching similar to
red-black tree to develop a simplified solution. After insert element x, for those sub-trees
which are the ancestors of x, the height may increase at most by 1. We need recursively
update the balance factor along the path of insertion. Define the insert result as a pair
(T ′ , ∆H), where T ′ is the updated tree and ∆H is the increment of height. We modify
the binary search tree insert function as below:
insert = fst ◦ ins (5.8)
Where fst (a, b) = a returns the first element in a pair. ins(T, k) does the actual work
to insert element k into tree T :
ins ∅ k = ((∅, k, ∅, 0), 1)
ins (l, k′, r, δ) k = { k < k′ : tree (ins l k) k′ (r, 0) δ
                        k > k′ : tree (l, 0) k′ (ins r k) δ    (5.9)
If the tree is empty ∅, the result is a leaf of k with balance factor 0. The height
increases to 1. Otherwise let T = (l, k ′ , r, δ). We compare the new element k with k ′ .
If k < k ′ , we recursively insert k it to the left sub-tree l, otherwise insert to r. As the
recursive insert result is a pair of (l′ , ∆l) or (r′ , ∆r), we need adjust the balance factor
and update tree height through function tree, it takes 4 parameters: (l′ , ∆l), k ′ , (r′ , ∆r),
and δ. The result is (T ′ , ∆H), where T ′ is the new tree, and ∆H is defined as:
∆H = |T ′ | − |T | (5.10)
We can further break it down into 4 cases:
∆H = |T′| − |T|
    = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
    = max(|r′|, |l′|) − max(|r|, |l|)
    = { δ ≥ 0, δ′ ≥ 0 : ∆r
        δ ≤ 0, δ′ ≥ 0 : δ + ∆r
        δ ≥ 0, δ′ ≤ 0 : ∆l − δ
        otherwise : ∆l    (5.11)
1 Alternatively, we can record the height instead of δ[20].
Where δ ′ = δ(T ′ ) = |r′ | − |l′ |, is the updated balance factor. Appendix B provides the
proof for it. We need determine δ ′ before balance adjustment.
δ′ = |r′ | − |l′ |
= |r| + ∆r − (|l| + ∆l)
(5.12)
= |r| − |l| + ∆r − ∆l
= δ + ∆r − ∆l
With the changes in height and balance factor, we can define the tree function in
(5.9):
tree (l, dl) k (r, dr) d = balance (Br l k r d') deltaH where
d' = d + dr - dl
deltaH | d ≥ 0 && d' ≥ 0 = dr
| d ≤ 0 && d' ≥ 0 = d+dr
| d ≥ 0 && d' ≤ 0 = dl - d
| otherwise = dl
5.3.1 Balance
There are 4 cases need fix as shown in figure 5.3. The balance factor is ±2, exceeds the
range of [−1, 1]. We adjust them to a uniformed structure in the center, with the δ(y) = 0.
We call the 4 cases: left-left, right-right, right-left, and left-right. Denote the balance
factors before fixing as δ(x), δ(y), and δ(z); after fixing, they change to δ ′ (x), δ ′ (y) = 0,
and δ ′ (z) respectively. The values of δ ′ (x) and δ ′ (z) can be given as below. Appendix B
gives the proof.
Left-left:
δ ′ (x) = δ(x)
δ ′ (y) = 0 (5.14)
δ ′ (z) = 0
Right-right:
δ ′ (x) = 0
δ ′ (y) = 0 (5.15)
δ ′ (z) = δ(z)
The performance of insert is proportional to the height of the tree. From (5.7), it is
bound to O(lg n), where n is the number of elements in the tree.
Verification
To verify an AVL tree, we need to check two things: that it is a binary search tree, and that for every
sub-tree T, equation (5.2) holds: |δ(T)| ≤ 1. The function below examines the height difference
between the two sub-trees recursively:
avl? ∅ = True
(5.18)
avl? T = avl? l ∧ avl? r ∧ ||r| − |l|| ≤ 1
Where l, r are the left and right sub-trees. The height is calculated recursively:
|∅| = 0
(5.19)
|T | = 1 + max(|r|, |l|)
height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
Exercise 5.1
1. We only give the algorithm to test AVL height. Complete the program to test if a
binary tree is AVL tree.
5.4 Imperative AVL tree algorithm ⋆
After inserting a new node, we update the balance factors bottom-up along the insertion path. Let δ and δ′ be the balance factors of a node before and after the update. There are three cases:
• |δ| = 1, |δ′| = 0. The new node makes the tree well balanced. The height of the
parent is unchanged.
• |δ| = 0, |δ′| = 1. Either the left or the right sub-tree increases its height. We need to
go on checking the upper level.
• |δ| = 1, |δ′| = 2. We need to rotate the tree to fix the balance factor.
1: function AVL-Insert-Fix(T, x)
2: while Parent(x) ≠ NIL do
3: P ← Parent(x)
4: L ← Left(x)
5: R ← Right(x)
6: δ ← δ(P )
7: if x = Left(P ) then
8: δ′ ← δ − 1
9: else
10: δ′ ← δ + 1
11: δ(P ) ← δ ′
12: if |δ| = 1 and |δ ′ | = 0 then ▷ Height unchanged
13: return T
14: else if |δ| = 0 and |δ ′ | = 1 then ▷ Go on bottom-up update
15: x←P
16: else if |δ| = 1 and |δ ′ | = 2 then
17: if δ ′ = 2 then
18: if δ(R) = 1 then ▷ Right-right
19: δ(P ) ← 0 ▷ By (B.6)
20: δ(R) ← 0
21: T ← Left-Rotate(T, P )
22: if δ(R) = −1 then ▷ Right-left
23: δy ← δ(Left(R)) ▷ By (B.17)
24: if δy = 1 then
25: δ(P ) ← −1
26: else
27: δ(P ) ← 0
28: δ(Left(R)) ← 0
29: if δy = −1 then
30: δ(R) ← 1
31: else
32: δ(R) ← 0
33: T ← Right-Rotate(T, R)
34: T ← Left-Rotate(T, P )
35: if δ′ = −2 then
36: if δ(L) = −1 then ▷ Left-left
37: δ(P ) ← 0
38: δ(L) ← 0
39: Right-Rotate(T, P )
40: else ▷ Left-Right
41: δy ← δ(Right(L))
42: if δy = 1 then
43: δ(L) ← −1
44: else
45: δ(L) ← 0
46: δ(Right(L)) ← 0
47: if δy = −1 then
48: δ(P ) ← 1
49: else
50: δ(P ) ← 0
51: Left-Rotate(T, L)
52: Right-Rotate(T, P )
53: break
54: return T
Besides rotation, we also need update δ for the impacted nodes. The right-right and
left-left cases need one rotation, while the right-left and left-right case need two rotations.
We skip the AVL tree delete algorithm in this chapter. Appendix B provides the delete
implementation.
5.5 Summary
AVL tree was developed in 1962 by Adelson-Velskii and Landis[18], [19]. It is named after
the two authors. AVL tree was developed earlier than the red-black tree. Both are self-
balance binary search trees. Most tree operations are bound O(lg n) time. From (5.7),
AVL tree is more rigidly balanced, and performs faster than red-black tree in looking up
intensive applications [18]. However, red-black tree performs better in frequently insertion
and removal cases. Many popular self-balance binary search tree libraries are implemented
on top of red-black tree. AVL tree also provides the intuitive and effective solution to the
balance problem.
} else if d2 == -2 {
    if l.delta == -1 { // Left-left
        p.delta = 0
        l.delta = 0
        t = rotateRight(t, p)
    } else if l.delta == 1 { // Left-right
        var dy = l.right.delta
        l.delta = if dy == 1 then -1 else 0
        l.right.delta = 0
        p.delta = if dy == -1 then 1 else 0
t = rotateLeft(t, l)
t = rotateRight(t, p)
}
}
break
}
}
return t
}
Chapter 6
Radix tree
Binary search tree stores data in nodes. Can we use the edges to carry information? Radix
trees, including trie, prefix tree, and suffix tree are the data structures developed based
on this idea in 1960s. They are widely used in compiler design[21], and bio-information
processing, like DNA pattern matching [23].
Figure 6.1: A Radix tree containing the bit keys 1011, 10, 011, 100, and 0.
Figure 6.1 shows a Radix tree. It contains bits 1011, 10, 011, 100 and 0. When lookup
a key k = (b0 b1 ...bn )2 , we take the first bit b0 (MSB from left), check whether it is 0 or 1.
For 0, turn left, else turn right. Then take the second bit and repeat looking up till either
reach a leaf node or consume all the n bits. We needn’t store keys in Radix tree node.
The information is represented by edges. The nodes labelled with key in figure 6.1 are
for illustration purpose. If the keys are integers, we can represent them in binary format,
and implement lookup with bit-wise manipulations.
6.1 Integer trie
A trie is a binary tree in which the placement of each key is controlled by its bits: each 0 means 'go
left' and each 1 means 'go right'[21]. Consider the binary trie in figure 6.2. The three
keys are the different bit strings "11", "011", and "0011", although they are all equal to 3.
Figure 6.2: A binary trie containing the bit strings "11", "011", and "0011".
It is inefficient to treat the prefix zeros as valid bits. For 32 bits integers, we need
a tree of 32 levels to insert number 1. Okasaki suggested to use little-endian integers
instead[21]. 1 is represented as bits (1)2 , 2 as (01)2 , and 3 is (11)2 , ...
6.1.1 Definition
We can re-use binary tree structure to define the little-endian binary trie. A node is either
empty, or a branch containing the left, right sub-trees, and an optional value. The left
sub-tree is encoded as 0 and the right sub-tree is encoded as 1.
data IntTrie a = Empty
| Branch (IntTrie a) (Maybe a) (IntTrie a)
Given a node in the binary trie, the integer key bound to it is uniquely determined
through its position. That is the reason we need not save the key, but only the value in
the node. The type of the key is always integer, we call the tree IntT rie A if the value is
of type A.
6.1.2 Insert
When insert an integer key k and a value v, we convert k into binary form. If k is even,
the lowest bit is 0, we recursively insert to the left sub-tree; otherwise if k is odd, the
lowest bit is 1, we recursively insert to the right. We next divide k by 2 to remove the
lowest bit. For none empty trie T = (l, v ′ , r), where l, r are the left and right sub-trees,
and v ′ is the optional value, function insert can be defined as below:
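A Haskell sketch of this insert, following the IntTrie definition above:

insert Empty k v = insert (Branch Empty Nothing Empty) k v
insert (Branch l _ r) 0 v = Branch l (Just v) r
insert (Branch l v' r) k v
  | even k    = Branch (insert l (k `div` 2) v) v' r
  | otherwise = Branch l v' (insert r (k `div` 2) v)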
Figure 6.3: A little-endian binary trie for the map {1 → a, 4 → b, 5 → c, 9 → d}.
We can define the even/odd testing by modular 2, and check if the remainder is 0
or not: even(k) = (k mod 2 = 0). Or use bit-wise operation in some environment, like
(k & 0x1) == 0. We can eliminate the recursion through loops to realize an iterative
implementation as below:
1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node ▷ (NIL, Nothing, NIL)
4: p←T
5: while k ≠ 0 do
6: if Even?(k) then
7: if Left(p) = NIL then
8: Left(p) ← Empty-Node
9: p ← Left(p)
10: else
11: if Right(p) = NIL then
12: Right(p) ← Empty-Node
13: p ← Right(p)
14: k ← bk/2c
15: Value(p) ← v
16: return T
Insert takes, a trie T , a key k, and a value v. For integer k with m bits in binary, it
goes into m levels of the trie. The performance is bound to O(m).
6.1.3 Look up
When look up key k in a none empty integer trie, if k = 0, then the root node is the
target. Otherwise, we check the lowest bit, then recursively look up the left or right
sub-tree accordingly.
lookup ∅ k = Nothing
lookup (l, v, r) 0 = v
lookup (l, v, r) k = { even(k) : lookup l ⌊k/2⌋
                       odd(k) : lookup r ⌊k/2⌋    (6.2)
Below example program implements the lookup function:
lookup Empty _ = Nothing
lookup (Branch _ v _) 0 = v
lookup (Branch l _ r) k | even k = lookup l (k `div` 2)
| otherwise = lookup r (k `div` 2)
We can eliminate the recursion to implement the iterative lookup as the following:
1: function Lookup(T, k)
2: while k ≠ 0 and T ≠ NIL do
3: if Even?(k) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: k ← bk/2c
8: if T ≠ NIL then
9: return Value(T )
10: else
11: return NIL
The lookup function is bound to O(m) time, where m is the number of bits of k.
Exercise 6.1
1. Can we change the definition from Branch (IntTrie a) (Maybe a) (IntTrie
a) to Branch (IntTrie a) a (IntTrie a), and return Nothing if the value
does not exist, and Just v otherwise?
6.2 Integer prefix tree
Figure 6.4: A little-endian integer tree for the map {1 → a, 4 → b, 5 → c, 9 → d}.
The key of a branch node is the longest common prefix of its descendant trees. In
other words, the sibling sub-trees branch out at the bit where their longest common prefix ends.
As a result, the integer prefix tree eliminates the redundant spaces of the trie.
6.2.1 Definition
The integer prefix tree is a special binary tree. It is either empty, or it is:
• a leaf, containing an integer key and a value;
• or a branch, with left and right sub-trees that share the longest common
prefix bits of their keys. For the left sub-tree, the next bit is 0; for the right, it is
1.
Below example program defines the integer prefix tree. The branch node contains 4
components: the longest prefix, a mask integer indicating from which bit the sub-trees
branch out, and the left and right sub-trees. The mask is m = 2^n for some integer n ≥ 0. All
bits lower than n do not belong to the common prefix.
data IntTree a = Empty
| Leaf Int a
| Branch Int Int (IntTree a) (IntTree a)
6.2.2 Insert
When insert integer y to tree T , if T is empty, we create a leaf of y; If T is a singleton
leaf of x, besides the new leaf of y, we need create a branch node, set x and y as the
two sub-trees. To determine whether y is on the left or right, we need find the longest
common prefix p of x and y. For example if x = 12 = (1100)2 , y = 15 = (1111)2 , then
p = (11oo)2 , where o denotes the bits we don’t care. We can use another integer m to
mask those bits. In this example, m = 4 = (100)2 . The next bit after p presents 21 . It is
0 in x, 1 in y. Hence, we set x as the left sub-tree and y as the right, as shown in figure
6.5.
If T is neither empty nor a leaf, we firstly check if y matches the longest common
prefix p in the root, then recursively insert it to the sub-tree according to the next bit
after p. For example, when insert y = 14 = (1110)2 to the tree shown in figure 6.5, since
p = (11oo)2 and the next bit (the bit of 21 ) is 1, we recursively insert y to the right
Figure 6.5: Insert 15 = (1111)₂ to a leaf of 12 = (1100)₂: they branch out under prefix (11oo)₂ with mask (100)₂, 12 on the left and 15 on the right.
sub-tree. If y does not match p in the root, we need branch a new leaf as shown in figure
6.6.
Figure 6.6: Insert into a branch node: either recursively insert into a sub-tree, or branch out a new leaf when the key does not match the prefix.
For integer key k and value v, let (k, v) be the leaf. For a branch node, denote it as (p, m, l, r), where p is the longest common prefix, m is the mask, and l and r are the left and right sub-trees. The insert function is defined as:

insert ∅ k v = (k, v)
insert (k, v′) k v = (k, v)
insert (k′, v′) k v = join k (k, v) k′ (k′, v′)
insert (p, m, l, r) k v = { match(k, p, m) : { zero(k, m) : (p, m, insert l k v)          (6.3)
                                               otherwise  : (p, m, l, insert r k v)
                            otherwise : join k (k, v) p (p, m, l, r)
The first clause creates a leaf when T = ∅; the second clause overrides the value for the same key. Function match(k, p, m) tests whether integer k and prefix p have the same bits after masking with m, i.e. mask(k, m) = p, where mask(k, m) = ¬(m − 1) & k: it applies bit-wise not to m − 1, then bit-wise and with k. Function zero(k, m) tests whether the next bit below the mask is 0: we shift m one bit to the right, then do bit-wise and with k, i.e. zero(k, m) tests whether (k & (m >> 1)) = 0.
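A minimal Haskell sketch of these bit helpers (an assumption-level sketch using Data.Bits; the names mask, match, and zero follow the definitions above):

import Data.Bits

mask k m = k .&. complement (m - 1)      -- clear all bits below the mask bit
match k p m = mask k m == p              -- same prefix under mask m
zero k m = k .&. (m `shiftR` 1) == 0     -- is the bit right below the mask 0?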
Function join(p1, T1, p2, T2) takes two different prefixes and their trees. It extracts the longest common prefix of p1 and p2 as (p, m) = LCP(p1, p2), creates a new branch node, then sets T1 and T2 as the two sub-trees:

join(p1, T1, p2, T2) = { zero(p1, m) : (p, m, T1, T2)          (6.5)
                         otherwise   : (p, m, T2, T1)
To calculate the longest common prefix, we can firstly compute bit-wise exclusive-or
for p1 and p2, then count the highest bit highest(xor(p1 , p2 )) as:
highest(0) = 0
highest(n) = 1 + highest(n >> 1)
Then generate a mask m = 2^highest(xor(p1, p2)). The longest common prefix p can be
given by masking the bits with m for either p1 or p2 , like p = mask(p1 , m). The following
example program implements the insert function:
insert t k x
= case t of
Empty → Leaf k x
Leaf k' x' → if k == k' then Leaf k x
else join k (Leaf k x) k' t
Branch p m l r
| match k p m → if zero k m
then Branch p m (insert l k x) r
else Branch p m l (insert r k x)
| otherwise → join k (Leaf k x) p t
match k p m = (mask k m) == p
6: m ← 2m
7: return (MaskBit(a, m), m)
8: function MaskBit(x, m)
9: return x & ¬(m − 1)
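A sketch of the lcp and join helpers in Haskell (assumptions: it relies on Data.Bits and on the IntTree type, mask, and zero given earlier; these are not the book's original listings):

import Data.Bits

highest 0 = 0
highest n = 1 + highest (n `shiftR` 1)

lcp p1 p2 = (mask p1 m, m) where m = bit (highest (p1 `xor` p2))   -- (prefix, mask)

join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2
                                else Branch p m t2 t1
  where (p, m) = lcp p1 p2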
Figure 6.7 gives an example integer prefix tree created by the insert algorithm. Although the integer prefix tree consolidates the chained nodes, the operation to extract the longest common prefix needs a linear scan of the bits. For an integer of m bits, insert is bound to O(m).
Figure 6.7: An example integer prefix tree built by the insert algorithm (keys 1 ↦ x, 4 ↦ y, 5 ↦ z).
6.2.3 Lookup
When looking up a key k, if the integer tree T = ∅, or it is a leaf T = (k′, v) with a different key, then k does not exist; if k = k′, then v is the result; if T = (p, m, l, r) is a branch node, we check whether the common prefix p matches k under the mask m, then recursively look up the sub-tree l or r according to the next bit. If k fails to match the common prefix p, then k does not exist.
lookup ∅ k = Nothing
lookup (k′, v) k = { k = k′    : Just v
                     otherwise : Nothing
lookup (p, m, l, r) k = { match(k, p, m) : { zero(k, m) : lookup l k          (6.6)
                                             otherwise  : lookup r k
                          otherwise : Nothing
We can also eliminate the recursion to implement the iterative lookup algorithm.
1: function Look-Up(T, k)
2: if T = NIL then
3: return NIL
4: while T is not leaf, and Match(k, Prefix(T ), Mask(T )) do
5: if Zero?(k, Mask(T )) then
6: T ← Left(T )
7: else
8: T ← Right(T )
Exercise 6.2
1. Write a program to implement the lookup function.
2. Implement the pre-order traverse for both integer trie and integer tree. Only
output the keys when the nodes store values. What pattern does the result follow?
6.3 Trie
From integer trie and tree, we can extend the key to a list of elements. Particularly the
trie and tree with key in alphabetic string are powerful tools for text manipulation.
6.3.1 Definition
When extending the key type from 0/1 bits to a generic list, the tree structure changes from a binary tree to one with multiple sub-trees. Taking English characters as an example, there are up to 26 sub-trees when ignoring case, as shown in figure 6.8.
Not all of the 26 sub-trees contain data. In figure 6.8, there are only three non-empty sub-trees, bound to ‘a’, ‘b’, and ‘z’. Other sub-trees, such as the one for ‘c’, are empty, and we can hide them in the figure. When it is case sensitive, or when we extend the key from alphabetic strings to generic lists, we can adopt collection types, such as a map, to define the trie.
A trie is either empty, or a node that contains an optional value and the mappings to its sub-trees.
Let the type of the value be V; we denote the trie as Trie K V. Below example program defines the trie.
data Trie k v = Trie { value :: Maybe v
, subTrees :: [(k, Trie k v)]}
6.3.2 Insert
When inserting a key/value pair into the trie, the key is a list of elements. Let the trie be T = (v, ts), where v is the value stored in the trie, and ts = {c1 ↦ T1, c2 ↦ T2, ..., cm ↦ Tm} contains the mappings between elements and sub-trees: element ci is mapped to sub-tree Ti. We can implement the mapping either through an association list [(c1, T1), (c2, T2), ..., (cm, Tm)], or through a self-balancing tree map (Chapter 4 or 5).
Figure 6.8: A trie of 26 branches, containing key ‘a’, ‘an’, ‘another’, ‘bool’, ‘boy’, and
‘zoo’.
When the key is empty, we override the value; otherwise, we extract the first element k, check whether some sub-tree is mapped to k, and recursively insert ks and v′:

ins ∅ = [k ↦ insert (Nothing, ∅) ks v′]
ins ((c ↦ t) : ts) = { c = k     : (k ↦ insert t ks v′) : ts              (6.8)
                       otherwise : (c ↦ t) : (ins ts)
If there is no sub-tree in the node, we create a mapping from k to an empty trie node t = (Nothing, ∅); otherwise, we locate the sub-tree t mapped to k, then recursively insert ks and v′ into t. Below example program implements insert; it uses an association list to manage the sub-tree mappings.
insert (Trie _ ts) [] x = Trie (Just x) ts
insert (Trie v ts) (k:ks) x = Trie v (ins ts) where
ins [] = [(k, insert empty ks x)]
ins ((c, t) : ts) = if c == k then (k, insert t ks x) : ts
else (c, t) : (ins ts)
6.3.3 Look up
When looking up a non-empty key (k : ks) in trie T = (v, ts), we start from the first element k: if there exists a sub-tree T′ mapped to k, we recursively look up ks in T′. When the key is empty, we return the value as the result.
7: T ← Sub-Trees(T )[c]
8: return Value(T )
The lookup algorithm is bound to O(mn), where n is the length of the key, and m is
the size of the element set.
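The corresponding functional lookup was elided above; a sketch (assuming the Trie record defined earlier, and the Prelude's association-list lookup) could be:

find :: Eq k => Trie k v -> [k] -> Maybe v
find t [] = value t
find t (k:ks) = case lookup k (subTrees t) of
                  Nothing -> Nothing
                  Just t' -> find t' ks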
Exercise 6.3
1. Use the self-balance binary tree, like red-black tree or AVL tree to implement a map
data structure, and manage the sub-trees with map. We call such implementation
M apT rie and M apT ree respectively. What are the performance of insert and
lookup for map based tree and trie?
6.4 Prefix tree
6.4.1 Definition
A prefix tree node t contains two parts: an optional value v, and zero or more sub prefix trees, where each sub-tree ti is bound to a list si. These lists share the longest common prefix s bound to the node t, i.e. s is the longest common prefix of s ++ s1, s ++ s2, ...; for any i ≠ j, lists si and sj do not have a non-empty common prefix. Consolidating the chained nodes in figure 6.8, we obtain the corresponding prefix tree in figure 6.9.
Figure 6.9: A prefix tree with keys: ‘a’, ‘an’, ‘another’, ‘bool’, ‘boy’, ‘zoo’.
We denote prefix tree t = (v, ts). Particularly, (Nothing, ∅) is the empty node, and
(Just v, ∅) is a leaf node of value v.
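The example programs below pattern-match on a PrefixTree constructor that is not listed here; a sketch of a matching definition (the record field names value and subTrees and the helpers empty and leaf are assumptions) could be:

data PrefixTree k v = PrefixTree { value :: Maybe v
                                 , subTrees :: [([k], PrefixTree k v)] }

empty :: PrefixTree k v
empty = PrefixTree Nothing []

leaf :: v -> PrefixTree k v
leaf v = PrefixTree (Just v) []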
6.4.2 Insert
When inserting key s, if the prefix tree is empty, we create a leaf node of s as in figure 6.10 (a); otherwise, if there exists a common prefix between s and si, where si is bound to some sub-tree ti, we branch out a new leaf tj, extract the common prefix, and map it to a new internal branch node t′, then put ti and tj as the two sub-trees of t′. Figure 6.10 (b) shows this case. There are two special cases: s is a prefix of si, as shown in figure 6.10 (c) → (e); or si is a prefix of s, as shown in figure 6.10 (d) → (e).
Figure 6.10: (a) insert ‘boy’ to empty tree; (b) insert ‘bool’, branch a new node out; (c)
insert ‘another’ to (b); (d) insert ‘an’ to (b); (e) insert ‘an’ to (c), same result as insert
‘another’ to (d)
Below function inserts key s and value v into the prefix tree t = (v′, ts). If the key s is empty, we overwrite the value with v; otherwise, we call ins to examine the sub-trees and their prefixes.
If there is no sub-tree in the node, then we create a leaf of v as the single sub-tree, and map s to it; otherwise, for each sub-tree mapping s′ ↦ t, we compare s′ with s. If they have a common prefix (tested by the match function), then we branch out a new sub-tree. We define two lists as matching if they have a common prefix:
match ∅ B = T rue
match A ∅ = T rue (6.12)
match (a : as) (b : bs) = a=b
To extract the longest common prefix of two lists A and B, we define a function (C, A′, B′) = lcp A B, where C ++ A′ = A and C ++ B′ = B hold. If either A or B is empty, or their first elements are different, then the common prefix C = ∅; otherwise, we recursively extract the longest common prefix from the rest of the lists, and prepend the head element:

lcp ∅ B = (∅, ∅, B)
lcp A ∅ = (∅, A, ∅)
lcp (a : as) (b : bs) = { a ≠ b     : (∅, a : as, b : bs)            (6.13)
                          otherwise : (a : cs, as′, bs′)
where (cs, as′, bs′) = lcp as bs in the recursive case. Function branch A v B t takes two keys A and B, a value v, and a tree t. It extracts the longest common prefix C of A and B, maps it to a new branch node, and assigns the sub-trees:

branch A v B t = { lcp A B = (C, ∅, B′)  : (C, (Just v, [B′ ↦ t]))
                   lcp A B = (C, A′, ∅)  : (C, insert t A′ v)                  (6.14)
                   lcp A B = (C, A′, B′) : (C, (Nothing, [A′ ↦ (Just v, ∅), B′ ↦ t]))

If A is the prefix of B, then A is mapped to the node holding v, and the remaining list B′ is re-mapped to t, which becomes the single sub-tree of the branch; if B is the prefix of A, then we recursively insert the remaining list A′ and the value into t; otherwise, we create a leaf node of v and put it together with t as the two sub-trees of the branch. The following example program implements the insert algorithm:
program implements the insert algorithm:
insert (PrefixTree _ ts) [] v = PrefixTree (Just v) ts
insert (PrefixTree v' ts) k v = PrefixTree v' (ins ts) where
ins [] = [(k, leaf v)]
ins ((k', t) : ts) | match k k' = (branch k v k' t) : ts
| otherwise = (k', t) : ins ts
match [] _ = True
match _ [] = True
match (a:_) (b:_) = a == b
11: c ← LCP(k, si )
12: k1 ← k − c, k2 ← si − c
13: if c 6= NIL then
14: match ← TRUE
15: if k2 = NIL then ▷ si is prefix of k
16: p ← Ti , k ← k1
17: break
18: else ▷ Branch out a new leaf
19: Add(Sub-Trees(p), c 7→ Branch(k1 , Leaf(v), k2 , Ti ))
20: Delete(Sub-Trees(p), si 7→ Ti )
21: return T
22: if not match then ▷ Add a new leaf
23: Add(Sub-Trees(p), k 7→ Leaf(v))
24: break
25: return T
Function LCP extracts the longest common prefix from two lists.
1: function LCP(A, B)
2: i←1
3: while i ≤ |A| and i ≤ |B| and A[i] = B[i] do
4: i←i+1
5: return A[1...i − 1]
There is a special case in Branch(s1, T1, s2, T2). If s1 is empty, the key to be inserted is itself a prefix, so we set T2 as a sub-tree of T1. Otherwise, we create a new branch node and set T1 and T2 as the two sub-trees.
1: function Branch(s1 , T1 , s2 , T2 )
2: if s1 = NIL then
3: Add(Sub-Trees(T1 ), s2 7→ T2 )
4: return T1
5: T ← Empty-Node
6: Sub-Trees(T ) ← {s1 7→ T1 , s2 7→ T2 }
7: return T
Although the prefix tree improves the space efficiency of trie, it is still bound to O(mn),
where n is the length of the key, and m is the size of the element set.
6.4.3 Look up
When looking up a key k, we start from the root. If k = ∅ is empty, then we return the root value as the result; otherwise, we examine the sub-tree mappings and locate the one si ↦ ti such that si is a prefix of k, then recursively look up k − si in sub-tree ti. If no si is a prefix of k, then the key does not exist in the prefix tree.
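A recursive sketch of this lookup in Haskell (assuming the PrefixTree type above; isPrefixOf comes from Data.List):

import Data.List (isPrefixOf)

find :: Eq k => PrefixTree k v -> [k] -> Maybe v
find t [] = value t
find t k = case [(s, t') | (s, t') <- subTrees t, s `isPrefixOf` k] of
             [] -> Nothing
             ((s, t') : _) -> find t' (drop (length s) k)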
Since the prefix testing is linear in the length of the list, the lookup algorithm is bound to O(mn) time, where m is the size of the element set and n is the length of the key. We skip the imperative implementation and leave it as an exercise.
Exercise 6.4
1. Eliminate the recursion to implement the prefix tree lookup purely with loops
we expand all sub-trees until we reach n candidates; otherwise, we locate the sub-tree from the mapped key and look up recursively. In an environment that supports lazy evaluation, we can expand all candidates and take the first n on demand: take n (startsWith s t), where t is the prefix tree.
Given a prefix s, the function startsWith searches for all candidates in the prefix tree that start with s. If s is empty, it enumerates all sub-trees and prepends (∅, x) for a non-empty value x in the root. Function enum ts enumerates every sub-tree and prepends its key to each candidate (a sketch is given after the example program below).
Here concatMap (also known as flatMap) is an important list operation: it behaves as if we first map over each element and then concatenate the results. It is typically realized with the ‘build-foldr’ fusion law to eliminate the intermediate list overhead (see chapter 5 of my book Isomorphism – mathematics of programming).
If the input prefix s is not empty, we examine the sub-tree mappings: for each list and sub-tree pair (k, t), if either s is a prefix of k or vice versa, we recursively expand t and prepend k to each result key; otherwise, s does not match any sub-tree, hence the result is empty. Below example program implements this algorithm.
startsWith [] (PrefixTree Nothing ts) = enum ts
startsWith [] (PrefixTree (Just v) ts) = ([], v) : enum ts
startsWith k (PrefixTree _ ts) =
  case find (λ(s, t) → s `isPrefixOf` k || k `isPrefixOf` s) ts of
    Nothing → []
    Just (s, t) → [(s ++ a, b) | (a, b) ← startsWith (drop (length s) k) t]
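The helper enum used above is referenced but not defined here; a sketch consistent with the description (an assumption, not the book's original listing) could be:

enum :: [([k], PrefixTree k v)] -> [([k], v)]
enum ts = concatMap (\(k, t) -> [(k ++ a, b) | (a, b) <- startsWith [] t]) ts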
We can also realize the algorithm Starts-With(T, k, n) imperatively. From the root,
we loop on every sub-tree mapping ki 7→ Ti . If k is the prefix for any sub-tree Ti , we
expand all things in it up to n items; if ki is the prefix of k, we then drop that prefix,
update the key to k − ki , then search Ti for this new key.
1: function Starts-With(T, k, n)
2: if T = NIL then
3: return NIL
4: s ← NIL
5: repeat
6: match ← FALSE
7: for ki 7→ Ti in Sub-Trees(T ) do
8: if k is prefix of ki then
9: return Expand(s ++ ki , Ti , n)
10: if ki is prefix of k then
11: match ← TRUE
12: k ← k − ki ▷ drop the prefix
13: T ← Ti
14: s ← s ++ ki
15: break
16: until not match
17: return NIL
Where function Expand(s, T, n) populates up to n results from T and prepends s to each key. We implement it with the ‘breadth first search’ method (see section 14.3):
1: function Expand(s, T, n)
2: R ← NIL
3: Q ← [(s, T )]
4: while |R| < n and Q 6= NIL do
5: (k, T ) ← Pop(Q)
6: v ← Value(T )
7: if v 6= NIL then
8: Insert(R, (k, v))
9: for ki 7→ Ti in Sub-Trees(T ) do
10: Push(Q, (k + + ki , Ti ))
1. Press key sequence ‘4’, ‘6’, ‘6’, ‘3’, the word ‘home’ appears as a candidate;
3. Press key ’*’ again for another candidate, word ‘gone’ appears;
4. ...
MT9[i] gives the corresponding characters for digit i. We can also define the reversed mapping from a character back to its digit:

MT9⁻¹ = concatMap ((d, s) ↦ [(c, d) | c ∈ s]) MT9          (6.19)
digits(s) = [MT9⁻¹[c] | c ∈ s]                              (6.20)
For any character that does not belong to [a..z], we map it to a special key '#' as a fallback. Below example program defines the above two mappings.

mapT9 = Map.fromList [('2', "abc"), ('3', "def"), ('4', "ghi"),
                      ('5', "jkl"), ('6', "mno"), ('7', "pqrs"),
                      ('8', "tuv"), ('9', "wxyz")]
Suppose we have already built the prefix tree (v, ts) from all the words in a dictionary. We need to change the above auto-completion algorithm to process a digit string ds. For every sub-tree mapping (s ↦ t) ∈ ts, we convert the prefix s to digits(s), and check whether it matches ds (either one is a prefix of the other). There can be multiple sub-trees that match ds:

findT9 t ∅ = [∅]
findT9 (v, ts) ds = concatMap find pfx          (6.21)

For each mapping (s, t) in pfx, the function find recursively looks up the remaining digits ds′ = drop |s| ds in t, then prepends s to every candidate. However, the length may exceed the number of digits, so we cut the result and only take n = |ds| characters:
The following example program implements the predictive input look up algorithm:
findT9 _ [] = [[]]
findT9 (PrefixTree _ ts) k = concatMap find pfx where
find (s, t) = map (take (length k) ◦ (s++)) $ findT9 t (drop (length s) k)
pfx = [(s, t) | (s, t) ← ts, let ds = digits s in
               ds `isPrefixOf` k || k `isPrefixOf` ds]
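As a usage sketch (dict being a hypothetical prefix tree built from an English word list):

-- findT9 dict "4663"   -- yields 4-letter candidates, including the "home" and "gone" mentioned above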
To realize the predictive text input imperatively, we can perform breadth first search with a queue Q of tuples (prefix, D, t). Every tuple records the possible prefix searched so far, the remaining digits D to be searched, and the sub-tree t we are going to search. Q is initialized with the empty prefix, the whole digit sequence, and the root. We repeatedly pop a tuple from the queue and examine the sub-tree mappings. For every mapping (s ↦ T′), we convert s to digits(s). If D is a prefix of it, then we have found a candidate: we append s to prefix and record it in the result. If digits(s) is a prefix of D, we need to further search the sub-tree T′: we create a new tuple (prefix ++ s, D′, T′), where D′ is the remaining digits to be searched, and push this new tuple back to the queue.
1: function Look-Up-T9(T, D)
2: R ← NIL
3: if T = NIL or D = NIL then
4: return R
5: n ← |D|
6: Q ← {(NIL, D, T )}
7: while Q 6= NIL do
8: (prefix, D, T ) ← Pop(Q)
9: for (s 7→ T ′ ) ∈ Sub-Trees(T ) do
10: D′ ← Digits(s)
11: if D′ ⊏ D then ▷ D′ is prefix of D
12: Append(R, (prefix ++ s)[1..n]) ▷ limit the length to n
13: else if D ⊏ D′ then
14:       Push(Q, (prefix ++ s, D − D′ , T ′ ))
15: return R
Exercise 6.5
6.6 Summary
We started from the integer trie and the integer prefix tree. By turning the integer key into its binary format, we re-used the binary tree to realize an integer-based map data structure. We then extended the key from integers to generic lists, limiting the list elements to a finite set. Particularly for alphabetic strings, the generic trie and prefix tree can be used as tools to manipulate text. We gave example applications for auto-completion and predictive text input. As another instance of the radix tree, the suffix tree is closely related to the trie and prefix tree, and is used in text and DNA processing.
The following example insert program uses bit-wise operations to test even/odd, and shifts the key to the right:

IntTrie<T> insert(IntTrie<T> t, Int key, Optional<T> value = Optional<T>()) {
    if t == null then t = IntTrie<T>()
    p = t
    while key ≠ 0 {
        if key & 1 == 0 {
            if p.left == null then p.left = IntTrie<T>()
            p = p.left
        } else {
            if p.right == null then p.right = IntTrie<T>()
            p = p.right
        }
        key = key >> 1
    }
    p.value = value
    return t
}
IntTree(Int k, T v) {
key = k, value = v, prefix = k
}
[Link] = value
} else {
p = branch(node, IntTree(key, value))
if parent == null then return p
[Link](node, p)
}
return t
}
Self PrefixTree(V v) {
value = [Link](v)
}
}
return t
}
prefix, k1, k2 = lcp(key, k)
if prefix ̸= [] {
match = true
if k2 == [] {
node = tr
key = k1
break
} else {
[Link][prefix] = branch(k1, PrefixTree(value),
k2, tr)
[Link](k)
return t
}
}
}
if !match {
[Link][key] = PrefixTree(value)
break
}
}
return t
}
Chapter 7
B-Trees
7.1 Introduction
The B-Tree is an important data structure. It is widely used in modern file systems.
Some are implemented based on B+ trees, which is an extension of a B-tree. B-trees are
also widely used in database systems.
Some textbooks introduce B-trees with the problem of accessing a large block of data on a magnetic disk or a secondary storage device[4]. It is also helpful to understand B-trees as a generalization of balanced binary search trees[39].
When examining Figure 7.1, it is easy to find the differences and similarities between
B-trees and binary search trees.
Figure 7.1: An example B-tree.
Let’s remind ourselves of the definition of binary search tree. A binary search tree is
• either an empty node;
• or a node which contains 3 parts, a value, a left child and a right child. Both
children are also binary search trees.
The binary search tree satisfies the following constraints:
• all the values in the left child are not greater than the value of this node;
• the value of this node is not greater than any value in the right child.
For a non-empty binary tree (L, k, R), where L, R and k are the left, right children, and
the key. Function Key(T ) accesses the key of tree T . The constraint can be represented
as the following.
∀x ∈ L, ∀y ∈ R ⇒ Key(x) ≤ k ≤ Key(y) (7.1)
If we extend this definition to allow multiple keys and children, we get the B-tree
definition.
A B-tree
• is either empty;
• or contains n keys, and n + 1 children, each child is also a B-Tree, we denote these
keys and children as k1 , k2 , ..., kn and c1 , c2 , ..., cn , cn+1 .
Figure 7.2 illustrates a B-Tree node.
The keys and children in a node satisfy the following order constraints:
• Keys are stored in non-decreasing order, i.e. k1 ≤ k2 ≤ ... ≤ kn;
• for each ki, all elements stored in child ci are not greater than ki, while ki is not greater than any value stored in child ci+1.
The constraints can be represented as in equation (7.2) as well:

∀xi ∈ ci, i = 1, 2, ..., n + 1 ⇒ x1 ≤ k1 ≤ x2 ≤ k2 ≤ ... ≤ xn ≤ kn ≤ xn+1          (7.2)
Finally, after adding some constraints to make the tree balanced, we get the complete
B-tree definition.
• All leaves have the same depth;
• We define the integral number t as the minimum degree of the B-tree;
– each node can have at most 2t − 1 keys;
– each node can have at least t − 1 keys, except the root;
Consider a B-tree holding n keys, with minimum degree t ≥ 2 and height h. All nodes contain at least t − 1 keys except the root, which contains at least 1 key. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at least 2t² nodes at depth 3, ..., and at least 2t^(h−1) nodes at depth h. Multiplying the node counts (except the root) by t − 1, the total number of keys satisfies the following inequality:

n ≥ 1 + (t − 1)(2 + 2t + 2t² + ... + 2t^(h−1))
  = 1 + 2(t − 1) Σ_{k=0}^{h−1} t^k
  = 1 + 2(t − 1)(t^h − 1)/(t − 1)                              (7.3)
  = 2t^h − 1
Thus we have the following inequality between the height and the number of keys:

h ≤ log_t((n + 1)/2)                                            (7.4)
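For example, with t = 2 (a 2-3-4 tree) and n = 1,000,000 keys, equation (7.4) gives h ≤ log2(500000.5) ≈ 18.9, so the height is at most 18.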
This is the reason why a B-tree is balanced. The simplest B-tree is the so-called 2-3-4 tree, where t = 2: every internal node has 2, 3, or 4 children. Essentially, a red-black tree can be mapped to a 2-3-4 tree.
The following Python code shows an example B-tree definition. It explicitly accepts t
as a parameter when creating a node.
class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.children = []
B-tree nodes commonly have satellite data as well. We ignore satellite data for illus-
tration purpose.
In this chapter, we will first introduce how to generate a B-tree by insertion. Two
different methods will be explained. One is the classic method as in [4], that we split
the node before insertion if it’s full; the other is the modify-fix approach which is quite
similar to the red-black tree solution [3] [39]. We will next explain how to delete keys
from B-trees and how to look up a key.
7.2 Insertion
A B-tree can be created by inserting keys repeatedly. The basic idea is similar to the binary search tree. When inserting key x, starting from the root, we examine the keys in the node to find a position where all the keys on the left are less than x and all the keys on the right are greater than x.1 If the current node is a leaf node and it is not full (there are fewer than 2t − 1 keys in this node), x is inserted at this position. Otherwise, the position points to a child node, and we need to recursively insert x into it.
Figure 7.3: (a) Insert key 22 into the 2-3-4 tree: 22 > 20, go to the right child; 22 < 26, go to the first child. (b) The leaf holding 21 and 25 is not full; 22 is inserted into it.
Figure 7.3 shows an example. The B-tree illustrated is a 2-3-4 tree. When inserting key x = 22, because it is greater than the root key 20, the right child containing keys 26, 38, 45 is examined next; since 22 < 26, its first child containing keys 21 and 25 is examined. This is a leaf node and it is not full, so key 22 is inserted into this node.
However, if there are already 2t − 1 keys in the leaf, the new key x can't be inserted, because the node is ‘full’. Trying to insert key 18 into the above example B-tree runs into this problem. There are two methods to solve it.
1 This is a strong constraint. In fact, only less-than and equality testing is necessary. The later exercise
7.2.1 Splitting
Split before insertion
If the node is full, one method to solve the problem is to split the node before insertion.
A full node with 2t − 1 keys can be divided into 3 parts, as shown in Figure 7.4: the left part contains the first t − 1 keys and t children; the right part contains the last t − 1 keys and t children. Both the left and the right parts are valid B-tree nodes. The middle part is the t-th key. We can push it up to the parent node (if the current node is the root, then this key, together with the two children, becomes the new root).
For node x, denote K(x) as keys, C(x) as children. The i-th key as ki (x), the j-th
child as cj (x). Below algorithm describes how to split the i-th child for a given node.
1: procedure Split-Child(node, i)
2: x ← ci (node)
3: y ← CREATE-NODE
4: Insert(K(node), i, kt (x))
5: Insert(C(node), i + 1, y)
6: K(y) ← {kt+1 (x), kt+2 (x), ..., k2t−1 (x)}
7: K(x) ← {k1 (x), k2 (x), ..., kt−1 (x)}
8: if y is not leaf then
9: C(y) ← {ct+1 (x), ct+2 (x), ..., c2t (x)}
10: C(x) ← {c1 (x), c2 (x), ..., ct (x)}
The following example Python program implements this child splitting algorithm.
def split_child(node, i):
    t = node.t
    x = node.children[i]
    y = BTree(t)
    node.keys.insert(i, x.keys[t-1])
    node.children.insert(i+1, y)
    y.keys = x.keys[t:]
    x.keys = x.keys[:t-1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]
After splitting, a key is pushed up to its parent node. It is quite possible that the parent node is already full, so pushing one more key up would violate the B-tree property.
To solve this, we can check the nodes along the insertion path from the root down to the leaf. If any node on this path is full, we split it. Since the parent of such a node has already been examined, it is ensured that there are fewer than 2t − 1 keys in the parent, so pushing up one key cannot make the parent overflow. This approach needs only a single pass down the tree without any back-tracking.
If the root needs splitting, a new node is created as the new root. There are no keys in this newly created root, and the previous root is set as its only child. After that, splitting is performed top-down, and we can finally insert the new key.
1: function Insert(T, k)
2: r←T
3: if r is full then ▷ root is full
4: s ← CREATE-NODE
5: C(s) ← {r}
6: Split-Child(s, 1)
7: r←s
8: return Insert-Nonfull(r, k)
Where algorithm Insert-Nonfull assumes the node passed in is not full. If it is a
leaf node, the new key is inserted to the proper position based on the order; Otherwise,
the algorithm finds a proper child node to which the new key will be inserted. If this
child is full, splitting will be performed.
1: function Insert-Nonfull(T, k)
2: if T is leaf then
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: Insert(K(T ), i, k)
7: else
8: i ← |K(T )|
9: while i > 1 ∧ k < ki (T ) do
10: i←i−1
11: if ci (T ) is full then
12: Split-Child(T, i)
13: if k > ki (T ) then
14: i←i+1
15: Insert-Nonfull(ci (T ), k)
16: return T
This algorithm is recursive. In a B-tree, the minimum degree t is typically chosen according to the characteristics of magnetic disks. Even a small height can support a huge amount of data (with t = 10, a B-tree of height 10 can hold billions of keys). The recursion can also be eliminated; this is left as an exercise to the reader.
Figure 7.5 shows the result of continuously inserting keys G, M, P, X, A, C, D, E, J,
K, N, O, R, S, T, U, V, Y, Z to the empty tree. The first result is the 2-3-4 tree (t = 2).
The second result shows how it varies when t = 3.
Below example Python program implements this algorithm.
Figure 7.5: B-trees built by continuously inserting keys G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z: (a) t = 2; (b) t = 3.
def is_full(node):
    return len(node.keys) ≥ 2 ∗ node.t - 1
For an array based collection, appending at the tail is much more efficient than inserting at any other position, because the latter takes O(n) time, where n is the length of the collection. The ordered_insert program first appends the new element at the end of the existing collection, then iterates from the last element towards the first and checks whether the current pair of adjacent elements is ordered. If not, the two elements are swapped.
Function ins(T, k) traverses the B-tree T from the root to find a proper position where key k can be inserted. After that, function fix is applied to restore the B-tree properties. Denote the B-tree in the form T = (K, C, t), where K represents the keys, C represents the children, and t is the minimum degree.
Below is the Haskell definition of B-tree.
data BTree a = Node{ keys :: [a]
, children :: [BTree a]
, degree :: Int} deriving (Eq)
There are two cases when realizing the ins(T, k) function. If the tree T is a leaf, k is inserted into the keys; otherwise, if T is a branch node, we need to recursively insert k into the proper child.
Figure 7.6 shows the branch case. The algorithm first locates the position: for a certain key ki, if the new key k to be inserted satisfies ki−1 < k < ki, then we need to recursively insert k into child ci.
This position divides the node into 3 parts: the left part, the child ci, and the right part.
Figure 7.6: Locating the child ci into which key k is recursively inserted, where K[i−1] < k < K[i].
K = K ′ ∪ K ′′ ∧ ∀k ′ ∈ K ′ , k ′′ ∈ K ′′ ⇒ k ′ ≤ k ≤ k ′′
The second clause handles the branch case. Function split(n, C) splits the children into two parts, C1 and C2: C1 contains the first n children, and C2 contains the rest. Among C2, the first child is denoted as c, and the others are represented as C2′.
Here the key k needs to be recursively inserted into child c. Function make takes three parameters: the first and the third are pairs of keys and children, and the second is a child node. It examines whether a B-tree node made from these keys and children violates the minimum degree constraint, and performs fixing if necessary.
make((K′, C′), c, (K′′, C′′)) = { fixFull((K′, C′), c, (K′′, C′′)) : full(c)          (7.7)
                                  (K′ ∪ K′′, C′ ∪ {c} ∪ C′′, t)    : otherwise
Where function full(c) tests whether the child c is full. Function fixFull splits the child c and forms a new B-tree node with the pushed-up key.
Here (c1, k′, c2) = split(c). During splitting, the first t − 1 keys and t children are extracted into one new child, and the last t − 1 keys and t children form another child. The t-th key k′ is pushed up.
With all the above functions defined, we can realize fix(T) to complete the functional B-tree insertion algorithm. It first checks whether the root contains too many keys; if it exceeds the limit, splitting is applied. The split result is used to make a new node, so the total height of the tree increases by one.

fix(T) = { c                   : T = (ϕ, {c}, t)
           ({k′}, {c1, c2}, t) : full(T)                              (7.9)
           T                   : otherwise

where (c1, k′, c2) = split(T) in the second clause.
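A Haskell sketch of split and fixFull consistent with equations (7.7)–(7.9) (an assumption-level sketch based on the BTree record defined earlier; full tests whether a node holds more than 2t − 1 keys):

full tr = length (keys tr) > 2 * degree tr - 1

split (Node ks cs t) = (Node (take (t - 1) ks) (take t cs) t,
                        ks !! (t - 1),
                        Node (drop t ks) (drop t cs) t)

fixFull (ks', cs') c (ks'', cs'') =
    Node (ks' ++ [k] ++ ks'') (cs' ++ [c1, c2] ++ cs'') (degree c)
  where (c1, k, c2) = split c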
Figure 7.7 shows the results of building B-trees by continuously inserting the keys "GMPXACDEJKNORSTUVYZ".
Figure 7.7: B-trees built by the functional insertion from keys "GMPXACDEJKNORSTUVYZ" (t = 2 and t = 3).
Comparing with the imperative insertion results (figure 7.5), the trees in figure 7.7 are different. However, they are all valid, because all B-tree properties are satisfied.
7.3 Deletion
Deleting a key from a B-tree may violate the balance properties: except for the root, a node must not contain fewer than t − 1 keys, where t is the minimum degree.
Similar to the approaches for insertion, we can either do some preparation so that the node from which the key is deleted contains enough keys, or do some fixing after the deletion if the node ends up with too few keys.
• Case 2a: if the child y that precedes k contains enough keys (at least t), we replace k in node x with k′, the predecessor of k in child y, and recursively remove k′ from y.
The predecessor of k can be easily located as the last key of child y.
This is shown in figure 7.8.
• Case 2b: if y doesn't contain enough keys, while the child z that follows k contains at least t keys, we replace k in node x with k′′, the successor of k in child z, and recursively remove k′′ from z.
The successor of k can be easily located as the first key of child z.
This sub-case is illustrated in figure 7.9.
• Case 2c, Otherwise, if neither y, nor z contains enough keys, we can merge y, k and
z into one new node, so that this new node contains 2t − 1 keys. After that, we can
then recursively do the removing.
Note that after merge, if the current node doesn’t contain any keys, which means
k is the only key in x. y and z are the only two children of x. we need shrink the
tree height by one.
• Case 3a, We check the two sibling of ci , which are ci−1 and ci+1 . If either one
contains enough keys (at least t keys), we move one key from x down to ci , and
move one key from the sibling up to x. Also we need move the relative child from
the sibling to ci .
This operation makes ci contains enough keys for deletion. we can next try to delete
k from ci recursively.
Figure 7.11 illustrates this case.
• Case 3b, In case neither one of the two siblings contains enough keys, we then merge
ci , a key from x, and either one of the sibling into a new node. Then do the deletion
on this new node.
9: ki (T ) ← Last-Key(ci (T ))
10: Delete(ci (T ), ki (T ))
11: else if Can-Del(ci+1 (T )) then ▷ case 2b
12: ki (T ) ← First-Key(ci+1 (T ))
13: Delete(ci+1 (T ), ki (T ))
14: else ▷ case 2c
15: Merge-Children(T, i)
16: Delete(ci (T ), k)
17: if K(T ) = N IL then
18: T ← ci (T ) ▷ Shrinks height
19: return T
20: else if k < ki (T ) then
21: Break
22: else
23: i←i+1
The following example Python program implements the B-tree deletion algorithm.
def can_remove(tr):
    return len(tr.keys) ≥ tr.t
Figure: the B-tree after a sequence of deletions.
(a) After deleting key 'D': case 3b, and the height shrinks.
(b) After deleting key 'B': case 3a, borrowing from the right sibling.
(c) After deleting key 'U': case 3a, borrowing from the left sibling.
def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i > 0:
        if key == tr.keys[i-1]:
            if tr.leaf:  # case 1 in CLRS
                tr.keys.remove(key)
            else:  # case 2 in CLRS
                if tr.children[i-1].can_remove():  # case 2a
                    key = tr.replace_key(i-1, tr.children[i-1].keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif tr.children[i].can_remove():  # case 2b
                    key = tr.replace_key(i-1, tr.children[i].keys[0])
                    B_tree_delete(tr.children[i], key)
                else:  # case 2c
                    tr.merge_children(i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys == []:  # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i-1
    # case 3
    if tr.leaf:
        return tr  # key doesn't exist at all
    if not tr.children[i].can_remove():
        if i > 0 and tr.children[i-1].can_remove():  # left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not tr.children[i].leaf:
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i < len(tr.keys) and tr.children[i+1].can_remove():  # right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i] = tr.children[i+1].keys.pop(0)
            if not tr.children[i].leaf:
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else:  # case 3b
            if i > 0:
                tr.merge_children(i-1)
                i = i-1  # the merged child is now at index i-1
            else:
                tr.merge_children(i)
    B_tree_delete(tr.children[i], key)
    if tr.keys == []:  # tree shrinks in height
        tr = tr.children[0]
    return tr
When deleting a key from a B-tree, we first locate the node that contains it. We traverse from the root towards the leaves until we find the key in some node.
If this node is a leaf, we can remove the key, and then examine whether the deletion leaves the node with too few keys to satisfy the B-tree balance properties.
If it is a branch node, removing the key breaks the node into two parts, which we need to merge together. The merging is a recursive process, as shown in figure 7.16.
Figure 7.16: Delete a key from a branch node. Removing ki breaks the node into 2 parts.
Merging these 2 parts is a recursive process. When the two parts are leaves, the merging
terminates.
When do merging, if the two nodes are not leaves, we merge the keys together, and
recursively merge the last child of the left part and the first child of the right part to one
new node. Otherwise, if they are leaves, we merely put all keys together.
So far, the deletion is performed in a straightforward way. However, deleting decreases the number of keys of a node, and may result in violating the B-tree balance properties. The solution is to perform fixing along the path traversed from the root.
During the recursive deletion, the branch node is broken into 3 parts: the left part contains all keys less than k, namely k1, k2, ..., ki−1, together with children c1, c2, ..., ci−1; the right part contains all keys greater than k, namely ki, ki+1, ..., kn, together with children ci+1, ci+2, ..., cn+1. Then key k is recursively deleted from child ci; denote the result as c′i. We need to make a new node from these 3 parts, as shown in figure 7.17.
At this point, we need to examine whether c′i contains enough keys. If it has too few (fewer than t − 1, not t, in contrast to the merge-and-delete approach), we can borrow a key-child pair from either the left or the right part and do the inverse operation of splitting. Figure 7.18 shows an example of borrowing from the left part.
If both the left part and the right part are empty, we can simply push c′i up.
Denote the B-tree as T = (K, C, t), where K and C are keys and children. The
del(T, k) function deletes key k from the tree.
del(T, k) = { (delete(K, k), ϕ, t)                    : C = ϕ
              merge((K1, C1, t), (K2, C2, t))          : ki = k          (7.11)
              make((K1′, C1′), del(c, k), (K2′, C2′))  : k ∉ K
Figure 7.17: After delete key k from node ci , denote the result as c′i . The fixing makes a
new node from the left part, c′i and the right part.
Figure 7.18: Borrow a key-child pair from left part and un-split to a new child.
If k ∉ K, we need to locate a child c, and further delete k from it.
The recursive merge function is defined as follows. When merging two trees T1 = (K1, C1, t) and T2 = (K2, C2, t), if both are leaves, we create a new leaf by concatenating the keys; otherwise, the last child in C1 and the first child in C2 are recursively merged, and we call the make function to form the new tree. When C1 and C2 are not empty, denote the last child of C1 as c1,m and the rest as C1′; the first child of C2 as c2,1 and the rest as C2′. Below equation defines the merge function:

merge(T1, T2) = { (K1 ∪ K2, ϕ, t)                               : C1 = C2 = ϕ
                  make((K1, C1′), merge(c1,m, c2,1), (K2, C2′)) : otherwise          (7.12)
The make function defined above only handles the case where a node contains too many keys due to insertion. Deleting a key may leave a node with too few keys; we need to test and fix this situation as well:

make((K′, C′), c, (K′′, C′′)) = { fixFull((K′, C′), c, (K′′, C′′)) : full(c)
                                  fixLow((K′, C′), c, (K′′, C′′))  : low(c)           (7.13)
                                  (K′ ∪ K′′, C′ ∪ {c} ∪ C′′, t)    : otherwise
Where low(T ) checks if there are too few keys less than t−1. Function f ixLow(Pl , c, Pr )
takes three arguments, the left pair of keys and children, a child node, and the right pair
of keys and children. If the left part isn’t empty, we borrow a pair of key-child, and do
un-splitting to make the child contain enough keys, then recursively call make; If the
right part isn’t empty, we borrow a pair from the right; and if both sides are empty, we
return the child node as result. In this case, the height of the tree shrinks.
Denote the left part Pl = (Kl , Cl ). If Kl isn’t empty, the last key and child are
represented as kl,m and cl,m respectively. The rest keys and children become Kl′ and Cl′ ;
Similarly, the right part is denoted as Pr = (Kr , Cr ). If Kr isn’t empty, the first key and
child are represented as kr,1 , and cr,1 . The rest keys and children are Kr′ and Cr′ . Below
equation gives the definition of f ixLow.
fixLow(Pl, c, Pr) = { make((Kl′, Cl′), unsplit(cl,m, kl,m, c), (Kr, Cr))  : Kl ≠ ϕ
                      make((Kl, Cl), unsplit(c, kr,1, cr,1), (Kr′, Cr′))  : Kr ≠ ϕ    (7.14)
                      c                                                   : otherwise
Function unsplit(T1, k, T2) is the inverse operation of splitting. It forms a new B-tree node from two small nodes and a key.
The following example Haskell program implements the B-tree deletion algorithm.
import qualified Data.List as L
fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
(unsplit (last cs') (last ks') c)
(ks'', cs'')
fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
(unsplit c (head ks'') (head cs''))
(tail ks'', tail cs'')
fixLow _ c _ = c
When deleting the same keys from the B-tree with this delete-then-fix approach, the results are different. However, both satisfy the B-tree properties, so they are all valid.
7.4 Searching
Searching in B-tree can be considered as the generalized tree search extended from binary
search tree.
When searching in the binary tree, there are only 2 different directions, the left and
the right. However, there are multiple directions in B-tree.
1: function Search(T, k)
2: loop
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: if i ≤ |K(T )| ∧ k = ki (T ) then
7: return (T, i)
8: if T is leaf then
9: return N IL ▷ k doesn’t exist
10: else
11: T ← ci (T )
Starting from the root, this program examines each key one by one, from the smallest to the biggest. If it finds the matched key, it returns the current node and the index of this key. Otherwise, if it finds the position i such that ki < k < ki+1, the program searches the child node ci+1 for the key. If it traverses to a leaf node and fails to find the key, the empty value is returned to indicate that the key doesn't exist in the tree.
The following example Python program implements the search algorithm.
def B_tree_search(tr, key):
    while True:
        for i in range(len(tr.keys)):
            if key ≤ tr.keys[i]:
                break
        if key == tr.keys[i]:
            return (tr, i)
        if tr.leaf:
            return None
        else:
            if key > tr.keys[-1]:
                i = i + 1
            tr = tr.children[i]
The search algorithm can also be realized by recursion. When search key k in B-tree
T = (K, C, t), we partition the keys with k.
K1 = {k ′ |k ′ < k}
K2 = {k ′ |k ≤ k ′ }
Thus K1 contains all the keys less than k, and K2 holds the rest. If the first element
in K2 is equal to k, we find the key. Otherwise, we recursively search the key in child
c|K1 |+1 .
search(T, k) = { (T, |K1| + 1)           : k ∈ K2
                 ϕ                       : C = ϕ             (7.16)
                 search(c_(|K1|+1), k)   : otherwise
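A sketch of this recursive search in Haskell (assuming the BTree record defined earlier in this chapter; the returned key index is 1-based, as in the equations):

search :: Ord a => BTree a -> a -> Maybe (BTree a, Int)
search tr k
  | (x:_) <- k2s, x == k = Just (tr, length k1s + 1)     -- found at 1-based index
  | null (children tr)   = Nothing                       -- reached a leaf, k not present
  | otherwise            = search (children tr !! length k1s) k
  where (k1s, k2s) = span (< k) (keys tr)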
Exercise 7.1
• When inserting a key, we need to find a position where all keys on the left are less than it, and all keys on the right are greater than it. Modify the algorithm so that the elements stored in the B-tree only need to support less-than and equality testing.
• We assume the element being inserted doesn't exist in the tree. Modify the algorithm so that duplicated elements can be stored in a linked-list.
• Eliminate the recursion in the imperative B-tree insertion algorithm.
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[2] B-tree, Wikipedia. [Link]
[3] Chris Okasaki. “FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting”.
J. Functional Programming. 1998
Chapter 8
Binary Heaps
8.1 Introduction
Heaps are one of the most widely used data structures–used to solve practical problems
such as sorting, prioritized scheduling and in implementing graph algorithms, to name a
few[40].
Most popular implementations of heaps use a kind of implicit binary heap using arrays,
which is described in [4]. Examples include C++/STL heap and Python heapq. The most
efficient heap sort algorithm is also realized with binary heap as proposed by R. W. Floyd
[41] [42].
However, heaps are general and can be realized with a variety of other data structures besides the array. In this chapter, an explicit binary tree is used. This leads to Leftist heaps, Skew heaps, and Splay heaps, which are suitable for purely functional implementation, as shown by Okasaki[3].
A heap is a data structure that satisfies the following heap property.
• Pop operation removes the top element from the heap while the heap property
should be kept, so that the new top element is still the minimum (maximum) one;
• Insert a new element to heap should keep the heap property. That the new top is
still the minimum (maximum) element;
• Other operations including merge etc should all keep the heap property.
This is a recursive definition, and it does not constrain the underlying data structure.
We call a heap with the minimum element on top a min-heap; if the top keeps the maximum element, we call it a max-heap.
8.2.1 Definition
The first one is the implicit binary tree. Consider the problem of how to represent a complete binary tree with an array (for example, try to represent a complete binary tree in a programming language that doesn't support structure or record data types, so that only arrays can be used). One solution is to pack all elements from the top level (root) down to the bottom level (leaves).
Figure 8.1 shows a complete binary tree and its corresponding array representation.
Figure 8.1: A complete binary tree with elements 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 and its array representation [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].
This mapping between tree and array can be defined as the following equations (The
array index starts from 1).
1: function Parent(i)
2: return b 2i c
3: function Left(i)
4: return 2i
5: function Right(i)
6: return 2i + 1
For a given tree node represented as the i-th element of the array, since the tree is complete, we can easily find its parent node as the ⌊i/2⌋-th element, its left child at index 2i, and its right child at index 2i + 1. If the index of a child exceeds the length of the array, the node does not have that child (it is a leaf, for example).
In real implementation, this mapping can be calculated fast with bit-wise operation
like the following example ANSI C code. Note that, the array index starts from zero in
C like languages.
#define PARENT(i) ((((i) + 1) >> 1) - 1)
8.2.2 Heapify
The most important thing for heap algorithm is to maintain the heap property, that the
top element should be the minimum (maximum) one.
For the implicit binary heap by array, this means that for a given node, represented by index i, we can develop a method to check whether both of its children are not less than the parent. In case of a violation, we swap the parent and child, and repeat the process recursively [4]. Note that here we assume both sub-trees are already valid heaps.
Below algorithm shows the iterative solution to enforce the min-heap property from a
given index of the array.
1: function Heapify(A, i)
2: n ← |A|
3: loop
4: l ← Left(i)
5: r ← Right(i)
6: smallest ← i
7: if l < n ∧ A[l] < A[i] then
8: smallest ← l
9: if r < n ∧ A[r] < A[smallest] then
10: smallest ← r
11: if smallest 6= i then
12: Exchange A[i] ↔ A[smallest]
13: i ← smallest
14: else
15: return
For array A and the given index i, neither of its children should be less than A[i]. In case of a violation, we pick the smallest element as the new A[i], and swap the previous A[i] down to that child. The algorithm traverses the tree top-down to fix the heap property until it either reaches a leaf or finds no more violations.
The Heapify algorithm takes O(lg n) time, where n is the number of elements, because the number of loop iterations is proportional to the height of the complete binary tree.
When implementing this algorithm, the comparison method can be passed as a parameter, so that both min-heaps and max-heaps can be supported. The example ANSI C code takes this approach.
Figure 8.2 illustrates the steps when Heapify processes the array {16, 4, 10, 14, 7, 9, 3, 2, 8, 1} from the second index. The array changes to {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, a max-heap.
Figure 8.2: Heapify example.
(a) Step 1: 14 is the biggest element among 4, 14, and 7; swap 4 with the left child.
(b) Step 2: 8 is the biggest element among 2, 4, and 8; swap 4 with the right child.
(c) The heap property is restored; the array is now the max-heap {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.
S = n(1/2 + 1/4 + 1/8 + ...) = n
Below ANSI C example program implements this heap building function.
void build_heap(Key∗ a, int n, Less lt) {
int i;
for (i = (n-1) >> 1; i ≥ 0; --i)
heapify(a, i, n, lt);
}
Figures 8.3, 8.4 and 8.5 show the steps of building a max-heap from the array {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}. The node in black is the one to which Heapify is being applied; the nodes in gray are swapped in order to keep the heap property.
Figure 8.3: Build a heap from an arbitrary array. Gray nodes are changed in each step; the black node will be processed in the next step.
(a) The input array {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}.
(b) Step 1: the array is mapped to a binary tree; the first branch node, 16, is examined.
(c) Step 2: 16 is the largest element in the current sub-tree; next, the node with value 2 is checked.
Figure 8.4: Build a heap from an arbitrary array (continued).
(a) Step 3: 14 is the largest value in the sub-tree; swap 14 and 2. Next, the node with value 3 is checked.
(b) Step 4: 10 is the largest value in the sub-tree; swap 10 and 3. Next, the node with value 1 is checked.
Figure 8.5: Build a heap from an arbitrary array (continued).
(a) Step 5: the node with value 1 is sifted down.
(b) Step 6: swap 4 and 16, then swap 4 and 14, and then swap 4 and 8; the whole build process finishes.
The most important operations include accessing the top element (finding the minimum or maximum one), popping the top element from the heap, finding the top k elements, decreasing a key (for a min-heap; increasing a key for a max-heap), and insertion.
For the binary heap, most operations are bound to O(lg n) in the worst case; some of them, such as top, take O(1) constant time.
Heap Pop
Pop operation is more complex than accessing the top, because the heap property has to
be maintained after the top element is removed.
The solution is to apply Heapify algorithm to the next element after the root is
removed.
One simple but slow method based on this idea looks like the following.
1: function Pop-Slow(A)
2: x ← Top(A)
3: Remove(A, 1)
4: if A is not empty then
5: Heapify(A, 1)
6: return x
This algorithm first records the top element in x, then removes the first element from the array, reducing its size by one. After that, if the array isn't empty, Heapify is applied to the new array from the first element (which was previously the second one).
Removing the first element from an array takes O(n) time, where n is the length of the array, because all the remaining elements must be shifted one by one. This bottleneck slows the whole algorithm down to linear time.
In order to solve this problem, one alternative is to swap the first element with the last one in the array, then shrink the array size by one.
In order to solve this problem, one alternative is to swap the first element with the
last one in the array, then shrink the array size by one.
1: function Pop(A)
2: x ← Top(A)
3: n ← Heap-Size(A)
4: Exchange A[1] ↔ A[n]
5: Remove(A, n)
6: if A is not empty then
7: Heapify(A, 1)
8: return x
Removing the last element from the array takes only constant O(1) time, and Heapify
is bound to O(lg n). Thus the whole algorithm performs in O(lg n) time. The following
example ANSI C program implements this algorithm1 .
1 This program does not actually remove the last element; it reuses the last cell to store the popped result.
Decrease key
A heap can be used to implement a priority queue, so it is important to support key modification. One typical operation is to increase the priority of a task so that it can be performed earlier.
Here we present the decrease key operation for a min-heap. The corresponding opera-
tion is increase key for max-heap. Figure 8.6 and 8.7 illustrate such a case for a max-heap.
The key of the 9-th node is increased from 4 to 15.
Once a key is decreased in a min-heap, the node may violate the heap property: its key may become less than that of some ancestor. In order to maintain the invariant, the following auxiliary algorithm is defined to restore the heap property.
1: function Heap-Fix(A, i)
2: while i > 1 ∧ A[i] < A[ Parent(i) ] do
3: Exchange A[i] ↔ A[ Parent(i) ]
4: i ← Parent(i)
This algorithm repeatedly compares the keys of the parent and the current node. It swaps them if the parent holds the bigger key. The process moves from the current node towards the root, and stops once the parent holds the smaller key.
With this auxiliary algorithm, decrease key can be realized as below.
1: function Decrease-Key(A, i, k)
2: if k < A[i] then
3: A[i] ← k
4: Heap-Fix(A, i)
This algorithm is only triggered when the new key is less than the original key. The
performance is bound to O(lg n). Below example ANSI C program implements the algo-
rithm.
Figure 8.6: The 9-th node of the max-heap has its key modified to 15, which is greater than its parent.
Figure 8.7: Since 15 is greater than its parent 14, they are swapped. After that, because 15 is less than 16, the process terminates.
Insertion
Insertion can be implemented by using Decrease-Key [4]. A new node with ∞ as its key is created. According to the min-heap property, it should be the last element in the underlying array. After that, the key is decreased to the value to be inserted, and Decrease-Key is called to fix any violation of the heap property.
Alternatively, we can reuse Heap-Fix to implement insertion. The new key is directly appended at the end of the array, and Heap-Fix is applied to this new node.
1: function Heap-Push(A, k)
2: Append(A, k)
3: Heap-Fix(A, |A|)
The following example Python program implements the heap insertion algorithm.
def heap_insert(x, key, less_p = MIN_HEAP):
    i = len(x)
    x.append(key)
    heap_fix(x, i, less_p)
When sorting n elements, Build-Heap is bound to O(n). Since pop is O(lg n) and it is called n times, the overall sorting takes O(n lg n) time. Because we use another list to hold the result, the space requirement is O(n).
Robert. W. Floyd found a fast implementation of heap sort. The idea is to build a
max-heap instead of min-heap, so the first element is the biggest one. Then the biggest
element is swapped with the last element in the array, so that it is in the right position
after sorting. As the last element becomes the new top, it may violate the heap property.
We can shrink the heap size by one and perform Heapify to resume the heap property.
This process is repeated till there is only one element left in the heap.
1: function Heap-Sort(A)
2: Build-Max-Heap(A)
3: while |A| > 1 do
4: Exchange A[1] ↔ A[n]
5: |A| ← |A| − 1
6: Heapify(A, 1)
This is an in-place algorithm; it doesn't need any extra space to hold the result. The following ANSI C example code implements this algorithm.
void heap_sort(Key∗ a, int n) {
build_heap(a, n, notless);
while(n > 1) {
swap(a, 0, --n);
heapify(a, 0, n, notless);
}
}
Exercise 8.1
• Somebody considers one alternative to realize in-place heap sort. Take sorting the
array in ascending order as example, the first step is to build the array as a minimum
heap A, but not the maximum heap like the Floyd’s method. After that the first
element a1 is in the correct place. Next, treat the rest {a2 , a3 , ..., an } as a new heap,
and perform Heapify to them from a2 for these n − 1 elements. Repeating this
advance and Heapify step from left to right would sort the array. The following
example ANSI C code illustrates this idea. Is this solution correct? If yes, prove it;
if not, why?
void heap_sort(Key∗ a, int n) {
build_heap(a, n, less);
while(--n)
heapify(++a, 0, n, less);
}
• Because of the same reason, can we perform Heapify from left to right k times to
realize in-place top-k algorithm like below ANSI C code?
int tops(int k, Key∗ a, int n, Less lt) {
build_heap(a, n, lt);
for (k = MIN(k, n) - 1; k; --k)
heapify(++a, 0, --n, lt);
return k;
}
Figure 8.8: A binary tree; all elements in the children are not less than k.
If k is the top element, all elements in left and right children are not less than k in
a min-heap. After k is popped, only left and right children are left. They have to be
merged to a new tree. Since heap property should be maintained after merge, the new
root is still the smallest element.
Because both left and right children are binary trees conforming heap property, the
two trivial cases can be defined immediately.
merge(H1, H2) = H2 : H1 = ϕ
                H1 : H2 = ϕ
                ?  : otherwise

For the non-trivial case, suppose the root of H1 holds the smaller key and denote H1 = (A, x, B). We can merge H2 into either of its children, giving for example (merge(A, H2), x, B), or symmetrically (A, x, merge(B, H2)); both are correct. One simplified solution is to always merge into the right sub-tree. The Leftist tree provides a systematic approach based on this idea.
8.3 Leftist heap and Skew heap, the explicit binary heaps
8.3.1 Definition
The heap implemented by Leftist tree is called Leftist heap. Leftist tree is first introduced
by C. A. Crane in 1972[43].
Rank (S-value)
In a Leftist tree, a rank value (or S-value) is defined for each node: the rank is the distance to the nearest external node, where an external node is the NIL concept extended from a leaf.
For example, in figure 8.9 the rank of NIL is defined as 0. Consider the root node 4: the nearest external node is the child of node 8, so the rank of the root is 2. Node 6 and node 8 both have only NIL children, so their rank values are 1. Although node 5 has a non-NIL left child, its right child is NIL, so its rank, the minimum distance to NIL, is still 1.
Leftist property
With rank defined, we can create a strategy when merging.
• Every time when merging, we always merge to right child; Denote the rank of the
new right sub tree as rr ;
• Compare the ranks of the left and right children, if the rank of left sub tree is rl
and rl < rr , we swap the left and the right children.
We call this ‘Leftist property’. In general, a Leftist tree always has the shortest path
to some external node on the right.
Leftist tree tends to be very unbalanced, However, it ensures important property as
specified in the following theorem.
Theorem 8.3.1. If a Leftist tree T contains n internal nodes, the path from root to the
rightmost external node contains at most blog(n + 1)c nodes.
We skip the proof here; readers can refer to [44] and [51] for more information. With this theorem, algorithms that operate along this path are all bound to O(lg n).
We can reuse the binary tree definition and augment it with a rank field to define the Leftist tree, for example in the form (r, k, L, R) for the non-empty case; the Haskell definition follows the same shape.
For an empty tree, the rank is defined as zero; otherwise it is the value of the augmented field. A rank(H) function covers both cases:

rank(H) = 0 : H = ϕ
          r : otherwise, H = (r, k, L, R)        (8.3)
8.3.2 Merge
In order to realize merge, we first develop an auxiliary function that compares the ranks of two sub-trees and swaps them if necessary.

mk(k, A, B) = (rA + 1, k, B, A) : rA < rB
              (rB + 1, k, A, B) : otherwise        (8.4)

This function takes three arguments: a key and two sub-trees A and B. If the rank of A is smaller, it builds a bigger tree with B as the left child and A as the right child, and uses rA + 1 as the rank of the new tree; otherwise A becomes the left child, B the right child, and the resulting rank is rB + 1.
The rank increases by one because a new key is added on top of the tree, which makes the path to the nearest external node through the right child one step longer.
Denote the key and the left and right children of H1 as k1, L1, R1, and those of H2 as k2, L2, R2 respectively. The merge(H1, H2) function can be completed by using this auxiliary tool as below.

merge(H1, H2) = H2 : H1 = ϕ
                H1 : H2 = ϕ
                mk(k1, L1, merge(R1, H2)) : k1 < k2
                mk(k2, L2, merge(H1, R2)) : otherwise        (8.5)

The merge function always recurses on the right side, and the Leftist property is maintained by mk. These facts ensure that the performance is bound to O(lg n).
The following Haskell example code implements the merge program.
merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
if x < y then makeNode x l (merge r h2)
else makeNode y l' (merge h1 r')
For a non-empty Leftist heap H = (r, k, L, R), the top element is simply the key of the root:

top(H) = k        (8.6)

For the pop operation, the top element is removed first, then the left and right children are merged into a new heap, i.e. pop(H) = merge(L, R). Because pop calls merge directly, it is also bound to O(lg n) on the Leftist heap.
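As an illustrative sketch of these operations in Python (the tuple layout (rank, key, left, right) with None as the empty tree, and the helper names, are assumptions for illustration, not taken from the Haskell code):
def rank(h):
    return h[0] if h else 0

def make(k, a, b):
    # mk(k, A, B): keep the child with the larger rank on the left;
    # the new rank is one more than the rank of the (smaller) right child.
    if rank(a) < rank(b):
        a, b = b, a
    return (rank(b) + 1, k, a, b)

def merge(h1, h2):
    if not h1: return h2
    if not h2: return h1
    _, k1, l1, r1 = h1
    _, k2, l2, r2 = h2
    if k1 < k2:
        return make(k1, l1, merge(r1, h2))
    return make(k2, l2, merge(h1, r2))

def top(h):
    return h[1]

def pop(h):
    # Remove the top, then merge the two children, as described above.
    _, _, l, r = h
    return merge(l, r)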
Insertion
To insert a new element, one solution is to create a single leaf node with the element, and
then merge this leaf node to the existing Leftist tree.
Figure 8.10 shows one example Leftist tree built in this way.
The following example Haskell code gives a reference implementation of the Leftist tree operations (pop appears as deleteMin).
insert h x = merge (Node 1 x E E) h
findMin (Node _ x _ _) = x
deleteMin (Node _ _ l r) = merge l r
Figure 8.10: A Leftist tree built from list {9, 4, 16, 7, 10, 2, 14, 3, 8, 1}.
Because pop is logarithm operation, and it is recursively called n times, this algorithm
takes O(n lg n) time in total. The following Haskell example program implements heap
sort with Leftist tree.
heapSort = hsort ◦ fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)
Figure 8.11: A very unbalanced Leftist tree built from the list {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.
The Skew heap is a simplification: it needn't keep the rank (or S-value) field. We can reuse the binary tree definition for the Skew heap; the tree is either empty, or in the form (k, L, R). The Haskell code below defines the Skew heap like this.
data SHeap a = E −− Empty
| Node a (SHeap a) (SHeap a) −− element, left, right
Merge
The merge algorithm tends to be very simple. When merging two non-empty Skew trees, we compare the roots and pick the smaller one as the new root; the tree containing the bigger element is merged into one sub-tree, and finally the two children are swapped.
Denote H1 = (k1, L1, R1) and H2 = (k2, L2, R2) when they are not empty. If k1 < k2, for instance, we select k1 as the new root. We can merge H2 into either L1 or R1; without loss of generality, merge it into R1. After swapping the two children, the final result is (k1, merge(R1, H2), L1). Taking the edge cases into account, the merge algorithm is defined as the following.
merge(H1, H2) = H1 : H2 = ϕ
                H2 : H1 = ϕ
                (k1, merge(R1, H2), L1) : k1 < k2
                (k2, merge(H1, R2), L2) : otherwise        (8.12)
All the rest operations, including insert, top and pop are all realized as same as the
Leftist heap by using merge, except that we needn’t the rank any more.
Translating the above algorithm into Haskell yields the following example program.
merge E h = h
merge h E = h
merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'
findMin (Node x _ _) = x
Different from the Leftist heap, if we feed an ordered list to the Skew heap, it builds a fairly balanced binary tree, as illustrated in figure 8.12.
Figure 8.12: The Skew tree is still balanced even when the input is an ordered list {1, 2, ..., 10}.
8.4 Splay heap
8.4.1 Definition
The Splay tree uses a cache-like approach: it keeps rotating the currently accessed node close to the top, so that the node can be accessed quickly the next time. Such an operation is called 'splaying'. For an unbalanced binary search tree, after several splay operations the tree tends to become more and more balanced. Most basic operations of the Splay tree run in amortized O(lg n) time. The Splay tree was invented by Daniel Dominic Sleator and Robert Endre Tarjan in 1985 [48] [49].
Splaying
There are two methods to do splaying. The first one needs to deal with many different cases, but can be implemented fairly easily with pattern matching; the second one has a uniform form, but its implementation is more complex.
Denote the node currently being accessed as X, the parent node as P , and the grand
parent node as G (If there are). There are 3 steps for splaying. Each step contains 2
symmetric cases. For illustration purpose, only one case is shown for each step.
• Zig-zig step. As shown in figure 8.13, in this case, X and P are children on the
same side of G, either both on left or right. By rotating 2 times, X becomes the
new root.
(Figure 8.13: the zig-zig step. (a) X and P are both left children or both right children; (b) X becomes the new root after rotating twice.)
• Zig-zag step. As shown in figure 8.14, in this case, X and P are children on different
sides. X is on the left, P is on the right. Or X is on the right, P is on the left.
After rotation, X becomes the new root, P and G are siblings.
• Zig step. As shown in figure 8.15, in this case, P is the root, we rotate the tree, so
that X becomes new root. This is the last step in splay operation.
Although there are 6 different cases, they can be handled concisely in environments that support pattern matching. Denote the non-empty binary tree as T = (L, k, R). When accessing key Y in tree T, the splay operation can be defined as below.
splay(T, Y) = (a, X, (b, P, (c, G, d))) : T = (((a, X, b), P, c), G, d), X = Y
              (((a, G, b), P, c), X, d) : T = (a, G, (b, P, (c, X, d))), X = Y
              ((a, P, b), X, (c, G, d)) : T = ((a, P, (b, X, c)), G, d), X = Y
              ((a, G, b), X, (c, P, d)) : T = (a, G, ((b, X, c), P, d)), X = Y
              (a, X, (b, P, c)) : T = ((a, X, b), P, c), X = Y
              ((a, P, b), X, c) : T = (a, P, (b, X, c)), X = Y
              T : otherwise        (8.13)
The first two clauses handle the zig-zig cases; the next two clauses handle the zig-zag cases; the last two clauses handle the zig cases. The tree isn't changed in any other situation.
The following Haskell program implements this splay function.
data STree a = E −− Empty
| Node (STree a) a (STree a) −− left, key, right
−− zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t
With the splay operation defined, every time we insert a new key we call splay to adjust the tree. If the tree is empty, the result is a leaf; otherwise we compare the key with the root: if it is less than the root, we recursively insert it into the left child and splay afterwards; otherwise the key is inserted into the right child, followed by splaying.
insert(T, x) = (ϕ, x, ϕ) : T = ϕ
               splay((insert(L, x), k, R), x) : T = (L, k, R), x < k
               splay((L, k, insert(R, x)), x) : otherwise        (8.14)
Figure 8.16 shows the result of using this function: it inserts the ordered elements {1, 2, ..., 10} one by one into an empty tree. With a normal binary search tree this would build a very poor result, degraded to a linked list; the splay method creates a more balanced result.
Okasaki found a simple rule for splaying [3]: whenever we follow two left branches or two right branches continuously, we rotate the two nodes.
Based on this rule, splaying can be realized in the following way. When we access a node for a key x (during inserting, looking up, or deleting a node), if we traverse two left branches or two right branches, we partition the tree into two parts L and R, where L contains all nodes smaller than x and R contains the rest. We can then create a new tree (for instance in insertion) with x as the root, L as the left child and R as the right child. The partition process is recursive, because it splays its children as well.
partition(T, p) = (ϕ, ϕ) : T = ϕ
                  (T, ϕ) : T = (L, k, R), k < p, R = ϕ
                  (((L, k, L′), k′, A), B) : T = (L, k, (L′, k′, R′)), k < p, k′ < p, (A, B) = partition(R′, p)
                  ((L, k, A), (B, k′, R′)) : T = (L, k, (L′, k′, R′)), k < p ≤ k′, (A, B) = partition(L′, p)
                  (ϕ, T) : T = (L, k, R), p ≤ k, L = ϕ
                  (A, (B, k′, (R′, k, R))) : T = ((L′, k′, R′), k, R), p ≤ k′, p ≤ k, (A, B) = partition(L′, p)
                  ((L′, k′, A), (B, k, R)) : T = ((L′, k′, R′), k, R), k′ ≤ p ≤ k, (A, B) = partition(R′, p)
                      (8.15)
The function partition(T, p) takes a tree T and a pivot p as arguments. The first clause is the edge case: the partition result for the empty tree is a pair of empty trees. Otherwise, denote the tree as (L, k, R); we compare the pivot p and the root k. If k < p there are two sub-cases. One is the trivial case that R is empty: by the binary search tree property all elements are less than p, so the result pair is (T, ϕ). In the other case R = (L′, k′, R′), and we further compare k′ with the pivot p. If k′ < p is also true, we recursively partition R′ with the pivot; all the elements of R′ that are less than p go to tree A, and the rest to tree B. The result pair is then composed of ((L, k, L′), k′, A) and B. If the key of the right sub-tree is not less than the pivot, we recursively partition L′ with the pivot to get the intermediate pair (A, B); the final pair is composed of (L, k, A) and (B, k′, R′). The cases for p ≤ k are symmetric; they are handled by the last three clauses.
Translating the above algorithm into Haskell yields the following partition program.
partition E _ = (E, E)
partition t@(Node l x r) y
    | x < y =
        case r of
          E → (t, E)
          Node l' x' r' →
              if x' < y then
                  let (small, big) = partition r' y in
                  (Node (Node l x l') x' small, big)
              else
                  let (small, big) = partition l' y in
                  (Node l x small, Node big x' r')
    | otherwise =
        case l of
          E → (E, t)
          Node l' x' r' →
              if y < x' then
                  let (small, big) = partition l' y in
                  (small, Node big x' (Node r' x r))
              else
                  let (small, big) = partition r' y in
                  (Node l' x' small, Node big x r)
Alternatively, insertion can be realized with the partition algorithm. When inserting a new element k into the splay heap T, we first partition the heap into two trees L and R, where L contains all nodes smaller than k and R contains the rest. We then construct a new node with k as the root and L, R as the children.

insert(T, k) = (L, k, R), where (L, R) = partition(T, k)        (8.16)
The corresponding Haskell example program is as the following.
insert t x = Node small x big where (small, big) = partition t x
For pop (deleteMin), the minimum is the left-most node, so we keep going left until it is reached, removing it and repairing the tree on the way (rotating whenever two left branches are followed):
deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r' x r)
Merge
Merge is another basic operation for heaps as it is widely used in Graph algorithms. By
using the partition algorithm, merge can be realized in O(lg n) time.
When merging two splay trees, for non-trivial case, we can take the root of the first
tree as the new root, then partition the second tree with this new root as the pivot.
After that we recursively merge the children of the first tree to the partition result. This
algorithm is defined as the following.
merge(T1, T2) = T2 : T1 = ϕ
                (merge(L, A), k, merge(R, B)) : T1 = (L, k, R), (A, B) = partition(T2, k)        (8.19)
If the first heap is empty, the result is definitely the second heap. Otherwise, denote
the first splay heap as (L, k, R), we partition T2 with k as the pivot to yield (A, B), where
A contains all the elements in T2 which are less than k, and B holds the rest. We next
recursively merge A with L; and merge B with R as the new children for T1 .
Translating the definition to Haskell gives the following example program.
merge E t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
where (l', r') = partition t x
Exercise 8.2
• Realize the imperative Leftist heap, Skew heap, and Splay heap.
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[2] Heap (data structure), Wikipedia. [Link]
[3] Heapsort, Wikipedia. [Link]
[4] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[5] Sorting algorithms/Heapsort. Rosetta Code. [Link]
[6] Leftist Tree, Wikipedia. [Link]
[7] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Pat-
terns in Java. [Link]
[8] Donald E. Knuth. “The Art of Computer Programming. Volume 3: Sorting and
Searching.”. Addison-Wesley Professional; 2nd Edition (October 15, 1998). ISBN-13:
978-0201485417. Section 5.2.3 and 6.2.3
[12] Sleator, Daniel D.; Tarjan, Robert E. (1985), “Self-Adjusting Binary Search Trees”,
Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835
[13] NIST, “binary heap”. [Link]
Chapter 9
From grape to the world cup, the evolution of selection sort
9.1 Introduction
We have introduced the 'hello world' sorting algorithm, insertion sort. In this short chapter, we explain another straightforward sorting method, selection sort. The basic version of selection sort doesn't perform as well as the divide and conquer methods, e.g. quick sort and merge sort. We'll use the same approach as in the chapter on insertion sort to analyze why it's slow, and try to improve it through a variety of attempts until we reach the best bound of comparison-based sorting, O(n lg n), by evolving it into heap sort.
The idea of selection sort can be illustrated by a real-life story. Consider a kid eating a bunch of grapes. There are two types of children according to my observation. One is the optimistic type: the kid always eats the biggest grape he or she can find; the other is pessimistic: he or she always eats the smallest one.
The first type of kid actually eats the grapes in an order whose sizes decrease monotonically, while the other eats in increasing order. The kid in fact sorts the grapes by size, and the method used here is selection sort.
Based on this idea, the algorithm of selection sort can be directly described as the
following.
In order to sort a series of elements:
• The trivial case: if the series is empty, then we are done; the result is also empty;
• Otherwise, we find the smallest element, append it to the tail of the result, and repeat this process on the rest of the elements.
Note that this algorithm sorts the elements in increasing order; it's easy to sort in decreasing order by picking the biggest element instead. We'll introduce passing a comparator as a parameter later on.
This description can be formalized as an equation.

sort(A) = ϕ : A = ϕ
          {m} ∪ sort(A′) : otherwise        (9.1)

where m is the minimum element of the collection A, and A′ contains the rest of the elements except m:

m = min(A)
A′ = A − {m}
We don’t limit the data structure of the collection here. Typically, A is an array in
imperative environment, and a list (singly linked-list particularly) in functional environ-
ment, and it can even be other data struture which will be introduced later.
The algorithm can also be given in imperative manner.
function Sort(A)
X←ϕ
while A ≠ ϕ do
x ← Min(A)
A ← Del(A, x)
X ← Append(X, x)
return X
Figure 9.2 depicts the process of this algorithm.
Figure 9.2: The left part is sorted data, continuously pick the minimum element in the
rest and append it to the result.
We just translated the very original idea of 'eating grapes' line by line, without considering any expense of time and space. This realization stores the result in X, and when a selected element is appended to X, we delete the same element from A. This indicates that we can change it to an 'in-place' sort, reusing the space of A.
The idea is to store the minimum element in the first cell of A (we use the term 'cell' if A is an array, and 'node' if A is a list), then store the second minimum element in the next cell, then the third, and so on.
One solution to realize this sorting strategy is swapping. When we select the i-th
minimum element, we swap it with the element in the i-th cell:
function Sort(A)
for i ← 1 to |A| do
m ← Min(A[i...])
Exchange A[i] ↔ m
Denote A = {a1 , a2 , ..., an }. At any time, when we process the i-th element, all
elements before i, as {a1 , a2 , ..., ai−1 } have already been sorted. We locate the minimum
element among the {ai , ai+1 , ..., an }, and exchange it with ai , so that the i-th cell contains
the right value. The process is repeatedly executed until we arrived at the last element.
This idea can be illustrated by figure 9.3.
Figure 9.3: The left part is sorted data, continuously pick the minimum element in the
rest and put it to the right position.
9.2.1 Labeling
Method 1 is to label each grape with a number: {1, 2, ..., n}, and we systematically perform
the comparison in the order of this sequence of labels. That we first compare grape number
1 and grape number 2, pick the bigger one; then we take grape number 3, and do the
comparison, ... We repeat this process until arrive at grape number n. This is quite
suitable for elements stored in an array.
function Min(A)
m ← A[1]
for i ← 2 to |A| do
if A[i] < m then
m ← A[i]
return m
With Min defined, we can complete the basic version of selection sort (or naive version
without any optimization in terms of time and space).
However, this algorithm returns the value of the minimum element instead of its
location (or the label of the grape), which needs a bit tweaking for the in-place version.
Some languages such as ISO C++, support returning the reference as result, so that the
swap can be achieved directly as below.
template<typename T>
T& min(T∗ from, T∗ to) {
T∗ m;
for (m = from++; from ̸= to; ++from)
if (∗from < ∗m)
m = from;
return ∗m;
}
template<typename T>
void ssort(T∗ xs, int n) {
for (int i = 0; i < n; ++i)
std::swap(xs[i], min(xs+i, xs+n));
}
9.2.2 Grouping
Another method is to group all grapes in two parts: the group we have examined, and
the rest we haven’t. We denote these two groups as A and B; All the elements (grapes)
as L. At the beginning, we haven’t examine any grapes at all, thus A is empty (ϕ),
and B contains all grapes. We can select arbitrary two grapes from B, compare them,
and put the loser (the smaller one for example) to A. After that, we repeat this process
by continuously picking arbitrary grapes from B, and compare with the winner of the
previous time until B becomes empty. At this time being, the final winner is the minimum
element. And A turns to be L − {min(L)}, which can be used for the next time minimum
finding.
There is an invariant of this method, that at any time, we have L = A ∪ {m} ∪ B,
where m is the winner so far we hold.
This approach doesn't need the collection of grapes to be indexed (as they are labeled in method 1); it is suitable for any traversable data structure, including linked lists. Suppose b1 is an arbitrary element of B when B isn't empty, and B′ is the rest of the elements after removing b1; the method can be formalized as the auxiliary function below.

min′(A, m, B) = (m, A) : B = ϕ
                min′(A ∪ {m}, b1, B′) : b1 < m
                min′(A ∪ {b1}, m, B′) : otherwise        (9.2)
In order to pick the minimum element, we call this auxiliary function with an empty A, using an arbitrary element (for instance, the first one) to initialize m:

extractMin(L) = min′(ϕ, l1, L′)        (9.3)

where L′ is all the elements of L except for the first one, l1. The algorithm extractMin not only finds the minimum element, but also returns the updated collection that no longer contains it. Plugging this minimum-extracting algorithm into the basic selection sort definition, we get a complete functional sorting program, for example as this Haskell code snippet.
sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs
The first line handles the trivial edge case: the sorting result for an empty list is obviously empty. The second clause ensures that there is at least one element; that's why the extractMin function doesn't need other pattern matching.
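As a cross-check of the same idea in imperative style, here is a rough Python sketch (the names extract_min and sel_sort are illustrative; this is not the book's Haskell implementation):
def extract_min(xs):
    # One pass: return (minimum, remaining elements); the relative order
    # of the remaining elements is not preserved, which is fine for sorting.
    m, rest = xs[0], []
    for x in xs[1:]:
        if x < m:
            rest.append(m)
            m = x
        else:
            rest.append(x)
    return m, rest

def sel_sort(xs):
    if not xs:
        return []
    m, rest = extract_min(xs)
    return [m] + sel_sort(rest)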
One may think the second clause of the min' function should be written like below, or the updated list would be produced in reverse order:

min' ys m (x:xs) = if m < x then min' (ys ++ [x]) m xs
                   else min' (ys ++ [m]) x xs

Actually, it's necessary to use 'cons' instead of appending here. Appending is a linear operation, proportional to the length of part A, while 'cons' is a constant O(1) time operation. In fact, we needn't keep the relative order of the list being sorted, as it will be re-arranged anyway during sorting.
It's quite possible to keep the relative order during sorting¹, while ensuring that finding the minimum element doesn't degrade to quadratic time. The following equation defines a solution.

extractMin(L) = (l1, ϕ) : |L| = 1
                (l1, L′) : l1 < m, (m, L′′) = extractMin(L′)
                (m, {l1} ∪ L′′) : otherwise        (9.4)
sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs
Note that only the 'cons' operation is used; we needn't append at all, because the algorithm actually examines the list from right to left. However, this is not free: the program needs to book-keep the context (typically via the call stack). The relative order is ensured by the nature of the recursion. Please refer to the appendix about tail recursion for a detailed discussion.
Exercise 9.1
• Implement the basic imperative selection sort algorithm (the none in-place version)
in your favorite programming language. Compare it with the in-place version, and
analyze the time and space effectiveness.
Where c is a comparator function, it takes two elements, compare them and returns
the result of which one is preceding of the other. Passing ‘less than’ operator (<) turns
this algorithm to be the version we introduced in previous section.
Some environments require to pass the total ordering comparator, which returns result
among ‘less than’, ’equal’, and ’greater than’. We needn’t such strong condition here,
that c only tests if ‘less than’ is satisfied. However, as the minimum requirement, the
comparator should meet the strict weak ordering as following [52]:
• Asymmetric, For all x and y, if x < y, then it’s not the case y < x;
The following Scheme/Lisp program translates this generic selection sorting algorithm.
The reason why we choose Scheme/Lisp here is because the lexical scope can simplify the
needs to pass the ‘less than’ comparator for every function calls.
(define (sel-sort-by ltp? lst)
(define (ssort lst)
(if (null? lst)
lst
(let ((p (extract-min lst)))
(cons (car p) (ssort (cdr p))))))
(define (extract-min lst)
(if (null? (cdr lst))
lst
(let ((p (extract-min (cdr lst))))
(if (ltp? (car lst) (car p))
lst
(cons (car p) (cons (car lst) (cdr p)))))))
(ssort lst))
Note that, both ssort and extract-min are inner functions, so that the ‘less than’
comparator ltp? is available to them. Passing ‘<’ to this function yields the normal
sorting in ascending order:
(sel-sort-by < '(3 1 2 4 5 10 9))
; Value 16: (1 2 3 4 5 9 10)
It’s possible to pass varies of comparator to imperative selection sort as well. This is
left as an exercise to the reader.
For the sake of brevity, we only consider sorting elements in ascending order in the
rest of this chapter. And we’ll not pass comparator as a parameter unless it’s necessary.
Observe that, when we are sorting n elements, after the first n − 1 minimum ones are
selected, the left only one, is definitely the n-th big element, so that we need NOT find
the minimum if there is only one element in the list. This indicates that the outer loop
can iterate to n − 1 instead of n.
Another place we can fine tune, is that we needn’t swap the elements if the i-th
minimum one is just A[i]. The algorithm can be modified accordingly as below:
procedure Sort(A)
for i ← 1 to |A| − 1 do
m←i
for j ← i + 1 to |A| do
if A[j] < A[m] then
m ← j
if m ≠ i then
Exchange A[i] ↔ A[m]
Definitely, these modifications won’t affects the performance in terms of big-O.
Figure 9.4: Select the maximum every time and put it to the end.
This version reveals the fact that, selecting the maximum element can sort the element
in ascending order as well. What’s more, we can find both the minimum and the maximum
elements in one pass of traversing, putting the minimum at the first location, while putting
the maximum at the last position. This approach can speed up the sorting slightly (halve
the times of the outer loop). This method is called ’cock-tail sort’.
procedure Sort(A)
for i ← 1 to ⌊|A|/2⌋ do
min ← i
max ← |A| + 1 − i
if A[max] < A[min] then
Exchange A[min] ↔ A[max]
9.3. MINOR IMPROVEMENT 189
for j ← i + 1 to |A| − i do
if A[j] < A[min] then
min ← j
if A[max] < A[j] then
max ← j
Exchange A[i] ↔ A[min]
Exchange A[|A| + 1 − i] ↔ A[max]
This algorithm can be illustrated as in figure 9.5, at any time, the left most and right
most parts contain sorted elements so far. That the smaller sorted ones are on the left,
while the bigger sorted ones are on the right. The algorithm scans the unsorted ranges,
located both the minimum and the maximum positions, then put them to the head and
the tail position of the unsorted ranges by swapping.
... sorted small ones ... x ... max ... min ... y ... sorted big ones ...
Figure 9.5: Select both the minimum and maximum in one pass, and put them to the
proper positions.
Note that it's necessary to swap the leftmost and rightmost elements of the unsorted range before the inner loop if they are not in the correct order. This is because the scan excludes these two elements. Another method is to initialize the first element of the unsorted range as both the maximum and the minimum before the inner loop. However, since we need two swap operations after the scan, it's possible that the first swap moves the maximum or the minimum away from the position we just found, which would make the second swap malfunction. How to solve this problem is left as an exercise to the reader.
The following Python example program implements this cock-tail sort algorithm.
def cocktail_sort(xs):
    n = len(xs)
    for i in range(n // 2):
        (mi, ma) = (i, n - 1 - i)
        if xs[ma] < xs[mi]:
            (xs[mi], xs[ma]) = (xs[ma], xs[mi])
        for j in range(i + 1, n - 1 - i):
            if xs[j] < xs[mi]:
                mi = j
            if xs[ma] < xs[j]:
                ma = j
        (xs[i], xs[mi]) = (xs[mi], xs[i])
        (xs[n - 1 - i], xs[ma]) = (xs[ma], xs[n - 1 - i])
    return xs
• Trivial edge case: If the list is empty, or there is only one element in the list, the
sorted result is obviously the origin list;
• Otherwise, we select the minimum and the maximum, put them in the head and
tail positions, then recursively sort the rest elements.
Where the minimum and the maximum are extracted from L by a function select(L).
Note that, the minimum is actually linked to the front of the recursive sort result. Its
semantic is a constant O(1) time ‘cons’ (refer to the appendix of this book for detail).
While the maximum is appending to the tail. This is typically a linear O(n) time expensive
operation. We’ll optimize it later.
Function select(L) scans the whole list to find both the minimum and the maximum. It can be defined as below:

select(L) = (min(l1, l2), ϕ, max(l1, l2)) : L = {l1, l2}
            (l1, {lmin} ∪ L′′, lmax) : l1 < lmin
            (lmin, {lmax} ∪ L′′, l1) : lmax < l1
            (lmin, {l1} ∪ L′′, lmax) : otherwise        (9.8)

where (lmin, L′′, lmax) = select(L′) and L′ is the rest of the list except for the first element l1. If there are only two elements in the list, we pick the smaller as the minimum and the bigger as the maximum; after extracting them the list becomes empty, which is the trivial edge case. Otherwise, we take the first element l1 out, recursively perform selection on the rest of the list, and then check whether l1 is less than the minimum candidate or greater than the maximum candidate to finalize the result.
Note that for all the cases, there is no appending operation to form the result. However,
since selection must scan all the element to determine the minimum and the maximum,
it is bound to O(n) linear time.
The complete example Haskell program is given as the following.
csort [] = []
csort [x] = [x]
csort xs = mi : csort xs' ++ [ma] where
(mi, xs', ma) = extractMinMax xs
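For comparison, here is a rough Python sketch of the same one-pass min/max selection (the name select_min_max and the tuple result are illustrative; it assumes the input holds at least two elements):
def select_min_max(xs):
    # Mirror select(L): return (minimum, rest of the elements, maximum).
    lo, hi = (xs[0], xs[1]) if xs[0] <= xs[1] else (xs[1], xs[0])
    rest = []
    for x in xs[2:]:
        if x < lo:
            rest.append(lo)
            lo = x
        elif hi < x:
            rest.append(hi)
            hi = x
        else:
            rest.append(x)
    return lo, rest, hi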
In the accumulating version sort′(A, L, B), lmin, lmax and L′′ are defined the same as before, and we start sorting by passing empty A and B: sort(L) = sort′(ϕ, L, ϕ).
Besides the edge case, observe that the appending operation only happens on A ∪ {lmin}, while lmax is only linked to the head of B, and this appending occurs in every recursive call. To eliminate it, we can store A in reverse order, so that lmin can be 'cons'-ed to the head instead of appended. Denote cons(x, L) = {x} ∪ L and append(L, x) = L ∪ {x}; the definition can then be rewritten so that only cons is used.
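As an illustrative Python sketch of this accumulate-and-reverse structure (cocktail_sort_func is a hypothetical name, and the built-in min/max are used loosely just to show the shape of the accumulators):
def cocktail_sort_func(xs):
    small, big, rest = [], [], list(xs)
    while len(rest) > 1:
        lo, hi = min(rest), max(rest)
        rest.remove(lo)
        rest.remove(hi)
        small.append(lo)     # constant-time analogue of consing onto the reversed A
        big.append(hi)       # maximums collected in selection order
    return small + rest + big[::-1]   # reverse the maximums once at the end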
Exercise 9.2
• Realize the imperative basic selection sort algorithm, which can take a comparator
as a parameter. Please try both dynamic typed language and static typed language.
How to annotate the type of the comparator as general as possible in a static typed
language?
• Implement Knuth’s version of selection sort in your favorite programming language.
• An alternative to realize cock-tail sort is to assume the i-th element both the min-
imum and the maximum, after the inner loop, the minimum and maximum are
found, then we can swap the the minimum to the i-th position, and the maximum
to position |A| + 1 − i. Implement this solution in your favorite imperative language.
Please note that there are several special edge cases should be handled correctly:
– A = {max, min, ...};
– A = {..., max, min};
– A = {max, ..., min}.
Please don’t refer to the example source code along with this chapter before you
try to solve this problem.
• Realize the function select(L) by folding.
9.4 Major improvement
Do we really need to scan all the rest of the elements every time to select the minimum? Note that when we pick the smallest one the first time, we actually traverse the whole collection, so we already know, at least partially, which elements are relatively big and which are relatively small.
The problem is that, when we select the further minimum elements, instead of re-using
the ordering information we obtained previously, we drop them all, and blindly start a
new traverse.
So the key point to improve selection based sort is to re-use the previous result. There
are several approaches, we’ll adopt an intuitive idea inspired by football match in this
chapter.
(Figure 9.6: the tournament tree built from {7, 6, 15, 16, 8, 4, 13, 3, 5, 10, 9, 1, 12, 2, 11, 14}; the champion 16 is at the root.)
Imagine that every team has a number. The bigger the number, the stronger the team.
Suppose that the stronger team always beats the team with smaller number, although
this is not true in real world. But this simplification is fair enough for us to develop the
tournament knock out solution. This maximum number which represents the champion
is 16. Definitely, team with number 14 isn’t the second best according to our rules. It
should be 15, which is knocked out at the first round of comparison.
The key question here is to find an effective way to locate the second maximum number
in this tournament tree. After that, what we need is to apply the same method to select
the third, the fourth, ..., to accomplish the selection based sort.
One idea is to assign the champion a very small number (for instance, −∞), so that it won't be selected next time, and the second best one becomes the new champion. However, suppose there are 2^m teams for some natural number m; it still takes 2^(m−1) + 2^(m−2) + ... + 2 + 1 = 2^m − 1 comparisons to determine the new champion, which is as slow as the first time.
Actually, we needn’t perform a bottom-up comparison at all since the tournament
tree stores plenty of ordering information. Observe that, the second best team must be
beaten by the champion at sometime, or it will be the final winner. So we can track the
path from the root of the tournament tree to the leaf of the champion, examine all the
teams along with this path to find the second best team.
In figure 9.6, this path is marked in gray color, the elements to be examined are
{14, 13, 7, 15}. Based on this idea, we refine the algorithm like below.
1. Build a tournament tree from the elements to be sorted, so that the champion (the
maximum) becomes the root;
2. Extract the root from the tree, perform a top-down pass and replace the maximum
with −∞;
3. Perform a bottom-up back-track along the path, determine the new champion and
make it as the new root;
Figure 9.7, 9.8, and 9.9 show the steps of applying this strategy.
(Figures 9.7 and 9.8: the tree after the champion 16 is replaced with −∞ and 15 becomes the new root, and then after 15 is replaced with −∞ and 14 becomes the new root.)
We can reuse the binary tree definition given in the first chapter of this book to
represent tournament tree. In order to back-track from leaf to the root, every node
should hold a reference to its parent (concept of pointer in some environment such as
ANSI C):
(Figure 9.9: the tree after 14 is also replaced with −∞; 13 becomes the new root.)
struct Node {
Key key;
struct Node ∗left, ∗right, ∗parent;
};
To build a tournament tree from a list of elements (suppose the number of elements is 2^m for some m), we first wrap each element as a leaf, obtaining a list of binary trees. We then take every two trees from this list, compare their keys, and form a new binary tree with the bigger key as the root; the two trees become the left and right children of this new tree. Repeating this operation builds a new list of trees, each one level higher. Note that the size of the tree list halves after each such pass, so we keep reducing the list until only one tree is left: this is the final tournament tree.
function Build-Tree(A)
T ←ϕ
for each x ∈ A do
t ← Create-Node
Key(t) ← x
Append(T, t)
while |T | > 1 do
T′ ← ϕ
for every t1 , t2 ∈ T do
t ← Create-Node
Key(t) ← Max(Key(t1 ), Key(t2 ))
Left(t) ← t1
Right(t) ← t2
Parent(t1 ) ← t
Parent(t2 ) ← t
Append(T ′ , t)
T ← T′
return T [1]
Suppose the length of the list A is n. This algorithm first traverses the list to build the leaves, which takes time linear in n; then it repeatedly compares pairs, looping in proportion to n + n/2 + n/4 + ... + 2 ≈ 2n. So the total performance is bound to O(n) time.
The following ANSI C program implements this tournament tree building algorithm.
struct Node∗ build(const Key∗ xs, int n) {
int i;
struct Node ∗t, ∗∗ts = (struct Node∗∗) malloc(sizeof(struct Node∗) ∗ n);
for (i = 0; i < n; ++i)
ts[i] = leaf(xs[i]);
for (; n > 1; n /= 2)
for (i = 0; i < n; i += 2)
ts[i/2] = branch(max(ts[i]→key, ts[i+1]→key), ts[i], ts[i+1]);
t = ts[0];
free(ts);
return t;
}
The function leaf(x) creates a leaf node with value x as the key and sets all its fields (left, right and parent) to NIL, while the function branch(key, left, right) creates a branch node and links the newly created node as the parent of its two children if they are not empty. For the sake of brevity, we skip their details; they are left as an exercise to the reader, and the complete program can be downloaded along with this book.
Some programming environments, such as Python, provide tools to iterate over every two elements at a time, for example:
for x, y in zip(*[iter(ts)]*2):
We skip such language-specific features here; readers can refer to the Python example program along with this book for details.
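Putting the pieces together, here is a rough Python sketch of the tree-building pass (the Node class and its field names are illustrative, mirroring the C struct above; it assumes the number of elements is 2^m):
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None

def build(xs):
    ts = [Node(x) for x in xs]                 # wrap every element as a leaf
    while len(ts) > 1:
        ts2 = []
        for t1, t2 in zip(*[iter(ts)] * 2):    # take two trees at a time
            t = Node(max(t1.key, t2.key), t1, t2)
            t1.parent = t2.parent = t
            ts2.append(t)
        ts = ts2
    return ts[0]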
When the maximum element is extracted from the tournament tree, we replace it with
−∞, and repeatedly replace all these values from the root to the leaf. Next, we back-track
to root through the parent field, and determine the new maximum element.
function Extract-Max(T )
m ← Key(T )
Key(T ) ← −∞
while ¬ Leaf?(T ) do ▷ The top down pass
if Key(Left(T )) = m then
T ← Left(T )
else
T ← Right(T )
Key(T ) ← −∞
while Parent(T ) ≠ ϕ do ▷ The bottom up pass
T ← Parent(T )
Key(T ) ← Max(Key(Left(T )), Key(Right(T )))
return m
This algorithm returns the extracted maximum element, and modifies the tournament
tree in-place. Because we can’t represent −∞ in real program by limited length of word,
one approach is to define a relative negative big number, which is less than all the elements
in the tournament tree, for example, suppose all the elements are greater than -65535, we
can define negative infinity as below:
#define N_INF -65535
Key pop(struct Node∗ t) {
    Key x = t→key;
    t→key = N_INF;
    while (t→left) {                      /* top-down: push −∞ along the champion's path */
        t = t→left→key == x ? t→left : t→right;
        t→key = N_INF;
    }
    while (t→parent) {                    /* bottom-up: determine the new champion */
        t = t→parent;
        t→key = max(t→left→key, t→right→key);
    }
    return x;
}
The behavior of Extract-Max is quite similar to the pop operation for some data
structures, such as queue, and heap, thus we name it as pop in this code snippet.
Algorithm Extract-Max process the tree in two passes, one is top-down, then a
bottom-up along the path that the ‘champion team wins the world cup’. Because the
tournament tree is well balanced, the length of this path, which is the height of the tree,
is bound to O(lg n), where n is the number of the elements to be sorted (which are equal
to the number of leaves). Thus the performance of this algorithm is O(lg n).
It’s possible to realize the tournament knock out sort now. We build a tournament
tree from the elements to be sorted, then continuously extract the maximum. If we want
to sort in monotonically increase order, we put the first extracted one to the right most,
then insert the further extracted elements one by one to left; Otherwise if we want to sort
in decrease order, we can just append the extracted elements to the result. Below is the
algorithm sorts elements in ascending order.
procedure Sort(A)
T ← Build-Tree(A)
for i ← |A| down to 1 do
A[i] ← Extract-Max(T )
Translating it to ANSI C example program is straightforward.
void tsort(Key∗ xs, int n) {
struct Node∗ t = build(xs, n);
while(n)
xs[--n] = pop(t);
release(t);
}
This algorithm firstly takes O(n) time to build the tournament tree, then performs n
pops to select the maximum elements so far left in the tree. Since each pop operation is
bound to O(lg n), thus the total performance of tournament knock out sorting is O(n lg n).
Thus a binary tree is either empty or a branch node contains a key, a left sub tree and
a right sub tree. Both children are again binary trees.
We've used a hard-coded big negative number to represent −∞. However, this solution is ad hoc, and it forces all the elements to be sorted to be greater than this pre-defined magic number. Some programming environments support algebraic data types, so that we can define negative infinity explicitly. For instance, a Haskell data type Infinite a with the three constructors NegInf, Only a, and Inf sets up the concept of infinity.²
² The order of the definitions of 'NegInf', the regular number, and 'Inf' is significant if we want to derive the default, correct comparing behavior of 'Ord'. It is also possible to specify the detailed order by making the type an instance of 'Ord'; however, this is a language-specific feature which is out of the scope of this book. Please refer to other textbooks about Haskell.
From now on, we switch back to use the min() function to determine the winner, so
that the tournament selects the minimum instead of the maximum as the champion.
Denote function key(T ) returns the key of the tree rooted at T . Function wrap(x)
wraps the element x into a leaf node. Function tree(l, k, r) creates a branch node, with k
as the key, l and r as the two children respectively.
The knock out process, can be represented as comparing two trees, picking the smaller
key as the new key, and setting these two trees as children:
branch(T1 , T2 ) = tree(T1 , min(key(T1 ), key(T2 )), T2 ) (9.12)
This can be implemented in Haskell word by word:
branch t1 t2 = Br t1 (min (key t1) (key t2)) t2
There is a limitation in our tournament sorting algorithm so far: it only accepts a collection of 2^m elements, otherwise we can't build a complete binary tree. This can actually be solved in the tree building process. Recall that we pick two trees each time, compare them, and pick the winner; this is perfect if the number of trees is always even. Consider the case in a football match where one team is absent for some reason (a severe flight delay or whatever), so that one team is left without a challenger: one option is to make this team the winner, so that it attends the further games. We can use a similar approach here.
To build the tournament tree from a list of elements, we wrap every element into a
leaf, then start the building process.
build(L) = build′ ({wrap(x)|x ∈ L}) (9.13)
The build′ (T) function terminates when there is only one tree left in T, which is the
champion. This is the trivial edge case. Otherwise, it groups every two trees in a pair to
determine the winners. When there are odd numbers of trees, it just makes the last tree
as the winner to attend the next level of tournament and recursively repeats the building
process.
build′(T) = T : |T| ≤ 1
            build′(pair(T)) : otherwise        (9.14)
Note that this algorithm actually handles another special cases, that the list to be
sort is empty. The result is obviously empty.
Denote T = {T1 , T2 , ...} if there are at least two trees, and T′ represents the left trees
by removing the first two. Function pair(T) is defined as the following.
pair(T) = {branch(T1, T2)} ∪ pair(T′) : |T| ≥ 2
          T : otherwise        (9.15)
The complete tournament tree building algorithm can be implemented as the below
example Haskell program.
fromList :: (Ord a) ⇒ [a] → Tr (Infinite a)
fromList = build ◦ (map wrap) where
build [] = Empty
build [t] = t
build ts = build $ pair ts
pair (t1:t2:ts) = (branch t1 t2):pair ts
pair ts = ts
When extracting the champion (the minimum) from the tournament tree, we examine which of the left and right sub-trees has the same key as the root, and recursively extract from that sub-tree until we arrive at the leaf node. Denote the left sub-tree of T as L, the right sub-tree as R, and its key as K. We can define this popping algorithm as follows.
pop(T) = tree(ϕ, ∞, ϕ) : L = ϕ ∧ R = ϕ
         tree(L′, min(key(L′), key(R)), R) : K = key(L), L′ = pop(L)
         tree(L, min(key(L), key(R′)), R′) : K = key(R), R′ = pop(R)        (9.16)
It’s straightforward to translate this algorithm into example Haskell code.
pop (Br Empty _ Empty) = Br Empty Inf Empty
pop (Br l k r) | k == key l = let l' = pop l in Br l' (min (key l') (key r)) r
| k == key r = let r' = pop r in Br l (min (key l) (key r')) r'
Note that this algorithm only removes the current champion without returning it. So
it’s necessary to define a function to get the champion at the root node.
top(T ) = key(T ) (9.17)
With these functions defined, tournament knock out sorting can be formalized by
using them.
sort(L) = sort′ (build(L)) (9.18)
where sort′(T) continuously pops the minimum element to form the result:

sort′(T) = ϕ : T = ϕ ∨ key(T) = ∞
           {top(T)} ∪ sort′(pop(T)) : otherwise        (9.19)
The rest of the Haskell code is given below to complete the implementation.
top = only ◦ key
The auxiliary functions only, key and wrap, together with the explicit infinity support, are listed as follows.
only (Only x) = x
key (Br _ k _ ) = k
wrap x = Br Empty (Only x) Empty
Exercise 9.3
• Implement the helper functions leaf(), branch(), max(), is-leaf(), and release() to complete the imperative tournament tree program.
• Implement the imperative tournament tree in a programming language support GC
(garbage collection).
• Why can our tournament tree knock out sort algorithm handle duplicated elements
(elements with same value)? We say a sorting algorithm stable, if it keeps the
original order of elements with same value. Is the tournament tree knock out sorting
stable?
• Design an imperative tournament tree knock out sort algorithm, which satisfies the
following:
9.5 Short summary
This is exactly the same as the heap sort we gave in the previous chapter. A heap always keeps the minimum (or the maximum) on top and provides a fast pop operation. The binary heap by implicit array encodes the tree structure in the array indices, so no extra space is allocated besides the n array cells; the functional heaps, such as the Leftist heap and the Splay heap, allocate n nodes as well. We'll introduce more heaps in the next chapter which perform well in many aspects.
[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. ISBN:0262032937. The MIT Press. 2001
Chapter 10
Binomial heap, Fibonacci heap, and pairing heap
10.1 Introduction
In the previous chapter, we mentioned that heaps can be generalized and implemented with various data structures. However, so far we have only focused on binary heaps, whether as explicit binary trees or implicit arrays.
It's quite natural to extend the binary tree to a K-ary tree [54]. In this chapter, we first show Binomial heaps, which actually consist of a forest of K-ary trees. Binomial heaps bring the performance of all operations to O(lg n), while keeping the finding of the minimum element at O(1) time.
If we delay some operations in Binomial heaps with a lazy strategy, they turn into the Fibonacci heap.
All the binary heaps we have shown take no less than O(lg n) time for merging; we'll show that it's possible to improve this to O(1) with the Fibonacci heap, which is quite helpful for graph algorithms. Actually, the Fibonacci heap achieves a good amortized O(1) time bound for almost all operations, leaving only the heap pop at O(lg n).
Finally, we'll introduce the pairing heap. It has the best performance in practice, although the proof of this is still a conjecture for the time being.
10.2 Binomial heaps
Binomial tree
In order to explain why the tree is named 'binomial', let's review the famous Pascal's triangle (also known as the Jia Xian triangle, in memory of the Chinese mathematician Jia Xian (1010-1070)) [55].
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...
In each row, the numbers are all binomial coefficients. There are many ways to gain
a series of binomial coefficient numbers. One of them is by using recursive composition.
Binomial trees, as well, can be defined in this way as the following.
• A binomial tree of rank 0 has only a node as the root;
• A binomial tree of rank n is consist of two rank n − 1 binomial trees, Among these
2 sub trees, the one has the bigger root element is linked as the leftmost child of
the other.
We denote a binomial tree of rank 0 as B0, and a binomial tree of rank n as Bn. Figure 10.1 shows a B0 tree and how two Bn−1 trees are linked into a Bn tree.
(Figure 10.1: (a) a B0 tree; (b) linking two rank n − 1 trees to form a Bn tree.)
With this recursive definition, it easy to draw the form of binomial trees of rank 0, 1,
2, ..., as shown in figure 10.2
Observing the binomial trees reveals some interesting properties. For each rank n
binomial tree, if counting the number of nodes in each row, it can be found that it is the
binomial number.
For instance for rank 4 binomial tree, there is 1 node as the root; and in the second
level next to root, there are 4 nodes; and in 3rd level, there are 6 nodes; and in 4-th level,
there are 4 nodes; and the 5-th level, there is 1 node. They are exactly 1, 4, 6, 4, 1 which
is the 5th row in Pascal’s triangle. That’s why we call it binomial tree.
Another interesting property is that the total number of node for a binomial tree with
rank n is 2n . This can be proved either by binomial theory or the recursive definition
directly.
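A tiny Python sketch can make this concrete (representing a node simply as the list of its children is an assumption for illustration only):
def b_tree(n):
    # B0 is a single node with no children; Bn links one B(n-1)
    # as the leftmost child of another B(n-1).
    if n == 0:
        return []
    t1, t2 = b_tree(n - 1), b_tree(n - 1)
    return [t2] + t1

def size(t):
    return 1 + sum(size(c) for c in t)

# |Bn| = 2^n
assert [size(b_tree(n)) for n in range(5)] == [1, 2, 4, 8, 16]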
Binomial heap
(Figure 10.2: binomial trees of rank 0, 1, 2, 3 and 4; panel (e) shows the B4 tree.)
With the binomial tree defined, we can introduce the definition of the binomial heap. A binomial heap is a set of binomial trees (a forest of binomial trees) that satisfies the following properties.
• Each binomial tree in the heap conforms to heap property, that the key of a node
is equal or greater than the key of its parent. Here the heap is actually min-heap,
for max-heap, it changes to ‘equal or less than’. In this chapter, we only discuss
about min-heap, and max-heap can be equally applied by changing the comparison
condition.
• There is at most one binomial tree which has the rank r. In other words, there are
no two binomial trees have the same rank.
This definition leads to an important result that for a binomial heap contains n ele-
ments, and if convert n to binary format yields a0 , a1 , a2 , ..., am , where a0 is the LSB and
am is the MSB, then for each 0 ≤ i ≤ m, if ai = 0, there is no binomial tree of rank i and
if ai = 1, there must be a binomial tree of rank i.
For example, if a binomial heap contains 5 element, as 5 is ‘(LSB)101(MSB)’, then
there are 2 binomial trees in this heap, one tree has rank 0, the other has rank 2.
Figure 10.3 shows a binomial heap which have 19 nodes, as 19 is ‘(LSB)11001(MSB)’
in binary format, so there is a B0 tree, a B1 tree and a B4 tree.
Data layout
There are two ways to define K-ary trees imperatively. One is by using ‘left-child, right-
sibling’ approach[4]. It is compatible with the typical binary tree structure. For each
node, it has two fields, left field and right field. We use the left field to point to the first
child of this node, and use the right field to point to the sibling node of this node. All
siblings are represented as a single directional linked list. Figure 10.4 shows an example
tree represented in this way.
The other way is to use the library defined collection container, such as array or list
to represent all children of a node.
Since the rank of a tree plays very important role, we also defined it as a field.
Figure 10.4: Example tree represented in the 'left-child, right-sibling' way. R is the root node; it has no sibling, so its right field points to NIL. C1, C2, ..., Cn are the children of R: C1 is linked from the left field of R, and the other siblings of C1 are linked one next to the other on the right side of C1. C′2, ..., C′m are the children of C1.
For the 'left-child, right-sibling' method, we define the binomial tree as follows.¹
¹ C programs are also provided along with this book.
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None      # left field: the first child
        self.sibling = None    # right field: the next sibling
When initializing a tree with a key, we create a leaf node: its rank is set to zero and all the other fields are set to NIL.
It is quite natural to utilize a pre-defined list to represent multiple children, as below.
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.children = []
For purely functional settings, such as in Haskell language, binomial tree are defined
as the following.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}
The binomial heap is then defined as a list of binomial trees (a forest) with ranks in monotonically increasing order. As another implicit constraint, no two binomial trees have the same rank.
type BiHeap a = [BiTree a]
Key(T ), Children(T ), and Rank(T ) access the key, children and rank of a binomial tree
respectively.
link(T1, T2) = node(r + 1, x, {T2} ∪ C1) : x < y
               node(r + 1, y, {T1} ∪ C2) : otherwise        (10.1)
Where
x = Key(T1 )
y = Key(T2 )
r = Rank(T1 ) = Rank(T2 )
C1 = Children(T1 )
C2 = Children(T2 )
Note that the link operation is bound to O(1) time if the ∪ is a constant time operation.
It’s easy to translate the link function to Haskell program as the following.
link t1@(Node r x c1) t2@(Node _ y c2) =
if x<y then Node (r+1) x (t2:c1)
else Node (r+1) y (t1:c2)
It's also possible to realize the link operation in an imperative way. If we use the 'left-child, right-sibling' approach, we link the tree with the bigger key as the leftmost child of the other, and the original children of the winner become the right-side siblings of the newly linked child. Figure 10.6 shows the result of one case.
1: function Link(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Sibling(T2 ) ← Child(T1 )
5: Child(T1 ) ← T2
6: Parent(T2 ) ← T1
7: Rank(T1 ) ← Rank(T1 ) + 1
8: return T1
And if we use a container to manage all children of a node, the algorithm is like below.
1: function Link’(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Parent(T2 ) ← T1
5: Insert-Before(Children(T1 ), T2 )
6: Rank(T1 ) ← Rank(T1 ) + 1
7: return T1
Figure 10.6: Suppose x < y; link y to the left side of x, and link the original children of x to the right side of y.
It’s easy to translate both algorithms to real program. Here we only show the Python
program of Link’ for illustration purpose 2 .
def link(t1, t2):
    if t2.key < t1.key:
        (t1, t2) = (t2, t1)
    t2.parent = t1
    t1.children.insert(0, t2)
    t1.rank = t1.rank + 1
    return t1
Exercise 10.1
Implement the tree-linking program in your favorite language with left-child, right-
sibling method.
We mentioned that linking is a constant time algorithm, and it is true when using the left-child,
right-sibling approach. However, if we use a container to manage the children, the performance
depends on the concrete implementation of the container. If it is a plain array, the
linking time will be proportional to the number of children. In this chapter, we assume the
time is constant; this is true if the container is implemented as a linked-list.
We define the function insertT to insert a new binomial tree T into a forest H (the rank of
T is assumed to be no greater than the rank of any tree in H):

insertT(H, T) = {T} : H = ϕ
                {T} ∪ H : Rank(T) < Rank(T1)
                insertT(H', link(T, T1)) : otherwise

where
H = {T1, T2, ..., Tn}
H' = {T2, T3, ..., Tn}
The idea is that for the empty heap, we set the new tree as the only element to create
a singleton forest; otherwise, we compare the ranks of the new tree and the first tree in
the forest. If they are the same, we link them together and recursively insert the linked result
(a tree with rank increased by one) into the rest of the forest; if they are not the same, since
the pre-condition constrains the rank of the new tree, it must be the smallest, and we put
this new tree in front of all the other trees in the forest.
2 The C and C++ programs are also available along with this book.
From the binomial properties mentioned above, there are at most O(lg n) binomial
trees in the forest, where n is the total number of nodes. Thus the function insertT performs
at most O(lg n) linkings, each of which is a constant time operation. So the performance
of insertT is O(lg n).3
The corresponding Haskell program is given below.
insertTree [] t = [t]
insertTree ts@(t':ts') t = if rank t < rank t' then t:ts
else insertTree ts' (link t t')
With this auxiliary function, it's easy to realize the insertion: we wrap the new
element as the only leaf of a singleton tree, then insert this tree into the binomial heap.
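The insert function implied here can be sketched in Haskell as below (a sketch assuming the insertTree function above; the original listing for insert is not shown in this extract).

insert :: Ord a => BiHeap a -> a -> BiHeap a
insert h x = insertTree h (Node 0 x [])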
We can also build a heap from a series of elements by folding. For example,
the following Haskell code defines a helper function 'fromList'.
fromList = foldl insert []
Since wrapping an element as a singleton tree takes O(1) time, the real work is done
in insertT, so the performance of binomial heap insertion is bound to O(lg n).
The insertion algorithm can also be realized with an imperative approach.
Algorithm 1 continuously links the first tree in the heap with the new tree to be
inserted while they have the same rank. After that, it puts the linked-list of the rest of the
trees as the sibling, and returns the new linked-list.
If we use a container to manage the children of a node, the algorithm can be given as in
Algorithm 2.
3 There is an interesting observation when comparing this operation with adding two binary numbers.
In this algorithm, function Pop removes the first tree T1 = H[0] from the forest, and
function Head-Insert inserts a new tree before any other trees in the heap, so that it
becomes the first element in the forest.
With either Insert-Tree or Insert-Tree' defined, realizing the binomial heap insertion
is trivial.
The following Python program implements the insert algorithm by using a container
to manage sub-trees; the ‘left-child, right-sibling’ version is left as an exercise.
def insert_tree(ts, t):
    while ts != [] and t.rank == ts[0].rank:
        t = link(t, ts.pop(0))
    ts.insert(0, t)
    return ts
Exercise 10.2
Write the insertion program in your favorite imperative programming language by
using the ‘left-child, right-sibling’ approach.
(Figure: merging two binomial heaps. (a) If Rank(t1) < Rank(t2), pick the tree with the
smaller rank and merge the rest; (b) if two trees have the same rank, link them to a new tree,
and recursively insert it to the merge result of the rest.)
Since both heaps contain binomial trees with ranks in monotonically increasing order,
in each iteration we pick the tree with the smallest rank and append it to the result heap;
if both trees have the same rank, we perform linking first. Consider the Append-Tree
algorithm: the rank of the new tree to be appended can't be less than the rank of any other
tree in the result heap according to our merge strategy; however, it might be equal to the rank
of the last tree in the result heap. This can happen if the last tree appended is the result
of linking, which increases the rank by one. In this case, we must link the new tree
with the last tree. In the algorithm below, function Last(H) refers to
the last tree in a heap, and Append(H, T) appends a new tree at the end of a forest.
1: function Append-Tree(H, T )
2: if H ≠ ϕ ∧ Rank(T) = Rank(Last(H)) then
3: Last(H) ← Link(T , Last(H))
4: else
5: Append(H, T )
Function Append-Trees repeatedly calls this function, so that it can append all trees
in one heap to the other heap.
1: function Append-Trees(H1 , H2 )
2: for each T ∈ H2 do
3: H1 ← Append-Tree(H1 , T )
The following Python program translates the Append-Tree algorithm.
def append_tree(ts, t):
    if ts != [] and ts[-1].rank == t.rank:
        ts[-1] = link(ts[-1], t)
    else:
        ts.append(t)
    return ts
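For reference, a purely functional merge following the same strategy (pick the tree with the smaller rank, and link when the ranks are equal) can be sketched in Haskell. This is a sketch assuming the link and insertTree functions defined earlier; the original functional merge listing is not part of this extract.

merge :: Ord a => BiHeap a -> BiHeap a -> BiHeap a
merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
  | rank t1 < rank t2 = t1 : merge ts1' ts2
  | rank t2 < rank t1 = t2 : merge ts1 ts2'
  | otherwise         = insertTree (merge ts1' ts2') (link t1 t2)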
Exercise 10.3
The program given above uses a container to manage sub-trees. Implement the merge
algorithm in your favorite imperative programming language with ‘left-child, right-sibling’
approach.
Pop
Among the forest which forms the binomial heap, each binomial tree conforms to the heap
property: the root contains the minimum element in that tree. However, the order
relationship of these roots can be arbitrary. To find the minimum element in the heap,
we can select the smallest root among these trees. Since there are O(lg n) binomial trees, this
approach takes O(lg n) time.
However, after we locate the minimum element (which is also known as the top element
of a heap), we need to remove it from the heap and keep the binomial property to accomplish
the heap-pop operation. Suppose the forest forming the binomial heap consists of trees
Bi, Bj, ..., Bp, ..., Bm, where Bk is a binomial tree of rank k, and the minimum element is
the root of Bp. If we delete it, there will be p children left, which are all binomial trees
with ranks p − 1, p − 2, ..., 0.
One tool at hand is the O(lg n) merge function we have defined. A possible approach
is to reverse the p children, so that their ranks change to monotonically increasing order
and they form a binomial heap Hp. The rest of the trees is still a binomial heap, which we
represent as H' = H − Bp. Merging Hp and H' gives the final result of pop. Figure 10.8
illustrates this idea.
In order to realize this algorithm, we first need to define an auxiliary function, which
extracts the tree containing the minimum element at its root from the forest.
extractMin(H) = (T, ϕ) : H is a singleton {T}
                (T1, H') : Root(T1) < Root(T')          (10.5)
                (T', {T1} ∪ H'') : otherwise

where
H = {T1, T2, ...} for the non-empty forest case;
H' = {T2, T3, ...} is the forest without the first tree;
(T', H'') = extractMin(H')
The result of this function is a tuple: the first part is the tree which has the minimum
element at its root; the second part is the rest of the trees after removing the first part from
the forest.
This function examines each of the trees in the forest, thus it is bound to O(lg n) time.
The corresponding Haskell program is given below.
extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)
else (t', t:ts')
where
(t', ts') = extractMin ts
Of course, it's possible to just traverse the forest and pick the minimum root without
removing the tree. The imperative algorithm below describes it with the ‘left child,
right sibling’ approach.
1: function Find-Minimum(H)
2: T ← Head(H)
3: min ← ∞
4: while T ≠ ϕ do
5: if Key(T )< min then
6: min ← Key(T )
7: T ← Sibling(T )
8: return min
If we manage the children with collection containers instead, the linked-list traversal
is abstracted as finding the minimum element among a list. The following Python program
shows this situation.
def find_min(ts):
    min_t = min(ts, key=lambda t: t.key)
    return min_t.key
Next we define the function to delete the minimum element from the heap by using
extractMin:

deleteMin(H) = merge(reverse(Children(Tmin)), H')

where
(Tmin, H') = extractMin(H)

Translating the formula to a Haskell program is trivial, and we'll skip it.
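Although the text skips it, a possible rendering could look like the following sketch, assuming extractMin and a merge over forests such as the one sketched earlier.

deleteMin :: Ord a => BiHeap a -> BiHeap a
deleteMin h = merge (reverse (children t)) h'
  where (t, h') = extractMin h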
Realizing the algorithm in a procedural way takes extra effort, including list reversing,
etc. We leave these details as exercises to the reader. The following pseudo code illustrates
the imperative pop algorithm.
1: function Extract-Min(H)
2: (Tmin , H) ← Extract-Min-Tree(H)
3: H ← Merge(H, Reverse(Children(Tmin )))
4: return (Key(Tmin ), H)
With the pop operation defined, we can realize heap sort by creating a binomial heap from
a series of numbers, then keep popping the smallest number from the heap till it becomes
empty. Function fromList can be defined by folding. Heap sort can also be expressed in a
procedural way; please refer to the previous chapter about binary heap for details.
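A possible Haskell sketch of this heap sort, assuming fromList, deleteMin, and the observation that the minimum is the smallest root, is given below; the helper names findMin and go are illustrative only.

heapSort :: Ord a => [a] -> [a]
heapSort = go . fromList
  where
    go [] = []
    go h  = findMin h : go (deleteMin h)
    findMin = minimum . map root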
Exercise 10.4
• Write the program to return the minimum element from a binomial heap in your
favorite imperative programming language with ’left-child, right-sibling’ approach.
• Realize the Extract-Min-Tree() Algorithm.
• For ’left-child, right-sibling’ approach, reversing all children of a tree is actually
reversing a single-direct linked-list. Write a program to reverse such linked-list in
your favorite imperative programming language.
10.3.1 Definition
Fibonacci heap is essentially a lazily evaluated binomial heap. Note that this doesn't mean
implementing a binomial heap in a lazy evaluation setting, for instance Haskell, brings a
Fibonacci heap automatically. However, a lazy evaluation setting does help in the realization;
for example, [56] presents an elegant implementation.
Fibonacci heap has excellent performance theoretically. All operations except for pop
are bound to amortized O(1) time. In this section, we'll give an algorithm different from
some popular textbooks[4]. Most of the ideas presented here are based on Okasaki's work[57].
Let’s review and compare the performance of binomial heap and Fibonacci heap (more
precisely, the performance goal of Fibonacci heap).
operation Binomial heap Fibonacci heap
insertion O(lg n) O(1)
merge O(lg n) O(1)
top O(lg n) O(1)
pop O(lg n) amortized O(lg n)
Consider where the bottleneck of inserting a new element x into a binomial heap is. We
actually wrap x as a singleton leaf and insert this tree into the heap, which is actually a
forest.
During this operation, we insert the tree in monotonically increasing order of rank,
and once the ranks are equal, recursive linking and inserting happen, which leads to
the O(lg n) time.
As a lazy strategy, we can postpone the ordered-rank insertion and merging operations;
instead, we just put the singleton leaf into the forest. The problem
is that when we try to find the minimum element, for example in the top operation, the
performance will be bad, because we need to check all trees in the forest, and there are no
longer only O(lg n) of them.
In order to locate the top element in constant time, we must remember which tree
contains the minimum element at its root.
Based on this idea, we can reuse the definition of the binomial tree and give the definition
of the Fibonacci heap as the following Haskell program, for example.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}
The Fibonacci heap is either empty or a forest of binomial trees with the minimum
element stored in a special one explicitly.
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}
For convenience, we also add a size field to record how many elements there are
in a heap.
The data layout can also be defined in imperative way as the following ANSI C code.
struct node{
Key key;
struct node ∗next, ∗prev, ∗parent, ∗children;
int degree; /∗ As known as rank ∗/
int mark;
};
struct FibHeap{
struct node ∗roots;
struct node ∗minTr;
int n; /∗ number of nodes ∗/
};
For generality, Key can be a customized type, we use integer for illustration purpose.
typedef int Key;
In this chapter, we use a circular doubly linked-list in the imperative setting to realize
the Fibonacci heap, as described in [4]. It makes many operations easy and fast. Note
that there are two extra fields added: the degree, also known as the rank of a node, is the
number of children of this node; the flag mark is used only in the decrease-key operation,
and will be explained in detail in a later section.
Note that function FibHeap() accepts three parameters: a size value, which is 1 for
this one-leaf tree; a special tree which contains the minimum element as root; and a list
of the other binomial trees in the forest. The meaning of function node() is the same as before:
it creates a binomial tree from a rank, an element, and a list of children.
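A functional counterpart of the insertion can be sketched in Haskell as below. This is a sketch that mirrors the imperative algorithm which follows (add a rank-0 singleton tree to the forest and update the recorded minimum tree); the original formula referred to above is not reproduced in this extract, so the exact definition may differ.

insert :: Ord a => FibHeap a -> a -> FibHeap a
insert E x = FH 1 (Node 0 x []) []
insert (FH sz mt ts) x
  | x < root mt = FH (sz + 1) (Node 0 x []) (mt : ts)
  | otherwise   = FH (sz + 1) mt (Node 0 x [] : ts)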
Insertion can also be realized directly by appending the new node to the forest and
updating the record of the tree which contains the minimum element.
1: function Insert(H, k)
2: x ← Singleton(k) ▷ Wrap x to a node
3: append x to root list of H
4: if Tmin(H) = NIL ∨ k < Key(Tmin(H)) then
5: Tmin (H) ← x
6: n(H) ← n(H)+1
Where function Tmin() returns the tree which contains the minimum element at its root.
The following C source snippet is a translation of this algorithm.
struct FibHeap∗ insert_node(struct FibHeap∗ h, struct node∗ x){
h = add_tree(h, x);
if(h→minTr == NULL || x→key < h→minTr→key)
h→minTr = x;
h→n++;
return h;
}
Exercise 10.5
Implement the insert algorithm in your favorite imperative programming language
completely. This is also an exercise to circular doubly linked list manipulation.
Merge algorithm can also be realized imperatively by concatenating the root lists of
the two heaps.
1: function Merge(H1 , H2 )
2: H←Φ
With the merge function defined, the O(1) insertion algorithm is realized as well. We
can also give the O(1) time top function.
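The original listing is not included in this extract; it amounts to reading the root of the recorded minimum tree, for example (a sketch):

top :: FibHeap a -> a
top (FH _ mt _) = root mt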
Exercise 10.6
Implement the circular doubly linked list concatenation function in your favorite im-
perative programming language.
Where the fold() function is defined to iterate over all elements of a list, applying a specified
function to the intermediate result and each element; it is sometimes called reducing.
Please refer to Appendix A and the chapter of binary search tree for it.
L = {x1, x2, ..., xn} denotes a list of numbers, and we'll use L' = {x2, x3, ..., xn} to
represent the rest of the list with the first element removed. Function meld() is defined
as below.
meld(L, x) = {x} : L = ϕ
             meld(L', x + x1) : x = x1          (10.13)
             {x} ∪ L : x < x1
             {x1} ∪ meld(L', x) : otherwise
The consolidate() function works as follows. It maintains an ordered result list
L, containing only unique numbers, which is initialized as an empty list ϕ. Each time
it processes an element x, it first checks if the first element in L is equal to x; if so, it
adds them together (which yields 2x), and repeatedly checks if 2x is equal to the next
element in L. This process won't stop until either the element to be melded is not equal to
the head element in the rest of the list, or the list becomes empty. Table 10.1 illustrates
the process of consolidating the number sequence {2, 1, 1, 4, 8, 1, 1, 2, 4}. Column one lists the
numbers ‘scanned’ one by one; column two shows the intermediate result, where the
newly scanned number is compared with the first number in the result list and, if they are equal,
they are enclosed in a pair of parentheses; the last column is the result of meld, which
is used as the input to the next step.
The Haskell program can be given accordingly.
consolidate = foldl meld [] where
meld [] x = [x]
meld (x':xs) x | x == x' = meld xs (x+x')
| x < x' = x:x':xs
| otherwise = x': meld xs x
Changing the addition of numbers to the linking of trees gives the meld function over
binomial trees accordingly.
meld [] t = [t]
meld (t':ts) t | rank t == rank t' = meld ts (link t t')
               | rank t < rank t' = t:t':ts
               | otherwise = t' : meld ts t
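The consolidation of the trees is then the same fold, with linking in place of numeric addition (a sketch assuming the meld over trees above):

consolidate :: Ord a => [BiTree a] -> [BiTree a]
consolidate = foldl meld []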
Figures 10.9 and 10.10 show the steps of consolidation when processing a Fibonacci
heap containing trees of different ranks. Comparing them with Table 10.1 reveals the similarity.
After we merge all the binomial trees in a Fibonacci heap, including the special tree recording
the minimum element, the heap becomes a binomial heap, and we lose the special tree
which gives us the ability to return the top element in O(1) time.
It's necessary to perform an O(lg n) time search to restore the special tree. We can
reuse the function extractMin() defined for the binomial heap.
It's time to give the final pop function for the Fibonacci heap, as all the sub-problems
have been solved. Let Tmin denote the special tree in the heap recording the minimum
element at its root; T denote the forest containing all the other trees except the special
tree; s represent the size of the heap; and function children() return all sub-trees of a
binomial tree except the root.
deleteMin(H) = ϕ : T = ϕ ∧ children(Tmin) = ϕ
               FibHeap(s − 1, T'min, T') : otherwise          (10.15)

where
(T'min, T') = extractMin(consolidate(children(Tmin) ∪ T))
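A Haskell sketch of equation (10.15), assuming the consolidate and extractMin functions over forests described in the text, could be:

deleteMin :: Ord a => FibHeap a -> FibHeap a
deleteMin (FH _ (Node _ _ []) []) = E
deleteMin (FH sz mt ts) = FH (sz - 1) mt' ts'
  where (mt', ts') = extractMin (consolidate (children mt ++ ts))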
The main part of the imperative realization is similar. We cut all children of Tmin and
append them to the root list, then perform consolidation to merge all trees with the same
rank until all trees are unique in terms of rank.
1: function Delete-Min(H)
2: x ← Tmin (H)
3: if x ≠ NIL then
4: for each y ∈ Children(x) do
5: append y to root list of H
6: Parent(y) ← N IL
7: remove x from root list of H
8: n(H) ← n(H) - 1
9: Consolidate(H)
10: return x
Algorithm Consolidate utilizes an auxiliary array A to do the merge job. Entry
A[i] is defined to store the tree with rank (degree) i. During the traversal of the root list, if
we meet another tree of rank i, we link them together to get a new tree of rank i + 1,
then we clean A[i], and check whether A[i + 1] is empty, performing further linking if necessary.
After we finish traversing all roots, the array A stores all the result trees, and we can
re-construct the heap from it.
1: function Consolidate(H)
2: D ← Max-Degree(n(H))
3: for i ← 0 to D do
4: A[i] ← N IL
5: for each x ∈ root list of H do
6: remove x from root list of H
7: d ← Degree(x)
8: while A[d] ≠ NIL do
9: y ← A[d]
10: x ← Link(x, y)
11: A[d] ← N IL
12: d←d+1
13: A[d] ← x
14: Tmin (H) ← N IL ▷ root list is NIL at the time
15: for i ← 0 to D do
16: if A[i] ≠ NIL then
17: append A[i] to root list of H.
18: if Tmin(H) = NIL ∨ Key(A[i]) < Key(Tmin(H)) then
19: Tmin (H) ← A[i]
The only unclear sub algorithm is Max-Degree, which can determine the upper
bound of the degree of any node in a Fibonacci Heap. We’ll delay the realization of it to
the last sub section.
Feeding the Fibonacci heap shown in Figure 10.9 to the above algorithm, Figures 10.11,
10.12, and 10.13 show the resulting trees stored in the auxiliary array A at every step.
Translating the above algorithm to ANSI C yields the program below.
void consolidate(struct FibHeap∗ h){
if(!h→roots)
return;
int D = max_degree(h→n)+1;
struct node ∗x, ∗y;
struct node∗∗ a = (struct node∗∗)malloc(sizeof(struct node∗)∗(D+1));
int i, d;
for(i=0; i ≤ D; ++i)
a[i] = NULL;
while(h→roots){
x = h→roots;
h→roots = remove_node(h→roots, x);
d= x→degree;
while(a[d]){
y = a[d]; /∗ Another node has the same degree as x ∗/
x = link(x, y);
a[d++] = NULL;
}
a[d] = x;
}
h→minTr = h→roots = NULL;
for(i=0; i ≤ D; ++i)
if(a[i]){
h→roots = append(h→roots, a[i]);
if(h→minTr == NULL || a[i]→key < h→minTr→key)
h→minTr = a[i];
}
free(a);
}
Exercise 10.7
Implement the remove function for circular doubly linked list in your favorite imper-
ative programming language.
(Figure: the trees stored in the auxiliary array A[0], ..., A[4] after consolidation steps 5 and 6.)
E = M · g · h
Suppose there is a complex process which moves an object of mass M up and
down, and finally the object stops at height h'. If there is friction resistance Wf,
we say the process does the following amount of work:
W = M · g · (h' − h) + Wf
Where t(H) is the number of trees in the Fibonacci heap forest. We have t(H) = 1 +
length(T) for any non-empty heap.
For an n-node Fibonacci heap, suppose there is an upper bound D(n) on the ranks of all
trees. After consolidation, the number of trees in the heap forest is at most D(n) + 1.
Before consolidation, we actually did another important thing which also contributes
to the running time: we removed the root of the minimum tree and concatenated all the
children left to the forest. So the consolidate operation processes at most D(n) + t(H) − 1 trees.
Summarizing all the above factors, we deduce the amortized cost as below.
If only the insertion, merge, and pop functions are applied to the Fibonacci heap, we can
ensure that all trees are binomial trees, and it is easy to estimate the upper limit D(n) as
O(lg n) (consider the extreme case that all nodes are in only one binomial tree).
However, we'll show in the next sub-section that there is an operation which can violate
the binomial tree assumption.
Exercise 10.8
Why is the tree consolidation time proportional to the number of trees it processes?
After decreasing the key of a node, we need to compare the decreased key with that of the
parent node; if the heap property is violated, we can cut this node and append it to the root
list. (Recall the recursive swapping solution for the binary heap, which leads to O(lg n) time.)
Figure 10.15: x < y: cut the tree rooted at x from its parent, and add x to the root list.
Figure 10.15 illustrates this situation. After decreasing the key of node x, it is less than
y; we cut x off from its parent y, and ‘paste’ the whole tree rooted at x to the root list.
Although we recover the property that a parent is less than all its children, the tree
is no longer a binomial tree after it loses some sub-tree. If a tree loses too many
of its children because of cutting, we can't ensure the performance of the mergeable heap
operations. The Fibonacci heap adds another constraint to avoid this problem:
If a node loses its second child, it is immediately cut from its parent and added to the
root list.
The final Decrease-Key algorithm is given as below.
1: function Decrease-Key(H, x, k)
2: Key(x) ← k
3: p ← Parent(x)
4: if p ≠ NIL ∧ k < Key(p) then
5: Cut(H, x)
6: Cascading-Cut(H, p)
7: if k < Key(Tmin (H)) then
8: Tmin (H) ← x
Where function Cascading-Cut uses the mark to determine whether the node is losing its
second child. The node is marked after it loses the first child, and the mark is cleared
in the Cut function.
1: function Cut(H, x)
2: p ← Parent(x)
3: remove x from p
4: Degree(p) ← Degree(p) - 1
5: add x to root list of H
6: Parent(x) ← N IL
7: Mark(x) ← F ALSE
During the cascading cut process, if x is marked, it means it has already lost one
child; we recursively perform cut and cascading cut on its parent until we reach the root.
1: function Cascading-Cut(H, x)
2: p ← Parent(x)
3: if p ≠ NIL then
4: if Mark(x) = F ALSE then
5: Mark(x) ← T RU E
6: else
7: Cut(H, x)
8: Cascading-Cut(H, p)
The corresponding ANSI C decrease-key program is given as follows.
void decrease_key(struct FibHeap∗ h, struct node∗ x, Key k){
struct node∗ p = x→parent;
x→key = k;
if(p && k < p→key){
cut(h, x);
cascading_cut(h, p);
}
if(k < h→minTr→key)
h→minTr = x;
}
Exercise 10.9
Prove that Decrease-Key algorithm is amortized O(1) time.
Lemma 10.3.1. For any node x in a Fibonacci Heap, denote k = degree(x), and |x| =
size(x), then
|x| ≥ Fk+2 (10.18)
Proof. Consider all k children of node x; we denote them as y1, y2, ..., yk in the order of
the time when they were linked to x, where y1 is the oldest and yk is the youngest.
Obviously, |yi| ≥ 1. When we link yi to x, the children y1, y2, ..., yi−1 were already
there, and the algorithm Link only links nodes with the same degree, which indicates that at
that time we had
degree(yi) = degree(x) = i − 1
After that, node yi can lose at most 1 child (due to the decrease-key operation);
otherwise it would be immediately cut off and appended to the root list after the second child
loss. Thus we conclude
degree(yi) ≥ i − 2
Denote by sk the minimum possible size of any tree whose root has degree k. Then

|x| ≥ sk
    = 2 + ∑_{i=2}^{k} s_degree(yi)
    ≥ 2 + ∑_{i=2}^{k} s_{i−2}

We next show that sk ≥ Fk+2. This can be proved by induction. For the trivial cases, we
have s0 = 1 ≥ F2 = 1, and s1 = 2 ≥ F3 = 2. For the induction case k ≥ 2, we have

|x| ≥ sk
    ≥ 2 + ∑_{i=2}^{k} s_{i−2}
    ≥ 2 + ∑_{i=2}^{k} Fi
    = 1 + ∑_{i=0}^{k} Fi

It remains to show the Fibonacci identity

Fk+2 = 1 + ∑_{i=0}^{k} Fi          (10.19)

which can also be proved by induction:

• Trivial case: F2 = 1 = 1 + F0.
• Induction case:

Fk+2 = Fk+1 + Fk
     = 1 + ∑_{i=0}^{k−1} Fi + Fk
     = 1 + ∑_{i=0}^{k} Fi
n ≥ |x| ≥ Fk+2          (10.20)
Recall the result from the AVL tree chapter that Fk+2 ≥ ϕ^k, where ϕ = (1 + √5)/2 is the
golden ratio; it follows that k ≤ log_ϕ n = O(lg n). We have also proved that the pop
operation is an amortized O(lg n) algorithm.
Based on this result, we can define the function MaxDegree accordingly.
The imperative Max-Degree algorithm can be realized by using the Fibonacci sequence.
1: function Max-Degree(n)
2:   F0 ← 0
3:   F1 ← 1
4:   k ← 1
5:   repeat
6:     k ← k + 1
7:     Fk ← Fk−1 + Fk−2
8:   until Fk ≥ n
9:   return k − 2
Translating the algorithm to ANSI C gives the following program.
int max_degree(int n){
int k, F;
int F2 = 0;
int F1 = 1;
for(F=F1+F2, k=2; F<n; ++k){
F2 = F1;
F1 = F;
F = F1 + F2;
}
return k-2;
}
Insertion, finding the minimum element (top), and merging are all bound to O(1) time, while
deleting the minimum element (pop) is conjectured to take amortized O(lg n) time [58] [3].
Note that this has remained a conjecture for 15 years by the time this chapter was written;
nobody has proven it, although there is much experimental data supporting the O(lg n)
amortized result.
Besides that, the pairing heap is simple: there exist both elegant imperative and functional
implementations.
10.4.1 Definition
Both binomial heaps and Fibonacci heaps are realized with a forest, while a pairing heap
is essentially a K-ary tree. The minimum element is stored at the root; all other elements
are stored in the sub-trees.
The following Haskell program defines pairing heap.
data PHeap a = E | Node a [PHeap a]
This is a recursive definition: a pairing heap is either empty or a K-ary tree,
which consists of a root node and a list of sub-trees.
A pairing heap can also be defined in procedural languages, for example ANSI C as
below. For illustration purposes, all heaps we mention later are minimum heaps, and we
assume the type of the key is integer4. We use the same linked-list based left-child, right-sibling
approach (a.k.a. the binary tree representation [4]).
typedef int Key;
struct node{
Key key;
struct node ∗next, ∗children, ∗parent;
};
Note that the parent field only makes sense for the decrease-key operation, which
will be explained later on; we can omit it for the time being.
Merging two pairing heaps is straightforward:
• Trivial case: one heap is empty, and we simply return the other heap as the result;
• Otherwise, we compare the root elements of the two heaps, and make the heap with
the bigger root element a new child of the other.
Let H1 and H2 denote the two heaps, and x and y be the root elements of H1 and H2
respectively. Function Children() returns the children of a K-ary tree. Function Node()
constructs a K-ary tree from a root element and a list of children.
4 We can parametrize the key type with a C++ template, but this is beyond our scope.
merge(H1, H2) = H1 : H2 = ϕ
                H2 : H1 = ϕ
                Node(x, {H2} ∪ Children(H1)) : x < y          (10.22)
                Node(y, {H1} ∪ Children(H2)) : otherwise

where
x = Root(H1)
y = Root(H2)
It's obvious that the merging algorithm is bound to O(1) time5. The merge equation
can be translated to the following Haskell program.
merge h E = h
merge E h = h
merge h1@(Node x hs1) h2@(Node y hs2) =
if x < y then Node x (h2:hs1) else Node y (h1:hs2)
Merge can also be realized imperatively. With the left-child, right-sibling approach, we
can just link the heap (which is in fact a K-ary tree) with the larger key as the first new child
of the other. This is a constant time operation, as described below.
1: function Merge(H1 , H2 )
2: if H1 = NIL then
3: return H2
4: if H2 = NIL then
5: return H1
6: if Key(H2 ) < Key(H1 ) then
7: Exchange(H1 ↔ H2 )
8: Insert H2 in front of Children(H1 )
9: Parent(H2 ) ← H1
10: return H1
Note that we also update the parent field accordingly. The ANSI C example program
is given as the following.
struct node∗ merge(struct node∗ h1, struct node∗ h2) {
if (h1 == NULL)
return h2;
if (h2 == NULL)
return h1;
if (h2→key < h1→key)
swap(&h1, &h2);
h2→next = h1→children;
h1→children = h2;
h2→parent = h1;
h1→next = NULL; /∗Break previous link if any∗/
return h1;
}
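With merge at hand, insertion and reading the top element are immediate. The following Haskell sketch is not part of the original listings here; it merely illustrates the idea.

insert :: Ord a => PHeap a -> a -> PHeap a
insert h x = merge (Node x []) h

top :: PHeap a -> a
top (Node x _) = x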
Exercise 10.10
Implement the program for removing a node from the children of its parent in your
favorite imperative programming language. Consider how we can ensure the overall
performance of decreasing a key is O(1) time. Is the left-child, right-sibling approach enough?
(c) Merge every two trees in pairs; note that there is an odd number of trees, so the last one
needn't be merged.
Figure 10.16: Remove the root element, and merge the children in pairs.
(Figure: (a) Merge the tree with root 9 and the tree with root 6; (b) merge the tree with
root 7 to the result.)
The popping operation can also be explained in the following procedural algorithm.
1: function Pop(H)
2: L ← N IL
3: for every 2 trees Tx , Ty ∈ Children(H) from left to right do
4: Extract x, and y from Children(H)
5: T ← Merge(Tx , Ty )
6: Insert T at the beginning of L
7: H ← Children(H) ▷ H is either N IL or one tree.
8: for ∀T ∈ L from left to right do
9: H ← Merge(H, T )
10: return H
Note that L is initialized as an empty linked-list. The algorithm iterates over the children
of the K-ary tree two trees at a time, from left to right, and performs merging; the result
is inserted at the beginning of L. Because we insert at the front end, when we traverse L
later on we actually process from right to left. There may be an odd number of sub-trees in
H; in that case, one tree is left over after the pair-merging. We handle it by starting the
right-to-left merging from this leftover tree.
Below is the ANSI C program to this algorithm.
struct node∗ pop(struct node∗ h) {
struct node ∗x, ∗y, ∗lst = NULL;
while ((x = h→children) ̸= NULL) {
if ((h→children = y = x→next) ̸= NULL)
h→children = h→children→next;
lst = push_front(lst, merge(x, y));
}
x = NULL;
while((y = lst) ̸= NULL) {
lst = lst→next;
x = merge(x, y);
}
free(h);
return x;
}
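The same two-pass strategy can be sketched purely functionally in Haskell (a sketch assuming the Haskell merge given earlier; the original functional pop listing is not in this extract): merge the children pairwise from left to right, then merge the intermediate results back.

deleteMin :: Ord a => PHeap a -> PHeap a
deleteMin (Node _ hs) = mergePairs hs
  where
    mergePairs (h1:h2:hs') = merge (merge h1 h2) (mergePairs hs')
    mergePairs [h] = h
    mergePairs [] = E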
The pairing heap pop operation is conjectured to be amortized O(lg n) time [58].
Exercise 10.11
Write a program to insert a tree at the beginning of a linked-list in your favorite
imperative programming language.
Delete a node
We didn't mention delete in the binomial heap or the Fibonacci heap. Deletion can be realized
by first decreasing the key to minus infinity (−∞), then performing pop. In this section, we
present another solution for deleting a node.
The algorithm is to define the function delete(H, x), where x is a node in a pairing
heap H 6.
6 Here the semantic of x is a reference to a node.
If x is root, we can just perform a pop operation. Otherwise, we can cut x from H,
perform a pop on x, and then merge the pop result back to H. This can be described as
the following.
delete(H, x) = pop(H) : x is the root of H
               merge(cut(H, x), pop(x)) : otherwise          (10.26)
Exercise 10.12
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
Chapter 11
Queue, not so simple as it was thought
11.1 Introduction
It seems that queues are relatively simple. A queue provides FIFO (first-in, first-out) data
manipulation support. There are many options to realize a queue, including singly linked-list,
doubly linked-list, circular buffer, etc. However, we'll show that it's not so easy to realize a
queue in a purely functional setting if it must satisfy the abstract queue properties.
In this chapter, we'll present several different approaches to implement a queue. A queue
is a FIFO data structure satisfying the following performance constraints.
• Element can be added to the tail of the queue in O(1) constant time;
• Element can be removed from the head of the queue in O(1) constant time.
These two properties must be satisfied, and it's common to add some extra goals,
such as dynamic memory allocation, etc.
Of course such an abstract queue interface can be implemented with a doubly-linked list
trivially, but this is an overkill solution. We can even implement an imperative queue with a
singly linked-list or a plain array. However, our main question here is how to realize
a purely functional queue as well.
We'll first review the typical queue solutions realized by singly linked-list
and circular buffer in the first section; then we give a simple and straightforward functional
solution in the second section. While the performance is ensured in terms of amortized
constant time, we need to find a real-time (worst-case) solution for some special
cases. Such solutions will be described in the third and the fourth sections. Finally, we'll
show a very simple real-time queue which depends on lazy evaluation.
Most of the functional content is based on Chris Okasaki's great work in [3]. There
are more than 16 different types of purely functional queue given in that material.
In order to operate on the tail of a plain singly linked-list, we must traverse the whole list
before adding or removing. Traversing is bound to O(n) time, where n is the length of
the list. This doesn't match the abstract queue properties.
The solution is to use an extra record to store the tail of the linked-list. A sentinel node
is often used to simplify the boundary handling. The following ANSI C1 code defines a
queue realized by a singly linked-list.
typedef int Key;
struct Node{
Key key;
struct Node∗ next;
};
struct Queue{
struct Node ∗head, ∗tail;
};
Figure 11.1 illustrates an empty list. Both head and tail point to the sentinel NIL
node.
Figure 11.1: The empty queue, both head and tail point to sentinel node.
1 It’s possible to parameterize the type of the key with C++ template. ANSI C is used here for
illustration purpose.
11.2. QUEUE BY LINKED-LIST AND CIRCULAR BUFFER 245
To ensure constant time Enqueue and Dequeue, we append the new element at the tail
and remove elements from the head.2
function Enqueue(Q, x)
p ← Create-New-Node
Key(p) ← x
Next(p) ← N IL
Next(Tail(Q)) ← p
Tail(Q) ← p
Note that, as we use the sentinel node, there is always at least one node (the sentinel) in
the queue. That's why we needn't check the validity of the tail before we append the
newly created node p to it.
function Dequeue(Q)
  x ← Next(Head(Q))             ▷ the node next to the sentinel
  Next(Head(Q)) ← Next(x)
  if x = Tail(Q) then            ▷ Q gets empty
    Tail(Q) ← Head(Q)
  return Key(x)
As we always keep the sentinel node in front of all the other nodes, the element being
dequeued is actually the node next to the sentinel.
Figure 11.2 illustrates the Enqueue and Dequeue processes with the sentinel node.
Translating the pseudo code to an ANSI C program yields the code below.
struct Queue∗ enqueue(struct Queue∗ q, Key x) {
struct Node∗ p = (struct Node∗)malloc(sizeof(struct Node));
p→key = x;
p→next = NULL;
q→tail→next = p;
q→tail = p;
return q;
}
This solution is simple and robust. It's easy to extend this solution even to a
concurrent environment (e.g. multi-cores): we can assign one lock to the head and
another lock to the tail, and the sentinel helps us avoid deadlock in the empty
case [59] [60].
Exercise 11.1
2 It’s possible to add new element to the tail, while remove element from head, but the operations are
(Figure 11.2: Enqueue and Dequeue with the sentinel node.)
When initializing the queue, we are explicitly asked to provide the maximum size as a
parameter.
struct QueueBuf∗ createQ(int max){
struct QueueBuf∗ q = (struct QueueBuf∗)malloc(sizeof(struct QueueBuf));
q→buf = (Key∗)malloc(sizeof(Key)∗max);
q→size = max;
q→head = q→cnt = 0;
return q;
}
With the counter variable, we can compare it with zero and the capacity to test if the
queue is empty or full.
function Empty?(Q)
return Count(Q) = 0
To realize Enqueue and Dequeue, an easy way is to calculate the index modulo the buffer
size, as follows.
function Enqueue(Q, x)
  if ¬ Full?(Q) then
    tail ← (Head(Q) + Count(Q)) mod Size(Q)
    Buffer(Q)[tail] ← x
    Count(Q) ← Count(Q) + 1
function Head(Q)
if ¬ Empty?(Q) then
return Buffer(Q)[Head(Q)]
function Dequeue(Q)
if ¬ Empty?(Q) then
Head(Q) ← (Head(Q) + 1) mod Size(Q)
Count(Q) ← Count(Q) - 1
However, the modulo operation can be expensive in some settings, so one may replace
it with an adjustment, as in the ANSI C program below.
void enQ(struct QueueBuf∗ q, Key x){
if(!fullQ(q)){
q→buf[offset(q→head + q→cnt, q→size)] = x;
q→cnt++;
}
}
Exercise 11.2
The circular buffer is allocated with a maximum size parameter. Can we test whether the
queue is empty or full with only the head and tail pointers? Note that the head can be either
before or after the tail.
Figure 11.5: DeQueue and EnQueue can't both be performed in constant O(1) time with a
list (EnQueue at the head is O(1), DeQueue at the tail is O(n)).
Figure 11.6: A queue with front and rear lists, shaped like a horseshoe magnet.
With this setup, we push new elements to the head of the rear list, which is ensured to
be O(1) constant time; on the other hand, we pop elements from the head of the front list,
which is also O(1) constant time, so the abstract queue properties can be satisfied.
The definition of such a paired-list queue can be expressed in the following Haskell code.
type Queue a = ([a], [a])
Suppose functions front(Q) and rear(Q) return the front and rear lists in this setup,
and Queue(F, R) creates a paired-list queue from two lists F and R. The EnQueue (push)
and DeQueue (pop) operations can be easily realized based on this setup.
push(Q, x) = Queue(front(Q), {x} ∪ rear(Q)) (11.1)
pop(Q) = Queue(tail(f ront(Q)), rear(Q)) (11.2)
where if a list X = {x1 , x2 , ..., xn }, function tail(X) = {x2 , x3 , ..., xn } returns the rest
of the list without the first element.
However, we must next solve the problem that after several pop operations the front
list becomes empty while there are still elements in the rear list. One method is to rebuild
the queue by reversing the rear list and using it to replace the front list.
Hence a balance operation will be executed after popping. Let's denote the front and
rear lists of a queue Q as F = front(Q) and R = rear(Q).
balance(F, R) = Queue(reverse(R), ϕ) : F = ϕ
                Q : otherwise          (11.3)
Thus if the front list isn't empty, we do nothing, while when the front list becomes empty,
we use the reversed rear list as the new front list, and the new rear list is empty.
The new enqueue and dequeue algorithms are updated as below.
push(Q, x) = balance(F, {x} ∪ R) (11.4)
pop(Q) = balance(tail(F ), R) (11.5)
Summing up the above algorithms and translating them to Haskell yields the following
program.
balance :: Queue a → Queue a
balance ([], r) = (reverse r, [])
balance q = q
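For completeness, push and pop over this paired-list queue can be sketched as below, following equations (11.4) and (11.5) and the balance function above (a sketch; these listings are not in this extract).

push :: Queue a -> a -> Queue a
push (f, r) x = balance (f, x:r)

pop :: Queue a -> Queue a
pop (_:f, r) = balance (f, r)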
Although we only touch the heads of the front and rear lists, the overall performance
can't always be kept at O(1); actually, the performance of this algorithm is amortized
O(1). This is because the reverse operation takes time proportional to the length of the rear
list; it is bound to O(n) time, where n = |R|. We leave the proof of the amortized performance
as an exercise to the reader.
Note that a linked-list performs in constant time on the head, but in linear time on the tail;
while an array performs in constant time on the tail (suppose there is enough memory space;
we omit memory reallocation for simplification), but in linear time on the head. This
is because we need to shift elements when preparing or eliminating an empty cell in an array
(see the chapter ‘the evolution of insertion sort’ for details).

                    operation on head    operation on tail
singly linked-list       O(1)                 O(n)
array                    O(n)                 O(1)

This table shows an interesting characteristic which we can exploit to provide
a solution mimicking the paired-list queue: we concatenate two arrays, head-to-head, to
make a horseshoe shaped queue as in figure 11.7.
Figure 11.7: A queue with front and rear arrays, shaped like a horseshoe magnet.
We can define such a paired-array queue with the following Python code3.
class Queue:
    def __init__(self):
        self.front = []
        self.rear = []

def is_empty(q):
    return q.front == [] and q.rear == []
The corresponding Push() and Pop() algorithms only manipulate the tails of the arrays.
function Push(Q, x)
Append(Rear(Q), x)
Here we assume that the Append() algorithm appends element x to the end of the
array and handles the necessary memory allocation, etc. Actually, there are multiple
memory handling approaches; for example, besides dynamic re-allocation, we can
initialize the array with enough space and just report an error if it's full.
function Pop(Q)
if Front(Q) = ϕ then
Front(Q) ← Reverse(Rear(Q))
Rear(Q) ← ϕ
n ← Length(Front(Q))
x ← Front(Q)[n]
Length(Front(Q)) ← n − 1
  return x
3 Legacy Basic code is not presented here. We actually use a list rather than an array in Python to
illustrate the idea. ANSI C and ISO C++ programs are provided along with this chapter; they are
written in a purely array manner.
For simplification and pure illustration purposes, the array isn't shrunk explicitly after
elements are removed, so testing whether the front array is empty (ϕ) can be realized as
checking whether the length of the array is zero. We omit all these details here.
The enqueue and dequeue algorithms can be translated to Python programs
straightforwardly.
def push(q, x):
    q.rear.append(x)

def pop(q):
    if q.front == []:
        q.rear.reverse()
        (q.front, q.rear) = (q.rear, [])
    return q.front.pop()
Similar to the paired-list queue, the performance is amortized O(1) because the reverse
procedure takes linear time.
Exercise 11.3
|R| ≤ |F | (11.6)
Where R = Rear(Q), F = Front(Q), and |L| is the length of list L. This constraint
ensures that the rear list is never longer than the front list, so the
reverse procedure will be executed once the rear list grows longer than the front list.
Here we need to frequently access the length information of a list. However, calculating the
length takes linear time for a singly linked-list. We can record the length in a variable and
update it as elements are added and removed. This approach enables us to get the length
information in constant time.
The example below shows the modified paired-list queue definition which is augmented
with length fields.
data BalanceQueue a = BQ [a] Int [a] Int
As we keep the invariant as specified in (11.6), we can easily tell if a queue is empty
by testing the length of the front list.
F = ϕ ⇔ |F | = 0 (11.7)
In the rest of this section, we suppose the length of a list L can be retrieved as
|L| in constant time.
Push and pop are almost the same as before, except that we check the balance invariant
by passing the length information and perform reversing accordingly.
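The following Haskell sketch shows one way push and pop can thread the length information while keeping invariant (11.6); the original equations are not reproduced in this extract, so details may differ.

balance :: BalanceQueue a -> BalanceQueue a
balance q@(BQ f lenf r lenr)
  | lenr <= lenf = q
  | otherwise    = BQ (f ++ reverse r) (lenf + lenr) [] 0

push :: BalanceQueue a -> a -> BalanceQueue a
push (BQ f lenf r lenr) x = balance (BQ f lenf (x:r) (lenr + 1))

pop :: BalanceQueue a -> BalanceQueue a
pop (BQ (_:f) lenf r lenr) = balance (BQ f (lenf - 1) r lenr)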
Exercise 11.4
Write the symmetric balance improvement solution for paired-array queue in your
favorite imperative programming language.
|R| = |F | + 1 (11.11)
Both F and the result of reverse(R) are singly linked-lists. It takes O(|F|) time to
concatenate them together, and it takes extra O(|R|) time to reverse the rear list, so the
total computation is bound to O(N), where N = |F| + |R|, which is proportional to the
total number of elements in the queue.
In order to realize a real-time queue, we can't compute F ∪ reverse(R) monolithically.
Our strategy is to distribute this expensive computation over every pop and push operation.
Thus although each pop and push gets a bit slower, we avoid the extremely slow worst-case
pop or push.
Incremental reverse
Let's examine how the functional reverse algorithm is typically implemented.
reverse(X) = ϕ : X = ϕ
             reverse(X') ∪ {x1} : otherwise          (11.12)
The tail-recursive version uses an accumulator A:

reverse'(X, A) = A : X = ϕ
                 reverse'(X', {x1} ∪ A) : otherwise          (11.14)
In every non-trivial case, we take the first element from X in O(1) time, then put it
in front of the accumulator A, which is again O(1) constant time. We repeat this n times,
so it is a linear time (O(n)) algorithm.
The latter version of reverse is obviously a tail-recursive algorithm; see [5] and [62]
for details. This characteristic makes it easy to change from a monolithic algorithm to an
incremental one.
The solution is state transferring. We can use a state machine containing two types of
state: a reversing state Sr to indicate that the reverse is still on-going (not finished), and a
finish state Sf to indicate that the reverse has been done (finished). In the Haskell programming
language, it can be defined as a type.
And we can schedule (slow-down) the above reverse′ (X, A) function with these two
types of state.
step(S, X, A) = (Sf, A) : S = Sr ∧ X = ϕ
                (Sr, X', {x1} ∪ A) : S = Sr ∧ X ≠ ϕ          (11.15)
At each step, we examine the state type first. If the current state is Sr (on-going) and the
rest of the elements to be reversed in X is empty, we can turn the algorithm to the finish state
Sf; otherwise, we take the first element from X and put it in front of A just as above,
but we do NOT recurse; instead, we just finish this step. We can store the
current state as well as the resulting X and A; the reverse can be continued at any time
in the future by calling the 'next' step function with the stored state, X, and A passed in.
Here is an example of this step-by-step reverse algorithm:

step(Sr, {1, 2, 3}, ϕ) = (Sr, {2, 3}, {1})
step(Sr, {2, 3}, {1}) = (Sr, {3}, {2, 1})
step(Sr, {3}, {2, 1}) = (Sr, ϕ, {3, 2, 1})
step(Sr, ϕ, {3, 2, 1}) = (Sf, {3, 2, 1})
Now we can distribute the reverse into steps in every pop and push operation. However,
the problem is only half solved: we want to break down F ∪ reverse(R), and we have
broken reverse(R) into steps; we next need to schedule (slow down) the list concatenation
part F ∪ ..., which is bound to O(|F|), in an incremental manner so that we can distribute
it over pop and push operations as well.
Incremental concatenate
It's a bit more challenging to implement incremental list concatenation than list reversing.
However, it's possible to re-use the result we gained from incremental reverse by a small
trick: in order to realize X ∪ Y, we can first reverse X to get ←X (we write ←X for
reverse(X)), then take elements one by one from ←X and put them in front of Y, just as
we have done in reverse'.

X ∪ Y ≡ reverse(reverse(X)) ∪ Y
      ≡ reverse'(reverse(X), ϕ) ∪ Y
      ≡ reverse'(reverse(X), Y)          (11.16)
      ≡ reverse'(←X, Y)
This fact indicates that we can use an extra state to instruct the step() function
to continuously concatenate ←F after R is reversed.
The strategy is to do the total work in two phases:
1. Reverse both F and R in parallel, to get ←F = reverse(F) and ←R = reverse(R)
incrementally;
2. Incrementally take elements from ←F and put them in front of ←R.
Because we reverse F and R simultaneously, the reversing state takes two pairs of lists
and accumulators.
The state transferring is defined according to the two-phase strategy described
previously. Denote F = {f1, f2, ...}, F' = tail(F) = {f2, f3, ...}, R = {r1, r2, ...}, and
R' = tail(R) = {r2, r3, ...}. A state S contains its type, whose value is among
Sr, Sc, and Sf. Note that S also contains the necessary parameters such as F, ←F, X, A, etc.
as intermediate results; these parameters vary according to the different states.

next(S) = (Sr, F', {f1} ∪ ←F, R', {r1} ∪ ←R) : S = Sr ∧ F ≠ ϕ ∧ R ≠ ϕ
          (Sc, ←F, {r1} ∪ ←R) : S = Sr ∧ F = ϕ ∧ R = {r1}
          (Sf, A) : S = Sc ∧ X = ϕ
          (Sc, X', {x1} ∪ A) : S = Sc ∧ X ≠ ϕ
          (11.17)
The corresponding Haskell program is listed below.
next (Reverse (x:f) f' (y:r) r') = Reverse f (x:f') r (y:r')
next (Reverse [] f' [y] r') = Concat f' (y:r')
next (Concat [] acc) = Done acc
next (Concat (x:f') acc) = Concat f' (x:acc)
All that is left is to distribute these incremental steps over every pop and push operation
to implement a real-time O(1) purely functional queue.
Sum up
Before we dive into the final real-time queue implementation, let's analyze how many
incremental steps are taken to achieve the result of F ∪ reverse(R). According to the
balance invariant we used previously, |R| = |F| + 1; let's denote m = |F|.
Once the queue gets unbalanced due to some push or pop operation, we start this
incremental F ∪ reverse(R). It needs m + 1 steps to reverse R, and at the same time we
finish reversing the list F within these steps. After that, we need an extra m + 1 steps to
execute the concatenation, so there are 2m + 2 steps in total.
It seems that distributing one step inside each pop or push operation is the natural
solution. However, there is a critical question that must be answered: is it possible that
before we finish these 2m + 2 steps, the queue gets unbalanced again due to a series of
pushes and pops?
There are two facts about this question; one is good news and the other is bad news.
Let's first show the good news: luckily, continuous pushing can't make the queue
unbalanced again before we finish these 2m + 2 steps to achieve F ∪ reverse(R). This is
because once we start re-balancing, we get a new front list F' = F ∪ reverse(R) after
2m + 2 steps, while the next unbalance is triggered when
|R′ | = |F ′ | + 1
= |F | + |R| + 1 (11.18)
= 2m + 2
That is to say, even if we continuously push as many elements as possible after the
last unbalanced point, when the queue gets unbalanced again, the 2m + 2 steps have exactly
finished at that time point, which means the new front list F' has been calculated. We
can safely go on to compute F' ∪ reverse(R'), thanks to the balance invariant designed
in the previous section.
But the bad news is that a pop operation can happen at any time before these 2m + 2
steps finish. The situation is that once we want to extract an element from the front list, the
new front list F' = F ∪ reverse(R) isn't ready yet; we don't have a valid front
list at hand.
One solution to this problem is to keep a copy of the original front list F during the
time we are calculating reverse(F), as described in phase 1 of our incremental
computing strategy, so that we are still safe even if the user continuously performs the first m
pop operations. The queue looks like table 11.1 at some time after we start the
incremental computation and before phase 1 (reversing F and R simultaneously) ends4.
After these m pop operations, the copy of F is exhausted, and we have just started the
incremental concatenation phase at that time. What if the user goes on popping?
The fact is that since F is exhausted (it becomes ϕ), we needn't do the concatenation at all,
since F ∪ ←R = ϕ ∪ ←R = ←R.
This indicates that, when doing the concatenation, we only need to concatenate those elements
that haven't been popped, which are still left in F. As the user pops elements one by one
from the head of the front list F, one method is to use a counter to record how many
elements there still are in F. The counter is initialized as 0 when we start computing
F ∪ reverse(R); it's increased by one when we reverse one element in F, which means we
need to concatenate this element in the future; and it's decreased by one every time a
pop is performed, which means we can concatenate one element less; of course we also
decrease this counter in every step of the concatenation. If and only if this counter
becomes zero, we needn't do any more concatenation.
We can give the realization of purely functional real-time queue according to the above
analysis.
We first add an idle state S0 to simplify some of the state transferring. The Haskell
program below is an example of this modified state definition.
data State a = Empty
| Reverse Int [a] [a] [a] [a] −− n, f’, acc_f’ r, acc_r
| Append Int [a] [a] −− n, rev_f’, acc
| Done [a] −− result: f ++ reverse r
And the data structure is defined with three parts, the front list (augmented with
length); the on-going state of computing F ∪ reverse(R); and the rear list (augmented
with length).
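The original data definition is not reproduced in this extract; based on the push and pop code below, which use the constructor RTQ, it can be sketched as follows (the type name is illustrative).

data RealtimeQueue a = RTQ [a] Int (State a) [a] Int -- front, its length, state, rear, its length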
4 One may wonder whether copying a list takes linear time in the length of the list; if so, the whole solution
would make no sense. Actually, this linear time copying won't happen at all. This is because of the purely
functional nature: the front list won't be mutated either by popping or by reversing. However, if we try to
realize a symmetric solution with a paired array and mutate the array in place, this issue must be addressed,
and we can perform a ‘lazy’ copying, so that the real copying work won't execute immediately; instead, it
copies one element every step of the incremental reversing. The detailed implementation is left as an
exercise.
The empty queue is composed of an empty front list and an empty rear list, together with the
idle state S0, as Queue(ϕ, 0, S0, ϕ, 0). We can test if a queue is empty by checking whether
|F| = 0, according to the balance invariant defined before. Push and pop are changed accordingly.
The major difference is the abort() function. Based on our above analysis, when there is
popping, we need to decrease the counter so that we can concatenate one element less. We
define this as aborting. The details will be given after the balance() function.
The corresponding Haskell code for push and pop is listed like this.
push (RTQ f lenf s r lenr) x = balance f lenf s (x:r) (lenr + 1)
pop (RTQ (_:f) lenf s r lenr) = balance f (lenf - 1) (abort s) r lenr
The balance() function first checks the balance invariant; if it's violated, we start to
re-balance by beginning to compute F ∪ reverse(R) incrementally; otherwise we just execute
one step of the unfinished incremental computation.
balance(F, |F|, S, R, |R|) = step(F, |F|, S, R, |R|) : |R| ≤ |F|
                             step(F, |F| + |R|, (Sr, 0, F, ϕ, R, ϕ), ϕ, 0) : otherwise
                             (11.21)
The corresponding Haskell code is given below.
balance f lenf s r lenr
| lenr ≤ lenf = step f lenf s r lenr
| otherwise = step f (lenf + lenr) (Reverse 0 f [] r []) [] 0
The step() function transfers the state machine one state ahead; it turns the state to idle
(S0) when the incremental computation finishes.
step(F, |F|, S, R, |R|) = Queue(F', |F|, S0, R, |R|) : S' = Sf          (11.22)
                          Queue(F, |F|, S', R, |R|) : otherwise
The abort() function is used to tell the state machine that we can concatenate one element
less, since an element has been popped.
abort(S) = (Sf, A') : S = Sc ∧ n = 0
           (Sc, n − 1, X, A) : S = Sc ∧ n ≠ 0
           (Sr, n − 1, F, ←F, R, ←R) : S = Sr          (11.24)
           S : otherwise
It seems that we are done; however, there is still one tricky issue hidden from us. If
we push an element x to an empty queue, the result queue will be
Queue(ϕ, 1, (Sc, 0, ϕ, {x}), ϕ, 0).
If we perform a pop immediately, we'll get an error! We find that the front list is
empty although the previous computation of F ∪ reverse(R) has finished. This is
because it takes one more extra step to transfer from the state (Sc, 0, ϕ, A) to (Sf, A).
It’s necessary to refine the S ′ in step() function a bit.
{
next(next(S)) : F = ϕ
S′ = (11.25)
next(S) : otherwise
Note that this algorithm differs from the one given by Chris Okasaki in [3]: Okasaki's
algorithm executes two steps per pop and push, while the one presented in this chapter
executes only one per pop and push, which leads to more evenly distributed performance.
Exercise 11.5
• Realize the real-time queue with the symmetric paired-array queue solution in your
favorite imperative programming language.
• In the footnote, we mentioned that when we start incremental reversing with the
in-place paired-array solution, copying the array can't be done monolithically, or it
would lead to a linear time operation. Implement the lazy copying so that we copy one
element per step along with the reversing.
Here X is initialized as the front list F, Y as the rear list R, and the accumulator A
is initialized as the empty list ϕ.
The trigger of the rotation is the same as before: |F| + 1 = |R|. Let's keep this
constraint as an invariant during the whole rotation process, so that |X| + 1 = |Y| always
holds.
It's easy to deduce the trivial case:

rotate(ϕ, {y1}, A) = {y1} ∪ A
Denote X = {x1, x2, ...}, Y = {y1, y2, ...}, and let X' = {x2, x3, ...}, Y' = {y2, y3, ...} be
the rest of the lists without the first elements of X and Y respectively. The recursive
case is as follows:

rotate(X, Y, A) = {x1} ∪ rotate(X', Y', {y1} ∪ A)

If we execute ∪ lazily instead of strictly, that is, execute one ∪ each time a pop or push
operation is performed, the computation of rotate can be distributed over push and pop
naturally.
Based on this idea, we modify the paired-list queue definition to change the front list
to a lazy list, and augment it with a computation stream [63]. When the queue triggers the
re-balance constraint by some pop/push, i.e. |F| + 1 = |R|, the algorithm creates a lazy
rotation computation, then uses this lazy rotation as the new front list F'; the new rear
list becomes ϕ, and a copy of F' is maintained as a stream.
After that, on every push and pop we consume the stream by forcing
a ∪ operation. This results in advancing one step along the stream, {x} ∪ F'', where
F'' = tail(F'). We can discard x and replace the stream F' with F''.
Once all of the stream is exhausted, we can start another rotation.
In order to illustrate this idea clearly, we turns to Scheme/Lisp programming language
to show example codes, because it gives us explicit control of laziness.
In Scheme/Lisp, we use the following tools to deal with lazy streams: cons-stream, which delays the evaluation of its second argument, and stream-car/stream-cdr, which access the head and force the rest.
(define-syntax cons-stream
  (syntax-rules () ((_ a b) (cons a (delay b)))))
(define (stream-car s) (car s))
(define (stream-cdr s) (force (cdr s)))

;; Auxiliary functions
(define (front-lst q) (car q))
A queue consists of three parts: a front list, a rear list, and a stream which represents the computation of F ∪ reverse(R). Creating an empty queue is trivial: make all three parts null.
(define empty (make-queue '() '() '()))
Note that the front list is actually also a lazy stream, so we need to use stream-related functions to manipulate it. For example, the following function tests if the queue is empty by checking the front lazy stream.
(define (empty? q) (stream-null? (front-lst q)))
The push function is almost the same as the one given in the previous section: we put the new element in front of the rear list, then examine the balance invariant and do the necessary balancing work.
push(Q, x) = balance(F, {x} ∪ R, Rs ) (11.30)
Where F represents the lazy stream of the front list, and Rs is the stream of the rotation computation. The corresponding Scheme/Lisp code is given below.
(define (push q x)
(balance (front-lst q) (cons x (rear q)) (rots q)))
While pop is a bit different: because the front list is actually a lazy stream, we need to force an evaluation. All the rest is the same as before.
pop(Q) = balance(F′, R, Rs) (11.31)
Here F′ forces one evaluation of F. The Scheme/Lisp code for this equation is as follows.
(define (pop q)
(balance (stream-cdr (front-lst q)) (rear q) (rots q)))
For illustration purposes, we skip the error handling (such as popping from an empty queue) here.
One can access the top element in the queue by extracting it from the front list stream.
(define (front q) (stream-car (front-lst q)))
The balance function first checks if the computation stream is completely exhausted, and starts a new rotation accordingly; otherwise, it just consumes one evaluation by forcing the lazy stream.
balance(F, R, Rs) =
    Queue(F′, ϕ, F′) : Rs = ϕ
    Queue(F, R, R′s) : otherwise
(11.32)

Where F′ = rotate(F, R, ϕ) starts a new rotation, and R′s is the rest of the stream after forcing one evaluation.
We used explicit lazy evaluation in Scheme/Lisp. Actually, this program can be very short in a lazy programming language such as Haskell.
data LazyRTQueue a = LQ [a] [a] [a] −− front, rear, f ++ reverse r
instance Queue LazyRTQueue where
empty = LQ [] [] []
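The remaining operations follow equations (11.30)–(11.32) directly. The sketch below is illustrative rather than the book's full listing; it assumes the Queue type class provides push and pop, and relies on Haskell's default laziness for the front list and the rotation stream.

  push (LQ f r rot) x = balance f (x:r) rot
  pop (LQ (_:f) r rot) = balance f r rot

-- if the rotation stream is exhausted, start a new rotation;
-- otherwise force one step of it
balance f r []      = let f' = rotate f r [] in LQ f' [] f'
balance f r (_:rot) = LQ f r rot

-- rotate f r acc lazily computes f ++ reverse r, one step at a time,
-- under the invariant |r| = |f| + 1
rotate []     [y]    acc = y : acc
rotate (x:xs) (y:ys) acc = x : rotate xs ys (y : acc)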
Note that, although we haven't mentioned priority queues, it's quite possible to realize them with heaps. We have covered the topic of heaps in several previous chapters.
Exercise 11.6
• Realize a dequeue, which supports adding and removing elements on both sides in constant O(1) time, in a purely functional way.
• Realize a dequeue in a symmetric solution only with arrays in your favorite imperative programming language.
Bibliography
[1] Maged M. Michael and Michael L. Scott. “Simple, Fast, and Prac-
tical Non-Blocking and Blocking Concurrent Queue Algorithms”.
[Link]
[2] Herb Sutter. “Writing a Generalized Concurrent Queue”. Dr. Dobb’s Oct 29, 2008.
[Link]
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[4] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[5] Wikipedia. “Tail-call”. [Link]
Chapter 12

Sequences, the last brick
12.1 Introduction
In the first chapter of this book, which introduced the binary search tree as the 'hello world' data structure, we mentioned that neither queue nor array is simple to realize, not only in the imperative way, but also in the functional approach. In the previous chapter, we explained the functional queue, which achieves performance similar to its imperative counterpart. In this chapter, we'll dive into the topic of array-like data structures.
We have introduced several data structures in this book so far, and it seems that functional approaches typically bring more expressive and elegant solutions. However, there are some areas where people haven't found competitive purely functional solutions that can match the imperative ones, for instance the Ukkonen linear-time suffix tree construction algorithm. Another example is the hash table. Array is also among them.
Array is trivial in imperative settings; it enables random access to any element by index in constant O(1) time. However, this performance target can't be achieved directly in purely functional settings, where only the list is available.
In this chapter, we are going to abstract the concept of array to sequences, which support the following features:
• Elements can be inserted to or removed from the head of the sequence quickly, in O(1) time;
• Elements can be inserted to or removed from the tail of the sequence quickly, in O(1) time;
We call these features abstract sequence properties. It is easy to see that even the array (here meaning the plain array) in imperative settings can't meet them all at the same time.
We'll provide three solutions in this chapter. First, we'll introduce a solution based on a binary tree forest and numeric representation; second, we'll show a concatenate-able list solution; finally, we'll give the finger tree solution.
Most of the results are based on Chris Okasaki's work in [3].
12.2 Binary random access list
n = 2^0 · e0 + 2^1 · e1 + ... + 2^m · em    (12.1)
(Figure: a forest of two trees, t1 and t2, storing the elements x1, x2, ..., x6.)
For example, to store n = 6 = (110)2 elements: the bit e1 is 1, so we need a tree of size 2, which has a depth of 2; the highest bit e2 is also 1, thus we need a tree of size 4, which has a depth of 3.
This method represents the sequence {x1, x2, ..., xn} as a list of trees {t0, t1, ..., tm}, where ti is either empty if ei = 0 or a complete binary tree if ei = 1. We call this representation a Binary Random Access List [3].
We can reuse the definition of the binary tree. For example, the following Haskell program defines the tree and the binary random access list.
data Tree a = Leaf a
| Node Int (Tree a) (Tree a) −− size, left, right
type BRAList a = [Tree a]
The only difference from the typical binary tree is that we augment the tree with its size. This enables us to get the size without calculating it every time. For instance:
size (Leaf _) = 1
size (Node sz _ _) = sz
2. Examine the first tree in the forest and compare its size with t′. If its size is greater than that of t′, we just let t′ be the new head of the forest; since the forest is a linked list of trees, inserting t′ at its head is a trivial operation, bound to constant O(1) time;
3. Otherwise, if the size of the first tree in the forest equals that of t′, let's denote this tree in the forest as ti. We can construct a new binary tree t′i+1 by linking ti and t′ as its left and right children. After that, we recursively try to insert t′i+1 to the forest.
(Figure 12.2, inserting x1, ..., x4: (c) insert x3, the result is two trees, t1 and t2; (d) insert x4, which first links two leaves into a binary tree, then performs linking again, resulting in a final tree of height 2.)
(Figure 12.3: (a) insert x5, the forest is a leaf (t0) and t2; (b) insert x6, which links two leaves to t1.)
Figures 12.2 and 12.3 illustrate the steps of inserting elements x1, x2, ..., x6 to an empty forest.
As the number of trees in the forest is bound to O(lg n), the insertion-to-head algorithm is ensured to perform in O(lg n) time even in the worst case. We'll prove that the amortized performance is O(1) later.
Let's formalize the algorithm. We define the function of inserting an element in front of a sequence as insert(S, x).
insert(S, x) = insertT ree(S, leaf (x)) (12.2)
This function just wraps element x into a singleton leaf tree, and calls insertTree to insert this tree into the forest. Suppose the forest F = {t1, t2, ...} if it's not empty, and F′ = {t2, t3, ...} is the rest of the trees without the first one.
insertTree(F, t) =
    {t} : F = ϕ
    {t} ∪ F : size(t) < size(t1)
    insertTree(F′, link(t, t1)) : otherwise
(12.3)
Where function link(t1, t2) creates a new tree from two smaller trees of the same size. Suppose function tree(s, t1, t2) creates a tree, sets its size to s, and makes t1 the left child and t2 the right child; linking can then be realized as below.
link(t1 , t2 ) = tree(size(t1 ) + size(t2 ), t1 , t2 ) (12.4)
The corresponding Haskell program can be given by translating these definitions.
cons :: a → BRAList a → BRAList a
cons x ts = insertTree ts (Leaf x)
Here we follow the Lisp tradition and name the function that inserts an element before a list 'cons'.
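The helper functions insertTree and link translate equations (12.3) and (12.4) directly; a sketch:

insertTree :: BRAList a -> Tree a -> BRAList a
insertTree [] t = [t]
insertTree (t':ts) t
    | size t < size t' = t : t' : ts               -- the new tree is smaller: it becomes the new head
    | otherwise        = insertTree ts (link t t') -- same size: link them and carry to the next position

-- link two trees of the same size into one bigger tree
link :: Tree a -> Tree a -> Tree a
link t1 t2 = Node (size t1 + size t2) t1 t2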
With this function defined, it's convenient to give the head and tail functions: the former returns the first element in the sequence, the latter returns the rest.
Where function first returns the first element of a pair (also known as a tuple), and second returns the second element respectively. Function key is used to access the element inside a leaf. Below are the Haskell programs corresponding to these two functions.
head' ts = x where (Leaf x, _) = extractTree ts
tail' = snd ◦ extractTree
Note that as head and tail functions have already been defined in the Haskell standard library, we give them apostrophes to make them distinct. (Another option is to hide the standard ones when importing. We skip the details as they are language specific.)
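The extractTree function used by head' and tail' can be sketched as below (an illustrative definition, not necessarily the book's exact listing): if the first tree in the forest is a leaf, we detach it; otherwise we keep splitting the first tree into its two children until a leaf appears at the head.

extractTree :: BRAList a -> (Tree a, BRAList a)
extractTree (Leaf x : ts)       = (Leaf x, ts)
extractTree (Node _ t1 t2 : ts) = extractTree (t1 : t2 : ts)  -- unlink and retry on the smaller head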
Given an index i and a sequence S, which is actually a forest of trees, the random access algorithm is executed as the following 1.
1. Compare i with the size of the first tree T1 in the forest. If i is less than or equal to the size, the element exists in T1; perform the lookup in T1;
2. Otherwise, decrease i by the size of T1, and repeat the previous step on the rest of the trees in the forest.
Where |T| = size(T), and S′ = {T2, T3, ...} is the rest of the trees in the forest without the first one. Note that we don't handle the out-of-bound error case; this is left as an exercise to the reader.
Function lookupTree is a binary search algorithm. If the index i is 1, we just return the root of the tree; otherwise, we halve the tree by unlinking. If i is less than or equal to the size of the halved tree, we recursively look up the left sub-tree; otherwise, we look up the right sub-tree.
lookupTree(T, i) =
    root(T) : i = 1
    lookupTree(left(T), i) : i ≤ ⌊|T|/2⌋
    lookupTree(right(T), i − ⌊|T|/2⌋) : otherwise
(12.9)
Where function left returns the left sub-tree Tl of T, while right returns Tr.
The corresponding Haskell program is given as below.
getAt (t:ts) i = if i < size t then lookupTree t i
else getAt ts (i - size t)
lookupTree (Leaf x) 0 = x
lookupTree (Node sz t1 t2) i = if i < sz `div` 2 then lookupTree t1 i
else lookupTree t2 (i - sz `div` 2)
Figure 12.5 illustrates the steps of looking up the 4-th element in a sequence of size 6. It first examines the first tree; since its size is 2, which is smaller than 4, it goes on looking up in the rest of the forest with the updated index i′ = 4 − 2 = 2. As the size of the next tree is 4, which is greater than 2, the element to be searched must be located in this tree. It then examines the left sub-tree, since the new index 2 is not greater than half of the size, 4/2 = 2; the process next visits the right grand-child, and the final result is returned.
Using a similar idea, we can update the element at any arbitrary position i. We first compare the size of the first tree T1 in the forest with i. If it is less than i, the element to be updated doesn't exist in the first tree; we recursively examine the next tree in the forest, comparing against i − |T1|, where |T1| represents the size of the first tree. Otherwise, if this size is greater than or equal to i, the element is in this tree; we halve the tree recursively until we get a leaf, at which point we can replace the element of this leaf with the new one.
set(S, i, x) =
    {updateTree(T1, i, x)} ∪ S′ : i < |T1|
    {T1} ∪ set(S′, i − |T1|, x) : otherwise
(12.10)
1 We follow the tradition that the index i starts from 1 in the algorithm description, while it starts from 0 in the concrete programs.
Where S′ = {T2, T3, ...} is the rest of the trees in the forest without the first one. Function setTree(T, i, x) performs a tree search and replaces the i-th element with the given value x.
setTree(T, i, x) =
    leaf(x) : i = 0 ∧ |T| = 1
    tree(|T|, setTree(Tl, i, x), Tr) : i < ⌊|T|/2⌋
    tree(|T|, Tl, setTree(Tr, i − ⌊|T|/2⌋, x)) : otherwise
(12.11)
Where Tl and Tr are the left and right sub-trees of T respectively. The following Haskell program translates the equation accordingly.
setAt :: BRAList a → Int → a → BRAList a
setAt (t:ts) i x = if i < size t then (updateTree t i x):ts
else t:setAt ts (i-size t) x
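The updateTree function used above mirrors lookupTree; a sketch consistent with equation (12.11) and the 0-based indexing of the Haskell code:

updateTree :: Tree a -> Int -> a -> Tree a
updateTree (Leaf _) 0 x = Leaf x
updateTree (Node sz t1 t2) i x
    | i < sz `div` 2 = Node sz (updateTree t1 i x) t2                -- element is in the left half
    | otherwise      = Node sz t1 (updateTree t2 (i - sz `div` 2) x) -- element is in the right half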
Due to the nature of complete binary trees, for a sequence with n elements represented by a binary random access list, the number of trees in the forest is bound to O(lg n). Thus it takes O(lg n) time in the worst case to locate the tree that contains the element at an arbitrary index i. The subsequent tree search is bound by the height of the tree, which is O(lg n) in the worst case as well. So the total performance of random access is O(lg n).
Exercise 12.1
1. The random access algorithm given in this section doesn't handle errors such as an out-of-bound index at all. Modify the algorithm to handle this case, and implement it in your favorite programming language.
2. It's quite possible to realize the binary random access list in imperative settings, which benefits from fast operations on the head of the sequence. Random access can be realized in two steps: first locate the tree, then use the constant-time random access capability of the array. Write a program to implement it in your favorite imperative programming language.
12.3 Numeric representation for binary random access list

In order to represent the binary random access list with a binary number, we can define two states for a bit: Zero means there is no tree with the size corresponding to that bit, while One means such a tree exists in the forest. We attach the tree to the state if it is One.
The following Haskell program for instance defines such states.
data Digit a = Zero
| One (Tree a)
Here we reuse the definition of the complete binary tree and attach it to the One state. Note that we cache the size information in the tree as well.
With digits defined, a forest can be treated as a list of digits. Let's see how inserting a new element can be realized as binary number increment. Suppose function one(t) creates a One state with tree t attached, and function getTree(s) gets the tree attached to the One state s. The sequence S is a list of digit states, S = {s1, s2, ...}, and S′ is the rest of the digits with the first one removed.
insertTree(S, t) =
    {one(t)} : S = ϕ
    {one(t)} ∪ S′ : s1 = Zero
    {Zero} ∪ insertTree(S′, link(t, getTree(s1))) : otherwise
(12.12)
When we insert a new tree t to a forest S of binary digits: if the forest is empty, we just create a One state, attach the tree to it, and make this state the only digit of the binary number. This is just like 0 + 1 = 1.
Otherwise, if the forest isn't empty, we need to examine the first digit of the binary number. If the first digit is Zero, we just create a One state, attach the tree, and replace the Zero state with the newly created One state. This is just like (...digits...0)2 + 1 = (...digits...1)2. For example 6 + 1 = (110)2 + 1 = (111)2 = 7.
The last case is that the first digit is One. Here we make the assumption that the tree t to be inserted has the same size as the tree attached to this One state at this stage. This can be ensured by calling this function from inserting a leaf, so that the size of the tree to be inserted grows in the series 1, 2, 4, ..., 2^i, .... In such a case, we need to link these two trees (one is t, the other is the tree attached to the One state), and recursively insert the linked result to the rest of the digits. Note that the previous One state has to be replaced with a Zero state. This is just like (...digits...1)2 + 1 = (...digits′...0)2, where (...digits′...)2 = (...digits...)2 + 1. For example 7 + 1 = (111)2 + 1 = (1000)2 = 8.
Translating this algorithm to Haskell yields the following program.
insertTree :: RAList a → Tree a → RAList a
insertTree [] t = [One t]
insertTree (Zero:ts) t = One t : ts
insertTree (One t' :ts) t = Zero : insertTree ts (link t t')
All the other functions, including link(), cons(), etc., are the same as before.
Next let's see how removing an element from a sequence can be represented as binary number decrement. If the sequence is a singleton One state attached with a leaf, it becomes empty after removal. This is just like 1 − 1 = 0.
Otherwise, we examine the first digit. If it is a One state, it will be replaced with a Zero state to indicate that this tree no longer exists in the forest, as it is being removed. This is just like (...digits...1)2 − 1 = (...digits...0)2. For example 7 − 1 = (111)2 − 1 = (110)2 = 6.
If the first digit in the sequence is a Zero state, we have to borrow from the further digits for the removal. We recursively extract a tree from the rest of the digits, and halve the extracted tree into its two children. Then the Zero state will be replaced with a One state
attached with the right child, and the left child is removed. This is something like (...digits...0)2 − 1 = (...digits′...1)2, where (...digits′...)2 = (...digits)2 − 1. For example 4 − 1 = (100)2 − 1 = (11)2 = 3. The following equation illustrates this algorithm.
extractTree(S) =
    (t, ϕ) : S = {one(t)}
    (t, {Zero} ∪ S′) : s1 = one(t)
    (tl, {one(tr)} ∪ S′′) : otherwise
(12.13)
Where (t′, S′′) = extractTree(S′), and tl and tr are the left and right sub-trees of t′. All other functions, including head and tail, are the same as before.
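A Haskell sketch corresponding to equation (12.13): when the first digit is Zero, we borrow a tree from the further digits and halve it (illustrative code, assuming the RAList of digits defined above).

extractTree :: RAList a -> (Tree a, RAList a)
extractTree [One t]      = (t, [])               -- 1 - 1 = 0
extractTree (One t : ts) = (t, Zero : ts)        -- flip the first One to Zero
extractTree (Zero : ts)  = (t1, One t2 : ts')    -- borrow: halve the extracted tree
  where (Node _ t1 t2, ts') = extractTree ts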
The numeric representation doesn't change the performance of the binary random access list; readers can refer to [64] for a detailed discussion. As an example, let's analyze the average (amortized) performance of the insertion-on-head algorithm using aggregation analysis.
Consider the process of inserting n = 2^m elements into an empty binary random access list. The numeric representation of the forest can be listed as follows.
i          forest (MSB ... LSB)
0          0, 0, ..., 0, 0
1          0, 0, ..., 0, 1
2          0, 0, ..., 1, 0
3          0, 0, ..., 1, 1
...        ...
2^m − 1    1, 1, ..., 1, 1
2^m        1, 0, 0, ..., 0, 0

bits changed (MSB ... LSB): 1, 1, 2, ..., 2^(m−1), 2^m
The LSB of the forest changes every time a new element is inserted; it costs 2^m units of computation in total. The next bit changes every two insertions due to a linking operation, so it costs 2^(m−1) units; the bit next to the MSB of the forest changes only once, linking all previous trees into one big tree. This happens at the halfway point of the total insertion process; and after the last element is inserted, the MSB flips to 1.
Summing these costs up yields the total cost T = 1 + 1 + 2 + 4 + ... + 2^(m−1) + 2^m = 2^(m+1). So the average cost for one insertion is

O(T/n) = O(2^(m+1) / 2^m) = O(1)    (12.14)

which proves that the insertion algorithm performs in amortized O(1) constant time.
The proof for deletion is left as an exercise to the reader.
#define M sizeof(int) * 8
typedef int Key;
struct List {
    int n;
    Key* tree[M];
};
Where n is the number of elements stored in this forest. Of course we can avoid limiting the maximum number of trees by using dynamic arrays, for example as in the following ISO C++ code.
template<typename Key>
struct List {
    int n;
    vector<vector<Key> > tree;
};
The position of the bit that flips from 0 to 1 can be located as i ← Number-Of-Bits(n ⊕ (n + 1)). The Number-Of-Bits process can be easily implemented with bit shifting, for example the ANSI C code below.
int nbits(int n) {
    int i = 0;
    while (n >>= 1)
        ++i;
    return i;
}
So the imperative insertion algorithm can be realized by first locating the bit which flips from 0 to 1, then creating a new array of size 2^i to represent a complete binary tree, and moving the contents of all trees before this bit, as well as the new element to be inserted, into this array.
function Insert(L, x)
2 The complete ISO C++ example program is available with this book.
i ← Number-Of-Bits(n ⊕ (n + 1))
Tree(L)[i + 1] ← Create-Array(2^i)
l ← 1
Tree(L)[i + 1][l] ← x
for j ∈ [1, i] do
for k ∈ [1, 2^(j−1)] do
l ← l + 1
Tree(L)[i + 1][l] ← Tree(L)[j][k]
Tree(L)[j] ← NIL
Size(L) ← Size(L) + 1
return L
The corresponding ANSI C program is given as the following.
However, the performance in theory isn't as good as before, because the linking operation downgrades from O(1) constant time to linear array copying.
We can again calculate the average (amortized) performance using aggregation analysis. When inserting n = 2^m elements into an empty list represented by implicit binary trees in arrays, the numeric representation of the forest of arrays is the same as before, except for the cost of bit flipping.
i          forest (MSB ... LSB)
0          0, 0, ..., 0, 0
1          0, 0, ..., 0, 1
2          0, 0, ..., 1, 0
3          0, 0, ..., 1, 1
...        ...
2^m − 1    1, 1, ..., 1, 1
2^m        1, 0, 0, ..., 0, 0

bit change cost (MSB ... LSB): 1 × 2^m, 1 × 2^(m−1), 2 × 2^(m−2), ..., 2^(m−2) × 2, 2^(m−1) × 1
The LSB of the forest changes every time a new element is inserted; however, it creates a leaf tree and performs copying only when it changes from 0 to 1, so the cost is half of n units, which is 2^(m−1). The next bit flips half as often as the LSB. Each time this bit gets flipped to 1, it copies the first tree as well as the new element to the second tree, so the cost of flipping this bit to 1 is 2 units, not 1. For the MSB, it only flips to 1 at the last insertion, but the cost of flipping this bit is copying all the previous trees to fill the array of size 2^m.
Summing all the costs and distributing them over the n insertions yields an amortized performance of O(lg n) per insertion in this implicit-array representation.
The imperative removal and random mutating algorithms are left as exercises to the
reader.
Exercise 12.2
1. Please implement the random access algorithms, including looking up and updating,
for binary random access list with numeric representation in your favorite program-
ming language.
2. Prove that the amortized performance of deletion is O(1) constant time by using
aggregation analysis.
3. Design and implement the binary random access list by implicit array in your fa-
vorite imperative programming language.
Figure 12.6: A paired-array list, which consists of 2 arrays linked in head-head manner.
fast random access. It can also be used to realize a fast random access sequence in an imperative setting.
Figure 12.6 shows the design of the paired-array list. Two arrays are linked in head-head manner. To insert a new element at the head of the sequence, the element is appended at the end of the front array; to append a new element at the tail of the sequence, the element is appended at the end of the rear array.
Here is an ISO C++ code snippet defining this data structure.
template<typename Key>
struct List {
    int n, m;
    vector<Key> front;
    vector<Key> rear;
};
Here we use the vector provided by the standard library to handle the dynamic memory management issues, so that we can concentrate on the algorithm design.
function Append(L, x)
R ← Rear(L)
Size(R) ← Size(R) + 1
R[Size(R)] ← x
As all the above operations manipulate the front and rear arrays at their tails, they are all constant O(1) time. The following are the corresponding ISO C++ programs.
template<typename Key>
void insert(List<Key>& xs, Key x) {
++xs.n;
    xs.front.push_back(x);
}
template<typename Key>
void append(List<Key>& xs, Key x) {
++xs.m;
    xs.rear.push_back(x);
}
282 CHAPTER 12. SEQUENCES, THE LAST BRICK
With the Balance algorithm defined, it's trivial to implement the removal algorithms both on head and on tail.
function Remove-Head(L)
Balance(L)
F ← Front(L)
if F = ϕ then
Remove-Tail(L)
else
Size(F ) ← Size(F ) - 1
function Remove-Tail(L)
Balance(L)
R ← Rear(L)
if R = ϕ then
Remove-Head(L)
else
Size(R) ← Size(R) - 1
There is an edge case for each: even after balancing, the array targeted for removal may still be empty. This happens when there is only one element stored in the paired-array list. The solution is to just remove that last remaining element from the other array, so the overall list becomes empty. Below is the ISO C++ program implementing this algorithm.
template<typename Key>
void remove_head(List<Key>& xs) {
balance(xs);
    if (xs.front.empty())
remove_tail(xs); //remove the singleton elem in rear
else {
        xs.front.pop_back();
--xs.n;
}
}
template<typename Key>
void remove_tail(List<Key>& xs) {
balance(xs);
    if (xs.rear.empty())
remove_head(xs); //remove the singleton elem in front
else {
        xs.rear.pop_back();
--xs.m;
}
}
It's obvious that the worst case performance is O(n), where n is the number of elements stored in the paired-array list. This happens when balancing is triggered, and both reversing and shifting are linear operations. However, the amortized performance of removal is still O(1); the proof is left as an exercise to the reader.
Exercise 12.3
3. Prove that the amortized performance of removal is O(1) for paired-array list.
Where the functions cons, head, and tail are defined in the previous section.
If the lengths of the two sequences are n and m, this method takes O(n lg n) time to repeatedly push all elements of the first sequence onto stacks, and then takes Ω(n lg(n + m)) time to insert the elements in front of the second sequence. Note that Ω denotes the asymptotic lower bound; there is a detailed definition of it in [4].
We have already implemented the real-time queue in the previous chapter. It supports push and pop in O(1) time. If we can turn sequence concatenation into a kind of queue-push operation, the performance will be improved to O(1) as well. Okasaki gave such a realization in [3], which can concatenate lists in constant time.
To represent a concatenate-able list, the data structure designed by Okasaki is essentially a K-ary tree. The root of the tree stores the first element in the list, so that we can access it in constant O(1) time. The sub-trees, or children, are all smaller concatenate-able lists, which are managed by real-time queues. Concatenating another list to the end is just adding it as the last child, which in turn is a queue-push operation. Appending a new element can be realized by first wrapping the element into a singleton tree, which is a leaf with no children, and then concatenating this singleton to finish the append.
Figure 12.7 illustrates this data structure.
Such recursively designed data structure can be defined in the following Haskell code.
data CList a = Empty | CList a (Queue (CList a))
It means that a concatenate-able list is either empty or a K-ary tree, which consists of a root element and a queue of concatenate-able sub-lists. Here we reuse the realization of the real-time queue mentioned in the previous chapter.
Suppose function clist(x, Q) constructs a concatenate-able list from an element x and a queue of sub-lists Q, while function root(s) returns the root element of such a K-ary tree implemented list, and function queue(s) returns the queue of sub-lists respectively. We can implement the algorithm to concatenate two lists like this.
concat(s1, s2) =
    s1 : s2 = ϕ
    s2 : s1 = ϕ
    clist(x, push(Q, s2)) : otherwise
(12.17)
Besides the good performance of concatenation, this design also brings satisfying features for adding elements both at the head and at the tail.
Getting the first element is just returning the root of the K-ary tree.
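A Haskell sketch of these operations (assuming the real-time queue interface push and an empty-queue constructor emptyQ from the previous chapter; these names are illustrative):

concat :: CList a -> CList a -> CList a
concat s1 Empty = s1
concat Empty s2 = s2
concat (CList x q) s2 = CList x (push q s2)  -- equation (12.17): s2 becomes the last child

cons :: a -> CList a -> CList a
cons x s = concat (CList x emptyQ) s         -- add on head: x becomes the new root

append :: CList a -> a -> CList a
append s x = concat s (CList x emptyQ)       -- add on tail: wrap x, then concatenate

first :: CList a -> a
first (CList x _) = x                        -- the root holds the first element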
It's a bit more complex to realize the algorithm that removes the first element from a concatenate-able list. This is because after the root, which is the first element of the sequence, gets removed, we have to re-construct the rest, a queue of sub-lists, into a K-ary tree.
After the root is removed, what remains are the children of the K-ary tree. Note that all of them are also concatenate-able lists, so one natural solution is to concatenate them all together into a big list.
concatAll(Q) =
    ϕ : Q = ϕ
    concat(front(Q), concatAll(pop(Q))) : otherwise
(12.21)
Where function front just returns the first element of a queue without removing it, while pop does the removal.
If the queue is empty, it means that there are no children at all, so the result is also an empty list. Otherwise, we pop the first child, which is a concatenate-able list, from the queue, recursively concatenate all the rest of the children into one list, and finally concatenate this list after the popped first child.
With concatAll defined, we can then implement the algorithm of removing the first element from a list as below.
Function isEmptyQ tests whether a queue is empty; it is trivial and we omit its definition. Readers can refer to the source code along with this book.
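In Haskell, the removal can be sketched as below (assuming the queue primitives front, pop, and isEmptyQ; tail' is named with an apostrophe to avoid clashing with the standard tail):

concatAll :: Queue (CList a) -> CList a
concatAll q | isEmptyQ q = Empty
            | otherwise  = concat (front q) (concatAll (pop q))  -- equation (12.21)

tail' :: CList a -> CList a
tail' (CList _ q) = concatAll q  -- drop the root, fold the children back into one list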
The linkAll (concatAll) algorithm actually traverses the queue data structure and reduces it to a final result. This reminds us of the folding mentioned in the chapter about binary search trees; readers can refer to the appendix of this book for a detailed description of folding. It's quite possible to define a folding algorithm for queues instead of lists [10].
foldQ(f, e, Q) =
    e : Q = ϕ
    f(front(Q), foldQ(f, e, pop(Q))) : otherwise
(12.23)
Function f oldQ takes three parameters, a function f , which is used for reducing, an
initial value e, and the queue Q to be traversed.
Here are some examples to illustrate folding on queue. Suppose a queue Q contains
elements {1, 2, 3, 4, 5} from head to tail.
f oldQ(+, 0, Q) = 1 + (2 + (3 + (4 + (5 + 0)))) = 15
f oldQ(×, 1, Q) = 1 × (2 × (3 × (4 × (5 × 1)))) = 120
f oldQ(×, 0, Q) = 1 × (2 × (3 × (4 × (5 × 0)))) = 0
foldQ :: (a → b → b) → b → Queue a → b
foldQ f z q | isEmptyQ q = z
| otherwise = (front q) `f` foldQ f z (pop q)
However, the performance of removal can't be ensured in all cases. The worst case is that a user keeps appending n elements to an empty list, and then immediately performs removal. At this time, the K-ary tree has the first element stored in the root, and there are n − 1 children, all of which are leaves. So the linkAll() algorithm downgrades to O(n), which is linear time.
Considering that the add, append, concatenate and removal operations are randomly performed, the average case is amortized O(1). The proof is left as an exercise to the reader.
Exercise 12.4
1. Can you figure out a solution to append an element to the end of a binary random
access list?
• In order to support fast manipulation both at the head and at the tail of the sequence, there must be some way to easily access the head and tail positions;
• A tree-like data structure helps to turn random access into a divide and conquer search; if the tree is well balanced, the search is ensured to be logarithmic time.
12.6.1 Definition
The finger tree [66], first introduced in 1977, can help to realize an efficient sequence, and it is also well implemented in purely functional settings [65].
As we mentioned, the balance of the tree is critical to ensure the performance of search. One option is to use a balanced tree as the underlying data structure of the finger tree, for example the 2-3 tree, which is a special B-tree (readers can refer to the chapter about B-trees in this book).
A 2-3 tree contains either 2 or 3 children. It can be defined in Haskell as below.
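One way to capture the 2-branch and 3-branch cases is with two constructors; the names Br2 and Br3 below are an assumption that matches the Br3 constructor used in the later examples.

data Node a = Br2 a a      -- a node with 2 children
            | Br3 a a a    -- a node with 3 children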
In imperative settings, a node can be defined with a list of sub-nodes, which contains at most 3 children. For instance, the following ANSI C code defines such a node.
union Node {
    Key* keys;
    union Node* children;
};
Note that in this definition, a node can either contain 2 ∼ 3 keys, or 2 ∼ 3 sub-nodes, where Key is the type of element stored in a leaf node.
We mark the left-most non-leaf node as the front finger (or left finger) and the right-most non-leaf node as the rear finger (or right finger). Since both fingers are essentially 2-3 trees with all leaves as children, they can be directly represented as lists of 2 or 3 leaves.
Of course a finger tree can be empty or contain only one element as a leaf.
So the definition of a finger tree is specified like this:
• a finger tree is either empty;
• or a singleton leaf;
• or contains three parts: a left finger, which is a list containing at most 3 elements; a sub finger tree; and a right finger, which is also a list containing at most 3 elements.
Note that this definition is recursive, so it's quite straightforward to translate to functional settings. The following Haskell definition summarizes these cases.
data Tree a = Empty
| Lf a
| Tr [a] (Tree (Node a)) [a]
In imperative settings, we can define the finger tree in a similar manner. What's more, we can add a parent field, so that it's possible to back-track to the root from any tree node. The ANSI C code below defines the finger tree accordingly.
struct Tree {
    union Node* front;
    union Node* rear;
    struct Tree* mid;
    struct Tree* parent;
};
We can use a NIL pointer to represent an empty tree; a leaf tree contains only one element in its front finger, while both its rear finger and middle part are empty.
Figures 12.8 and 12.9 show some examples of finger trees.
The first example is an empty finger tree; the second one shows the result after inserting one element into the empty tree, it becomes a leaf of one node; the third example shows a finger tree containing 2 elements, one in the front finger, the other in the rear.
If we continuously insert new elements into the tree, those elements will be put in the front finger one by one, until the limit of the 2-3 tree is exceeded. The 4-th example shows such a condition: there are 4 elements in the front finger, which isn't balanced any more.
The last example shows that the finger tree gets fixed so that it resumes balancing. There are two elements in the front finger. Note that the middle part is not empty any longer: it's a leaf of a 2-3 tree (why it's a leaf is explained later). The content of the leaf is a tree with 3 branches, each containing an element.
We can express these 5 examples as the following Haskell expressions.
Empty
Lf a
[b] Empty [a]
[e, d, c, b] Empty [a]
[f, e] Lf (Br3 d c b) [a]
In the last example, why is the inner tree of the middle part a leaf? As we mentioned, the definition of the finger tree is recursive. The middle part, besides the front and rear fingers, is a deeper finger tree, which is defined as Tree(Node(a)). Every time we go deeper, the Node is nested one more level: if the element type of the first-level tree is a, the element type of the second-level tree is Node(a), the third level is Node(Node(a)), ..., and the n-th level is Node(Node(Node(...(a))...)) = Node^n(a), where n indicates that Node is applied n times.
Here we use the Lisp naming convention cons to illustrate inserting a new element at the head of the list.
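A Haskell sketch of this insertion follows (using the assumed Br2/Br3 node constructors; the book's actual listing may differ in details). When the front finger already holds four elements, all but the first are wrapped into a deeper node and pushed into the middle part.

cons :: a -> Tree a -> Tree a
cons a Empty = Lf a
cons a (Lf b) = Tr [a] Empty [b]
cons a (Tr [b, c, d, e] m r) = Tr [a, b] (cons (Br3 c d e) m) r  -- front finger full: push a deeper node
cons a (Tr f m r) = Tr (a:f) m r                                 -- otherwise just extend the front finger

Applying cons repeatedly to an empty tree reproduces the five example states listed above.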
The insertion algorithm can also be implemented in an imperative approach. Suppose function Tree() creates an empty tree, in which all fields, including the front and rear fingers, the middle part inner tree, and the parent, are empty. Function Node() creates an empty node.
function Prepend-Node(n, T )
r ← Tree()
p←r
Connect-Mid(p, T )
while Full?(Front(T )) do
F ← Front(T ) ▷ F = {n1 , n2 , n3 , ...}
Front(T ) ← {n, F [1]} ▷ F [1] = n1
n ← Node()
Children(n) ← F [2..] ▷ F [2..] = {n2 , n3 , ...}
p←T
T ← Mid(T )
if T = NIL then
T ← Tree()
Front(T )← {n}
else if | Front(T ) | = 1 ∧ Rear(T ) = ϕ then
Rear(T ) ← Front(T )
Front(T ) ← {n}
else
Front(T ) ← {n}∪ Front(T )
Connect-Mid(p, T ) ← T
return Flat(r)
Where the notation L[i..] means the sub-list of L with the first i − 1 elements removed, i.e., if L = {a1, a2, ..., an}, then L[i..] = {ai, ai+1, ..., an}.
Functions Front, Rear, Mid, and Parent are used to access the front finger, the rear finger, the middle part inner tree, and the parent tree respectively; function Children accesses the children of a node.
Function Connect-Mid(T1, T2) connects T2 as the inner middle part tree of T1, and sets the parent of T2 to T1 if T2 isn't empty.
In this algorithm, we perform a one-pass top-down traversal along the middle part inner tree whenever the front finger is full and can't hold any more elements. The criterion for a 2-3 tree finger being full is that it already contains 3 elements. In such a case, we extract all the elements except the first one, wrap them into a new node (a one-level-deeper node), and continue inserting this new node into the middle inner tree. The first element is left in the front finger, and the element to be inserted is put in front of it, so that it becomes the new first one in the front finger.
After this traversal, the algorithm either reaches an empty tree, or a tree that still has room for more elements in its front finger. We create a new leaf for the former case, and perform a trivial list insertion to the front finger for the latter.
During the traversal, we use p to record the parent of the current tree being processed, so that any newly created tree is connected as the middle part inner tree of p.
Finally, we return the root of the tree r. The last trick of this algorithm is the Flat function. In order to simplify the logic, we create an empty 'ground' tree and set it as the parent of the root. We need to eliminate this extra 'ground' level before returning the root. This flattening algorithm is realized as follows.
function Flat(T )
while T ≠ NIL ∧ T is empty do
T ← Mid(T)
if T ≠ NIL then
Parent(T ) ← NIL
return T
The while loop tests if T is trivially empty, that is, it's not NIL (= ϕ) while both its front and rear fingers are empty.
The Python code below implements the Flat function.
def flat(t):
    while t is not None and t.empty():
        t = t.mid
    if t is not None:
        t.parent = None
    return t
It's easy to implement the reverse operation, removing the first element from the list, by reversing the insertT() algorithm line by line.
Let's denote F = {f1, f2, ...} as the front finger list, M as the middle part inner finger tree, R = {r1, r2, ...} as the rear finger list of a finger tree, and R′ = {r2, r3, ...} as the rest of the rear finger without the first element.
Here we skip error handling such as trying to remove an element from an empty tree. If the finger tree is a leaf, the result after removal is an empty tree. If the finger tree contains two elements, one in the front finger and the other in the rear, we return the element stored in the front finger as the first element, and the resulting tree after removal is a leaf. If there is only one element in the front finger, the middle part inner tree is empty, and the rear finger isn't empty, we return the only element in the front finger, and borrow one element from the rear finger to the front. If there is only one element in the front finger but the middle part inner tree isn't empty, we recursively remove a node from the inner tree, flatten that node into a plain list to replace the front finger, and remove the original only element of the front finger. The last case says that if the front finger contains more than one element, we can just remove the first element from the front finger and keep all the other parts unchanged.
Figure 12.10 shows the steps of removing two elements from the head of a sequence. There are 10 elements stored in the finger tree. When the first element is removed, there is still one element left in the front finger. However, when the next element is removed, the front finger becomes empty. So we 'borrow' one tree node from the middle part inner tree. This is a 2-3 tree; it is converted to a list of 3 elements, and this list is used as the new front finger. The middle part inner tree changes from three parts to a singleton leaf, which contains only one 2-3 tree node. There are three elements stored in this tree node.
Below is the corresponding Haskell program for ‘uncons’.
uncons :: Tree a → (a, Tree a)
uncons (Lf a) = (a, Empty)
uncons (Tr [a] Empty [b]) = (a, Lf b)
uncons (Tr [a] Empty (r:rs)) = (a, Tr [r] Empty rs)
uncons (Tr [a] m r) = (a, Tr (nodeToList f) m' r) where (f, m') = uncons m
uncons (Tr f m r) = (head f, Tr (tail f) m r)
Similarly to the above, we can define the head and tail functions from uncons.
head = fst ◦ uncons
tail = snd ◦ uncons
from the middle part inner tree. However, there exist cases where the tree is ill-formed, for example, when both the front finger of the tree and that of its middle part inner tree are empty. Such an ill-formed tree can result from imperative splitting, which we'll introduce later.
Figure 12.11: Example of an ill-formed tree. The front finger of the i-th level sub tree
isn’t empty.
Here we develop an imperative algorithm which can remove the first element from a finger tree even if it is ill-formed. The idea is to first perform a top-down traversal to find a sub-tree which either has a non-empty front finger, or has both its front finger and middle part inner tree empty. In the former case, we can safely extract the first element, which is a node, from the front finger; in the latter case, since only the rear finger isn't empty, we can swap it with the empty front finger and reduce it to the former case.
After that, we need to examine whether the node we extracted from the front finger is a leaf node (how to do that is left as an exercise to the reader). If not, we go on extracting the first sub-node from the children of this node, and leave the rest of the children as the new front finger of the parent of the current tree. We repeatedly go up along the parent field until the node we extracted is a leaf. At that point, we arrive at the root of the tree. Figure 12.12 illustrates this process.
Based on this idea, the following algorithm realizes the removal operation on head.
The algorithm assumes that the tree passed in isn’t empty.
function Extract-Head(T )
r ← Tree()
Connect-Mid(r, T )
while Front(T) = ϕ ∧ Mid(T) ≠ NIL do
T ← Mid(T)
if Front(T) = ϕ ∧ Rear(T) ≠ ϕ then
Exchange Front(T ) ↔ Rear(T )
n ← Node()
Children(n) ← Front(T )
repeat
L ← Children(n) ▷ L = {n1, n2, n3, ...}
n ← L[1] ▷ n ← n1
Front(T) ← L[2..] ▷ L[2..] = {n2, n3, ...}
T ← Parent(T)
until n is a leaf
(Figure 12.12: (a) extract the first element n[i][1] and put its children to the front finger of the upper level tree; (b) repeat this process i times, and finally x[1] is extracted.)
Member function empty() returns true if both the front finger and the rear finger are empty. We put a flag leaf to mark whether a node is a leaf or a compound node. The exercise of this section asks the reader to consider some alternatives.
As the ill-formed tree is allowed, the algorithms that access the first and last elements of the finger tree must be modified, so that they don't blindly return the first or last child of the finger, as the finger can be empty if the tree is ill-formed.
The idea is quite similar to Extract-Head: in case the finger is empty while the middle part inner tree isn't, we traverse along the inner tree until either the finger becomes non-empty or all the nodes are stored in the other finger. For instance, the following algorithm can return the first leaf node even if the tree is ill-formed.
function First-Lf(T )
while Front(T) = ϕ ∧ Mid(T) ≠ NIL do
T ← Mid(T)
if Front(T) = ϕ ∧ Rear(T) ≠ ϕ then
n ← Rear(T )[1]
else
n ← Front(T )[1]
while n is NOT leaf do
n ← Children(n)[1]
return n
Note the second loop in this algorithm: it keeps traversing into the first sub-node as long as the current node isn't a leaf. So we always end up with a leaf node, and it's trivial to get the element inside it.
function First(T )
return Elem(First-Lf(T ))
The following Python code translates the algorithm into a real program.
def first(t):
    return elem(first_leaf(t))

def first_leaf(t):
    while t.front == [] and t.mid is not None:
        t = t.mid
    if t.front == [] and t.rear != []:
        n = t.rear[0]
    else:
        n = t.front[0]
    while not n.leaf:
        n = n.children[0]
    return n
Accessing the last element is quite similar, and we leave it as an exercise to the reader.
When the rear finger becomes full, we break the rear finger, take the first 3 elements of the rear finger to create a new 2-3 tree, and recursively append it to the middle part inner tree. If the finger tree is empty or a singleton leaf, it is handled by the first two cases.
Translating the equation to Haskell yields the program below. The function name snoc is the mirror of cons, which indicates the symmetric relationship.
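A sketch of snoc under the same assumptions as cons above (it mirrors cons symmetrically, wrapping three of the four rear-finger elements into a deeper node when the rear finger is full):

snoc :: Tree a -> a -> Tree a
snoc Empty a = Lf a
snoc (Lf a) b = Tr [a] Empty [b]
snoc (Tr f m [a, b, c, d]) e = Tr f (snoc m (Br3 a b c)) [d, e]  -- rear finger full: push a deeper node
snoc (Tr f m r) a = Tr f m (r ++ [a])                            -- otherwise just extend the rear finger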
Appending a new element to the end imperatively is quite similar. The following algorithm realizes append.
function Append-Node(T, n)
r ← Tree()
p←r
Connect-Mid(p, T )
while Full?(Rear(T )) do
R ← Rear(T ) ▷ R = {n1 , n2 , ..., , nm−1 , nm }
Rear(T ) ← {n, Last(R) } ▷ last element nm
n ← Node()
Children(n) ← R[1...m − 1] ▷ {n1, n2, ..., nm−1 }
p←T
T ← Mid(T )
if T = NIL then
T ← Tree()
Front(T ) ← {n}
else if | Rear(T ) | = 1 ∧ Front(T ) = ϕ then
Front(T ) ← Rear(T )
Rear(T ) ← {n}
else
Rear(T ) ← Rear(T ) ∪{n}
Connect-Mid(p, T ) ← T
return Flat(r)
The corresponding Python program can be found in the example code along with this book.
We can also define the special functions last and init for the finger tree, which are similar to their counterparts for lists.
last = snd ◦ unsnoc
init = fst ◦ unsnoc
Imperatively removing an element from the end is almost the same as removing from the head. There is one special case though: as we always store the only element (or sub-node) in the front finger while the rear finger and middle part inner tree are empty (e.g. Tree({n}, NIL, ϕ)), we might get nothing if we always try to fetch the last element from the rear finger.
This can be solved by swapping the front and rear fingers if the rear is empty, as in the following algorithm.
function Extract-Tail(T )
r ← Tree()
Connect-Mid(r, T )
while Rear(T) = ϕ ∧ Mid(T) ≠ NIL do
T ← Mid(T)
if Rear(T) = ϕ ∧ Front(T) ≠ ϕ then
Exchange Front(T ) ↔ Rear(T )
n ← Node()
Children(n) ← Rear(T )
repeat
L ← Children(n) ▷ L = {n1 , n2 , ..., nm−1 , nm }
n ← Last(L) ▷ n ← nm
Rear(T ) ← L[1...m − 1] ▷ {n1 , n2 , ..., nm−1 }
T ← Parent(T )
12.6.7 Concatenate
Consider the non-trivial case of concatenating two finger trees T1 = tree(F1, M1, R1) and T2 = tree(F2, M2, R2). One natural idea is to use F1 as the new front finger of the concatenated result, and keep R2 as the new rear finger. The remaining work is to merge M1, R1, F2 and M2 into a new middle part inner tree.
Note that both R1 and F2 are plain lists of nodes, so the sub-problem is to realize an algorithm like this:
merge(M1 , R1 ∪ F2 , M2 ) =?
More observation reveals that both M1 and M2 are also finger trees, except that they are one level deeper than T1 and T2 in terms of Node(a), where a is the type of element stored in the tree. We can recursively use the same strategy: keep the front finger of M1 and the rear finger of M2, then merge the middle part inner trees of M1 and M2, together with the rear finger of M1 and the front finger of M2.
If we denote by front(T) the front finger, rear(T) the rear finger, and mid(T) the middle part inner tree, the above merge algorithm can be expressed for the non-trivial case as follows.
merge(M1, R1 ∪ F2, M2) = tree(front(M1), S, rear(M2))
where S = merge(mid(M1), rear(M1) ∪ R1 ∪ F2 ∪ front(M2), mid(M2))
(12.30)
If we look back at the original concatenation solution, it can be expressed as below.
concat(T1, T2) = tree(F1, merge(M1, R1 ∪ F2, M2), R2)    (12.31)
Comparing it with equation (12.30), it's easy to see that concatenation is essentially merging. So we have the final algorithm like this.
concat(T1, T2) = merge(T1, ϕ, T2)    (12.32)
By adding the edge cases, the merge() algorithm can be completed as below.
f oldR(insertT, T2 , S) : T1 = ϕ
f oldL(appendT, T1 , S) : T2 = ϕ
merge(T1 , S, T2 ) = merge(ϕ, {x} ∪ S, T2 ) : T1 = leaf (x)
merge(T 1 , S ∪ {x}, ϕ) : T2 = leaf (x)
tree(F1 , merge(M1 , nodes(R1 ∪ S ∪ F2 ), M 2), R2 ) : otherwise
(12.33)
Most of these cases are straightforward. If either T1 or T2 is empty, the algorithm repeatedly inserts/appends all elements of S into the other tree. Functions foldL and foldR are kinds of for-each processes in imperative settings; the difference is that foldL processes the list S from left to right while foldR processes it from right to left.
Here are their definitions. Suppose list L = {a1, a2, ..., an−1, an}, and L′ = {a2, a3, ..., an−1, an} is the rest of the elements except for the first one.
foldL(f, e, L) =
    e : L = ϕ
    foldL(f, f(e, a1), L′) : otherwise
(12.34)

foldR(f, e, L) =
    e : L = ϕ
    f(a1, foldR(f, e, L′)) : otherwise
(12.35)
Function nodes follows the constraint of the 2-3 tree: if there are only 2 or 3 elements in the list, it just wraps them into a singleton list containing one 2-3 tree; if there are 4 elements in the list, it splits them into two trees, each with 2 branches; otherwise, if there are more than 4 elements, it wraps the first three into one tree with 3 branches, and recursively calls nodes to process the rest.
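A Haskell sketch of nodes following this description (again with the assumed Br2/Br3 constructors):

nodes :: [a] -> [Node a]
nodes [a, b] = [Br2 a b]
nodes [a, b, c] = [Br3 a b c]
nodes [a, b, c, d] = [Br2 a b, Br2 c d]            -- 4 elements: two binary nodes
nodes (a : b : c : rest) = Br3 a b c : nodes rest  -- more than 4: wrap three, recurse on the rest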
The performance of concatenation is determined by merging. Analyzing the recursive case of merging reveals that the depth of the recursion is proportional to the smaller height of the two trees. As the trees are ensured to be balanced by using 2-3 trees, their height is bound to O(lg n′), where n′ is the number of elements. The edge case of merging performs the same as insertion (it calls insertT at most 8 times), which is amortized O(1) time, and O(lg m) in the worst case, where m is the difference in height of the two trees. So the overall performance is bound to O(lg n), where n is the total number of elements contained in the two finger trees.
The following Haskell program implements the concatenation algorithm.
concat :: Tree a → Tree a → Tree a
concat t1 t2 = merge t1 [] t2
Note that there is a concat function defined in the Haskell standard prelude, so we need to distinguish them, either by a hiding import or by using a different name.
merge :: Tree a → [a] → Tree a → Tree a
merge Empty ts t2 = foldr cons t2 ts
merge t1 ts Empty = foldl snoc t1 ts
merge (Lf a) ts t2 = merge Empty (a:ts) t2
merge t1 ts (Lf a) = merge t1 (ts++[a]) Empty
merge (Tr f1 m1 r1) ts (Tr f2 m2 r2) = Tr f1 (merge m1 (nodes (r1 ++ ts ++ f2)) m2) r2
In every iteration, we create a new tree T, choose the front finger of T1 as the front finger of T, and choose the rear finger of T2 as the rear finger of T. The other two fingers (the rear finger of T1 and the front finger of T2) are put together as a list, and this list is then grouped in a balanced way into several 2-3 tree nodes as N. Note that N grows along with the traversal not only in length: the depth of its elements also increases by one in each iteration. We attach this new tree as the middle part inner tree of the upper level result tree to end the iteration.
Once either tree becomes empty, we stop traversing, repeatedly insert the 2-3 tree nodes of N into the other, non-empty, tree, and set it as the new middle part inner tree of the upper level result.
Below algorithm describes this process in detail.
function Concat(T1 , T2 )
return Merge(T1 , ϕ, T2 )
function Merge(T1 , N, T2 )
r ← Tree()
p←r
while T1 ≠ NIL ∧ T2 ≠ NIL do
T ← Tree()
Front(T ) ← Front(T1 )
Rear(T ) ← Rear(T2 )
Connect-Mid(p, T )
p←T
N ← Nodes(Rear(T1) ∪ N ∪ Front(T2))
T1 ← Mid(T1 )
T2 ← Mid(T2 )
if T1 = NIL then
T ← T2
for each n ∈ Reverse(N ) do
T ← Prepend-Node(n, T )
else if T2 = NIL then
T ← T1
for each n ∈ N do
T ← Append-Node(T, n)
Connect-Mid(p, T )
return Flat(r)
Note that the for-each loops in the algorithm can also be replaced by folding from the left and from the right respectively. Translating this algorithm to Python yields the code below.
def concat(t1, t2):
return merge(t1, [], t2)
elif t2 is None:
prev.set_mid(reduce(append_node, ns, t1))
return flat(root)
Because Python only provides the left fold, reduce(), a fold from the right is given in the following code; it repeatedly applies the function in reverse order of the list.
def foldR(f, xs, z):
for x in reversed(xs):
z = f(x, z)
return z
The only remaining question is how to group nodes in a balanced way into bigger 2-3 trees. As a 2-3 tree can hold at most 3 sub-trees, we can first take 3 nodes and wrap them into a ternary tree if there are more than 4 nodes in the list, and continuously deal with the rest. If there are just 4 nodes, they can be wrapped into two binary trees. For the other cases (3 nodes, 2 nodes, 1 node), we simply wrap them all into one tree.
Denote node list L = {n1 , n2 , ...}, The following algorithm realizes this process.
function Nodes(L)
N =ϕ
while |L| > 4 do
n ← Node()
Children(n) ← L[1..3] ▷ {n1 , n2 , n3 }
N ← N ∪ {n}
L ← L[4...] ▷ {n4 , n5 , ...}
if |L| = 4 then
x ← Node()
Children(x) ← {L[1], L[2]}
y ← Node()
Children(y) ← {L[3], L[4]}
N ← N ∪ {x, y}
else if L ≠ ϕ then
n ← Node()
Children(n) ← L
N ← N ∪ {n}
return N
It's straightforward to translate the algorithm to the Python program below, where the function wraps() helps to create an empty node and then sets a list as the children of this node.
def nodes(xs):
    res = []
    while len(xs) > 4:
        res.append(wraps(xs[:3]))
        xs = xs[3:]
    if len(xs) == 4:
        res.append(wraps(xs[:2]))
        res.append(wraps(xs[2:]))
    elif xs != []:
        res.append(wraps(xs))
    return res
Exercise 12.5
1. Implement the complete finger tree insertion program in your favorite imperative
programming language. Don’t check the example programs along with this chapter
before having a try.
2. How can we determine whether a node is a leaf? Does it contain only a raw element inside, or a compound node which contains sub-nodes as children? Note that we can't distinguish them by testing the size, as there are cases where a node contains a singleton leaf, such as node(1, {node(1, {x})}). Try to solve this problem both in a dynamically typed language (e.g. Python, Lisp, etc.) and in a strongly, statically typed language (e.g. C++).
4. Realize an algorithm to return the last element of a finger tree in both functional and imperative approaches. The latter should be able to handle ill-formed trees.
5. Try to implement the concatenation algorithm without using folding. You can either use recursive methods, or use the imperative for-each method.
Suppose the function tree(s, F, M, R) creates a finger tree from size s, front finger F, rear finger R, and middle part inner tree M. When the size of the tree is needed, we can call a size(T) function. It would be something like this.
size(T) =
    0 : T = ϕ
    ? : T = leaf(x)
    s : T = tree(s, F, M, R)
If the tree is empty, the size is definitely zero; and if it can be expressed as tree(s, F, M, R), the size is s. However, what if the tree is a singleton leaf? Is it 1? No: it can be 1 only if T = leaf(a) and a isn't a tree node, but a raw element stored in the finger tree. In most cases, the size is not 1, because a can again be a tree node. That's why we put a '?' in the above equation.
The correct way is to call some size function on the tree node, as follows.
size(T) =
    0 : T = ϕ
    size′(x) : T = leaf(x)
    s : T = tree(s, F, M, R)
(12.37)
Note that this isn't a recursive definition, since size ≠ size′; the argument to size′ is either a tree node, which is a 2-3 tree, or a plain element stored in the finger tree. To unify these two cases, we can always wrap a single plain element into a tree node of only one element, so that every situation is expressed as a tree node augmented with a size field. The following Haskell program modifies the definition of the tree node.
data Node a = Br Int [a]
We change it from a union of branch cases to a single structure carrying the size, although this introduces an overhead field when the node isn't a leaf.
Suppose function tr(s, L) creates such a node (either one wrapped element or a 2-3 tree) from the size information s and a list L. Here are some examples.
So the function size′ can be implemented as returning the size information of a tree node: we have size′(tr(s, L)) = s. Wrapping an element x is just calling tr(1, {x}). We can define auxiliary functions wrap and unwrap, for instance as below.
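For instance, with the Br Int [a] node above, wrap and unwrap could be sketched as:

wrap :: a -> Node a
wrap x = Br 1 [x]       -- a raw element becomes a node of size 1

unwrap :: Node a -> a
unwrap (Br 1 [x]) = x   -- the inverse of wrap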
As both the front finger and the rear finger are lists of tree nodes, in order to calculate the total size of a finger, we can provide a size′′(L) function, which sums up the sizes of all nodes stored in the list. Denote L = {a1, a2, ...} and L′ = {a2, a3, ...}.
size′′(L) =
    0 : L = ϕ
    size′(a1) + size′′(L′) : otherwise
(12.39)
It's quite OK to define size′′(L) using higher-order functions, for example as the Python sketch below.
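A minimal Python sketch, assuming each node object carries its size in a size field as in the Python programs later in this section:

def sizeNs(ns):
    # sum up the sizes of all the nodes in a finger (a list of nodes)
    return sum(n.size for n in ns)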
And we can turn a list of tree nodes into one deeper 2-3 tree and vice-versa.
size (Br s _) = s
sizeT Empty = 0
sizeT (Lf a) = size a
sizeT (Tr s _ _ _) = s
As NIL is typically used to represent an empty tree in imperative settings, it's convenient to provide an auxiliary size function that uniformly computes the size of a tree, whether or not it is NIL.
function Size-Tr(T )
if T = NIL then
return 0
else
return Size(T )
The algorithm is trivial and we skip its example implementation. Recall that the function tree′(F, M, R) builds a finger tree from a front finger, a middle part inner tree, and a rear finger. This function should also be modified to add the size information of these three arguments.
\[ tree'(F, M, R) = \begin{cases} fromL(F) & M = \phi \land R = \phi \\ fromL(R) & M = \phi \land F = \phi \\ tree'(unwraps(F'), M', R) & F = \phi, (F', M') = extractT'(M) \\ tree'(F, M', unwraps(R')) & R = \phi, (M', R') = removeT'(M) \\ tree(size''(F) + size(M) + size''(R), F, M, R) & \text{otherwise} \end{cases} \tag{12.43} \]
Where fromL() turns a list of nodes into a finger tree by repeatedly inserting the elements one by one into an empty tree:
fromL(L) = foldR(insertT', ϕ, L)
The same kind of size augmentation also applies to the imperative algorithms. For example, when a new node is prepended to the head of the finger tree, we should update the size while traversing the tree.
function Prepend-Node(n, T )
r ← Tree()
p←r
Connect-Mid(p, T )
while Full?(Front(T )) do
F ← Front(T )
Front(T ) ← {n, F [1]}
Size(T ) ← Size(T ) + Size(n) ▷ update size
n ← Node()
Children(n) ← F [2..]
p←T
T ← Mid(T )
if T = NIL then
T ← Tree()
Front(T )← {n}
else if | Front(T ) | = 1 ∧ Rear(T ) = ϕ then
Rear(T ) ← Front(T )
Front(T ) ← {n}
else
Front(T ) ← {n}∪ Front(T )
Size(T ) ← Size(T ) + Size(n) ▷ update size
Connect-Mid(p, T )
return Flat(r)
The corresponding Python code is modified accordingly as below.
def prepend_node(n, t):
    root = prev = Tree()
    prev.set_mid(t)
    while frontFull(t):
        f = t.front
        t.front = [n] + f[:1]
        t.size = t.size + n.size       # update size
        n = wraps(f[1:])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.front) == 1 and t.rear == []:
        t = Tree(n.size + t.size, [n], None, t.front)
    else:
        t = Tree(n.size + t.size, [n] + t.front, t.mid, t.rear)
    prev.set_mid(t)
    return flat(root)
Note that the tree constructor is also modified to take a size argument as the first parameter, and the leaf helper function not only constructs the tree from a node, but also sets the size of the tree to the size of the node inside it.
For brevity, we skip the detailed description of what is modified in the extractT, appendT, removeT, and concat algorithms. They are left as exercises to the reader.
With size information augmented, it's easy to locate a node at a given position by performing a tree search. What's more, as the finger tree is constructed from the three parts F, M, and R, and it is recursive in nature, it's also possible to split it into three sub-parts at a given position i: the left part, the node at i, and the right part.
The idea is straightforward. Since we have the size information for F, M, and R, denote these three sizes as Sf, Sm, and Sr. If the given position i ≤ Sf, the node must be stored in F, and we can go on seeking the node inside F; if Sf < i ≤ Sf + Sm, the node must be inside the middle part inner tree M; otherwise it is in the rear finger R.
Splitting a leaf results in both the left and right parts being empty; the node stored in the leaf is the resulting node.
The recursive case handles the three sub-cases by comparing i with the sizes. Suppose the function splitAtL(i, L) splits a list of nodes at a given position i into three parts: (A, x, B) = splitAtL(i, L), where x is the i-th node in L, A is a sub-list containing all the nodes before position i, and B is a sub-list containing the rest of the nodes after i.
\[ splitAt(i, T) = \begin{cases} (\phi, x, \phi) & T = leaf(x) \\ (fromL(A), x, tree'(B, M, R)) & i \le S_f, (A, x, B) = splitAtL(i, F) \\ (tree'(F, M_l, A), x, tree'(B, M_r, R)) & S_f < i \le S_f + S_m \\ (tree'(F, M, A), x, fromL(B)) & \text{otherwise}, (A, x, B) = splitAtL(i - S_f - S_m, R) \end{cases} \tag{12.45} \]
Where Ml , x, Mr , A, B in the third case are calculated as the following.
(Ml , t, Mr ) = splitAt(i − Sf , M )
(A, x, B) = splitAtL(i − Sf − size(Ml ), unwraps(t))
The function splitAtL is just a linear traversal. Since the length of the list is limited by the 2-3 tree constraint, the performance is still ensured to be constant O(1) time. Denote L = {x1, x2, ...} and L′ = {x2, x3, ...}.
\[ splitAtL(i, L) = \begin{cases} (\phi, x_1, \phi) & i = 0 \land L = \{x_1\} \\ (\phi, x_1, L') & i < size'(x_1) \\ (\{x_1\} \cup A, x, B) & \text{otherwise} \end{cases} \tag{12.46} \]
Where, in the last case, (A, x, B) = splitAtL(i − size′(x1), L′).
The solution to splitting is a typical divide and conquer strategy. The performance of this algorithm is determined by the recursive case, searching in the middle part inner tree; all the other cases take constant time, as we've analyzed. The depth of recursion is proportional to the height of the tree h, so the algorithm is bound to O(h). Because the tree is well balanced (by using 2-3 trees, and all the insertion/removal algorithms keep the tree balanced), h = O(lg n), where n is the number of elements stored in the finger tree. The overall performance of splitting is O(lg n).
Let's first give the Haskell program for the splitAtL function.
splitNodesAt 0 [x] = ([], x, [])
splitNodesAt i (x:xs) | i < size x = ([], x, xs)
| otherwise = let (xs', y, ys) = splitNodesAt (i-size x) xs
in (x:xs', y, ys)
Then the program for splitAt. As there is already a function with this name defined in the standard library, we slightly change the name by adding an apostrophe.
splitAt' _ (Lf x) = (Empty, x, Empty)
splitAt' i (Tr _ f m r)
| i < szf = let (xs, y, ys) = splitNodesAt i f
Random access
With the help of splitting at an arbitrary position, it's trivial to realize random access in O(lg n) time. Denote by mid(x) the function that returns the second element of a tuple; left(x) and right(x) return the first and the third element of the tuple respectively.
Getting the element at position i first splits the sequence at i, then unwraps the node to get the element stored inside it. To mutate the i-th element of a sequence S represented by a finger tree, we first split it at i, then replace the middle with the value we want, and re-construct them into one finger tree by concatenation.
where
(L, y, R) = splitAt(i, S)
What's more, we can also realize a removeAt(S, i) function, which removes the i-th element from sequence S. The idea is to first split at i, unwrap and return the element of the i-th node, then concatenate the left and right parts into a new finger tree, as in the sketch below.
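A Python sketch of this idea, assuming splitAt(S, i) returns the triple of the left part, the element at i, and the right part, and that concat joins two finger trees (both as described in the text):

def removeAt(s, i):
    # split, drop the element, and join the remaining parts
    (left, x, right) = splitAt(s, i)
    return (x, concat(left, right))   # the removed element and the new tree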
In imperative settings, we can locate and modify the element at a given position with the following Apply-At algorithm.
function Apply-At(T, i, f )
while Size(T ) > 1 do
Sf ← Size-Nodes(Front(T ))
Sm ← Size-Tr(Mid(T ))
if i < Sf then
return Lookup-Nodes(Front(T ), i, f )
else if i < Sf + Sm then
T ← Mid(T )
i ← i − Sf
else
return Lookup-Nodes(Rear(T ), i − Sf − Sm , f )
n ← First-Lf(T )
x ← Elem(n)
Elem(n) ← f (x)
return x
This algorithm is essentially a divide and conquer tree search. It repeatedly examines the current tree until it reaches a tree of size 1 (can it be determined as a leaf? Please consider the ill-formed case and refer to the exercise later). Each time, it checks the position to be located against the size information of the front finger and of the middle part inner tree. If the index i is less than the size of the front finger, the location is at some node in it, and the algorithm calls a sub-procedure to look it up among the front finger. If the index is between the size of the front finger and the total size up to the middle part inner tree, the location is at some node inside the middle, and the algorithm goes on traversing along the middle part inner tree with the index reduced by the size of the front finger. Otherwise the location is at some node in the rear finger, and the similar look-up procedure is called accordingly.
After this loop, we've got a node (possibly a compound node) with what we are looking for at the first leaf inside it. We can extract the element, apply the function f to it, and store the new value back.
The algorithm returns the previous element, before applying f, as the final result.
What hasn't been covered yet is the algorithm Lookup-Nodes(L, i, f). It takes a list of nodes, a position index, and a function to be applied, and can be implemented by checking every node in the list. If the node is a leaf and the index is zero, we are at the position to be looked up: the function is applied to the element stored in this leaf, and the previous value is returned. Otherwise, we compare the size of the node with the index to determine whether the position is inside this node, and search inside the children of the node if necessary.
function Lookup-Nodes(L, i, f )
loop
for ∀n ∈ L do
if n is leaf ∧i = 0 then
x ← Elem(n)
Elem(n) ← f (x)
return x
if i < Size(n) then
L ← Children(n)
break
i ← i− Size(n)
The following Python code implements these algorithms.
def applyAt(t, i, f):
    while t.size > 1:
        szf = sizeNs(t.front)
        szm = sizeT(t.mid)
        if i < szf:
            return lookupNs(t.front, i, f)
        elif i < szf + szm:
            t = t.mid
            i = i - szf
        else:
            return lookupNs(t.rear, i - szf - szm, f)
    # reached a tree of size 1: update the element in its first leaf
    n = first_leaf(t)
    x = elem(n)
    n.children[0] = f(x)
    return x
With this auxiliary algorithm that applies a function at a given position, it's trivial to implement Get-At and Set-At by passing special functions.
function Get-At(T, i)
return Apply-At(T, i, λx .x)
function Set-At(T, i, x)
return Apply-At(T, i, λy .x)
That is, we pass the identity function to implement getting the element at a position, which doesn't change anything at all, and we pass a constant function to implement setting, which sets the element to the new value while ignoring the previous one.
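For instance, as a Python sketch on top of the applyAt program above:

def getAt(t, i):
    # identity: nothing changes, the element at i is returned
    return applyAt(t, i, lambda x: x)

def setAt(t, i, x):
    # constant: overwrite the element at i, the previous value is returned
    return applyAt(t, i, lambda y: x)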
Imperative splitting
It's not enough to only realize the Apply-At algorithm in imperative settings, because removing an element at an arbitrary position is also a typical use case.
Almost all the imperative finger tree algorithms so far work in a one-pass, top-down manner, although we sometimes need to book-keep the root. This means we could even realize all of them without using the parent field.
The splitting operation, however, can be easily implemented by using the parent field. We first perform a top-down traversal along the middle part inner tree as long as the splitting position isn't located in the front or rear finger. After that, we need a bottom-up traversal along the parent fields of the two split trees to fill in the necessary fields.
function Split-At(T, i)
T1 ← Tree()
T2 ← Tree()
while Sf ≤ i < Sf + Sm do ▷ Top-down pass
T1′ ← Tree()
T2′ ← Tree()
Front(T1′ ) ← Front(T )
Rear(T2′ ) ← Rear(T )
Connect-Mid(T1 , T1′ )
Connect-Mid(T2 , T2′ )
T1 ← T1′
T2 ← T2′
i ← i − Sf
T ← Mid(T )
if i < Sf then
(X, n, Y ) ← Split-Nodes(Front(T ), i)
T1′ ← From-Nodes(X)
T2′ ← T
Size(T2′ ) ← Size(T ) - Size-Nodes(X) - Size(n)
Front(T2′ ) ← Y
else if Sf + Sm ≤ i then
(X, n, Y ) ← Split-Nodes(Rear(T ), i − Sf − Sm )
T2′ ← From-Nodes(Y )
T1′ ← T
Size(T1′ ) ← Size(T ) - Size-Nodes(Y ) - Size(n)
Rear(T1′ ) ← X
Connect-Mid(T1 , T1′ )
Connect-Mid(T2 , T2′ )
i ← i− Size-Tr(T1′ )
while n is NOT leaf do ▷ Bottom-up pass
(X, n, Y ) ← Split-Nodes(Children(n), i)
i ← i− Size-Nodes(X)
Rear(T1 ) ← X
Front(T2 ) ← Y
Size(T1 ) ← Sum-Sizes(T1 )
Size(T2 ) ← Sum-Sizes(T2 )
T1 ← Parent(T1 )
T2 ← Parent(T2 )
return (Flat(T1 ), Elem(n), Flat(T2 ))
The algorithm first creates two trees T1 and T2 to hold the split results. Note that they are created as 'ground' trees, which are parents of the real roots. The first pass is a top-down pass. Suppose Sf and Sm retrieve the size of the front finger and of the middle part inner tree respectively. If the position at which the tree is to be split lies inside the middle part inner tree, we reuse the front finger of T for the newly created T1′, and reuse the rear finger of T for T2′. At this point we can't fill the other fields of T1′ and T2′; they are left empty and will be filled later. After that, we connect T1 and T1′ so that the latter becomes the middle part inner tree of the former; a similar connection is done for T2 and T2′ as well. Finally, we update the position by reducing it by the size of the front finger, and go on traversing down the middle part inner tree.
When the first pass finishes, we are at a position where the splitting should be performed either in the front finger or in the rear finger. Splitting the nodes of a finger results in a tuple: the first and third parts are the lists before and after the splitting point, while the second part is the node containing the element at the position to be split. As a finger only holds a small, constant number of nodes (each node being a 2-3 tree), the node splitting algorithm can be performed by a linear search.
function Split-Nodes(L, i)
for j ∈ [1, Length(L) ] do
if i < Size(L[j]) then
return (L[1...j − 1], L[j], L[j + 1... Length(L) ])
i ← i− Size(L[j])
We next create the two result trees T1′ and T2′ from this tuple, and connect them as the final middle part inner trees of T1 and T2.
Next we need to perform a bottom-up traversal along the result trees to fill in all the empty fields we skipped in the first pass.
We loop on the second part of the tuple, the node, until it becomes a leaf. In each iteration, we repeatedly split the children of the node with an updated position i.
The first list of nodes returned from splitting is used to fill the rear finger of T1 ; and the
other list of nodes is used to fill the front finger of T2 . After that, since all the three parts
of a finger tree – the front and rear finger, and the middle part inner tree – are filled, we
can then calculate the size of the tree by summing these three parts up.
function Sum-Sizes(T )
return Size-Nodes(Front(T )) + Size-Tr(Mid(T )) + Size-Nodes(Rear(T ))
Next, the iteration goes on along the parent fields of T1 and T2. The last 'black-box' algorithm is From-Nodes(L), which creates a finger tree from a list of nodes. It can be easily realized by repeatedly performing insertion on an empty tree. The implementation is left as an exercise to the reader.
The example Python code for splitting is given as below.
def splitAt(t, i):
    (t1, t2) = (Tree(), Tree())
    while szf(t) <= i and i < szf(t) + szm(t):    # top-down pass
        fst = Tree(0, t.front, None, [])           # reuse the front finger
        snd = Tree(0, [], None, t.rear)            # reuse the rear finger
        t1.set_mid(fst)
        t2.set_mid(snd)
        (t1, t2) = (fst, snd)
        i = i - szf(t)
        t = t.mid
    if i < szf(t):
        (xs, n, ys) = splitNs(t.front, i)
        sz = t.size - sizeNs(xs) - n.size
        (fst, snd) = (fromNodes(xs), Tree(sz, ys, t.mid, t.rear))
    elif szf(t) + szm(t) <= i:
        (xs, n, ys) = splitNs(t.rear, i - szf(t) - szm(t))
        sz = t.size - sizeNs(ys) - n.size
        (fst, snd) = (Tree(sz, t.front, t.mid, xs), fromNodes(ys))
    t1.set_mid(fst)
    t2.set_mid(snd)
    i = i - sizeT(fst)
    while not n.leaf:                              # bottom-up pass
        (xs, n, ys) = splitNs(n.children, i)
        i = i - sizeNs(xs)
        (t1.rear, t2.front) = (xs, ys)
        t1.size = sizeNs(t1.front) + sizeT(t1.mid) + sizeNs(t1.rear)
        t2.size = sizeNs(t2.front) + sizeT(t2.mid) + sizeNs(t2.rear)
        (t1, t2) = (t1.parent, t2.parent)
    return (flat(t1), elem(n), flat(t2))
The program to split a list of nodes at a given position is listed like this.
def splitNs(ns, i):
for j in range(len(ns)):
if i < ns[j].size:
return (ns[:j], ns[j], ns[j+1:])
i = i - ns[j].size
Exercise 12.6
1. Another way to realize insertT′ is to force increasing the size field by one, so that we needn't write the function tree′. Try to realize the algorithm using this idea.
2. Try to handle the augmented size information, as done in the insertT′ algorithm, for the following algorithms (both functional and imperative): extractT′, appendT′, removeT′, and concat′. The head, tail, init and last functions should be kept unchanged. Don't refer to the downloadable programs that come with this book before you give it a try.
3. In the imperative Apply-At algorithm, it tests if the size of the current tree is
greater than one. Why don’t we test if the current tree is a leaf? Tell the difference
between these two approaches.
4. Implement the From-Nodes(L) in your favorite imperative programming language.
You can either use looping or create a folding-from-right sub algorithm.
[1] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[2] Chris Okasaki. “Purely Functional Random-Access Lists”. Functional Programming
Languages and Computer Architecture, June 1995, pages 86-95.
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[4] Miran Lipovaca. “Learn You a Haskell for Great Good! A Beginner’s Guide”. No
Starch Press; 1 edition April 2011, 400 pp. ISBN: 978-1-59327-283-8
[5] Ralf Hinze and Ross Paterson. “Finger Trees: A Simple General-purpose Data Structure”. Journal of Functional Programming 16:2 (2006), pages 197-217. [Link] ross/papers/[Link]
[6] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), “A new representation for linear lists”. Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pp. 49-60.
[7] Generic finger-tree structure. [Link]
[Link]
[8] Wikipedia. Move-to-front transform. [Link]
front_transform
Chapter 13
Divide and conquer, Quick sort vs. Merge sort
13.1 Introduction
It has been proved that the best performance achievable by comparison-based sorting is bound to O(n lg n) [51]. In this chapter, two divide and conquer sorting algorithms are introduced; both of them perform in O(n lg n) time. One is quick sort, the most popular sorting algorithm. Quick sort has been well studied, and many programming libraries provide sorting tools based on it.
In this chapter, we'll first introduce the idea of quick sort, which demonstrates the power of the divide and conquer strategy well. Several variants will be explained, and we'll see that quick sort performs poorly in some special cases, where it is not able to partition the sequence in a balanced way.
In order to solve the unbalanced partition problem, we'll then introduce merge sort, which ensures the sequence is well partitioned in all cases. Some variants of merge sort, including nature merge sort and bottom-up merge sort, are shown as well.
As in other chapters, all the algorithms will be realized in both imperative and functional approaches.
• If L is empty, the result is obviously empty; this is the trivial edge case;
• Otherwise, select an arbitrary element as the pivot, recursively sort all the elements not greater than the pivot and all the elements greater than it, then concatenate the two sorted parts with the pivot in between.
Note the emphasized word and: we don't use 'then' here, which indicates it's quite OK to perform the recursive sorts on the left and the right in parallel. We'll return to this parallelism topic soon.
Quick sort was first developed by C. A. R. Hoare in 1960 [51], [78]. What we describe
here is a basic version. Note that it doesn’t state how to select the pivot. We’ll see soon
that the pivot selection affects the performance of quick sort dramatically.
The simplest method to select the pivot is to always choose the first element, so that quick sort can be formalized as follows.
\[ sort(L) = \begin{cases} \phi & L = \phi \\ sort(\{x \mid x \in L', x \le l_1\}) \cup \{l_1\} \cup sort(\{x \mid x \in L', l_1 < x\}) & \text{otherwise} \end{cases} \tag{13.1} \]
Where l1 is the first element of the non-empty list L, and L′ contains the rest of the elements {l2, l3, ...}. Note that we use the Zermelo-Fraenkel expression (ZF expression for short)¹, which is also known as list comprehension. A ZF expression {a | a ∈ S, p1(a), p2(a), ...} takes every element a in set S that satisfies all the predicates p1, p2, .... ZF expressions were originally used to represent sets; we extend them to lists for the sake of brevity: there can be duplicated elements, and different permutations represent different lists. Please refer to the appendix about lists in this book for details.
It’s quite straightforward to translate this equation to real code if list comprehension
is supported. The following Haskell code is given for example:
sort [] = []
sort (x:xs) = sort [y | y←xs, y ≤ x] ++ [x] ++ sort [y | y←xs, x < y]
This might be the shortest quick sort program in the world at the time this book was written. Even a more verbose version is still very expressive:
sort [] = []
sort (x:xs) = as ++ [x] ++ bs where
as = sort [ a | a ← xs, a ≤ x]
bs = sort [ b | b ← xs, x < b]
There are some variants of this basic quick sort program, such as using explicit filtering instead of list comprehension. The following Python program demonstrates this:
def sort(xs):
    if xs == []:
        return []
    pivot = xs[0]
    small = sort(list(filter(lambda x: x <= pivot, xs[1:])))
    big = sort(list(filter(lambda x: pivot < x, xs[1:])))
    return small + [pivot] + big
13.2.3 Partition
Observe that the basic version actually takes two passes: one to find all the elements greater than the pivot, and another to find those that are not. Such a partition can be accomplished in only one pass. We explicitly define the partition as below.
¹ Named after the two mathematicians who founded modern set theory.
\[ partition(p, L) = \begin{cases} (\phi, \phi) & L = \phi \\ (\{l_1\} \cup A, B) & p(l_1), (A, B) = partition(p, L') \\ (A, \{l_1\} \cup B) & \neg p(l_1) \end{cases} \tag{13.2} \]
Note that the operation {x} ∪ L is just a ‘cons’ operation, which only takes constant
time. The quick sort can be modified accordingly.
\[ sort(L) = \begin{cases} \phi & L = \phi \\ sort(A) \cup \{l_1\} \cup sort(B) & \text{otherwise}, (A, B) = partition(\lambda_x\ x \le l_1, L') \end{cases} \tag{13.3} \]
Translating this new algorithm into Haskell yields the below code.
sort [] = []
sort (x:xs) = sort as ++ [x] ++ sort bs where
(as, bs) = partition ( ≤ x) xs
The concept of partition is critical to quick sort, and it is also important to many other sorting algorithms. We'll explain how it generally affects the sorting methodology at the end of this chapter. Before further discussion about fine-tuning the quick-sort-specific partition, let's see how to realize it in-place imperatively.
There are many partition methods. The one given by Nico Lomuto [4] will be used here as it's easy to understand. We'll show other partition algorithms soon and see how partitioning affects the performance.
Figure 13.2: Partition a range of the array using the left-most element as the pivot ((a) the invariant during the scan; (b) start; (c) finish, with the final swap of the pivot).
Figure 13.2 shows the idea of this one-pass partition method. The array is processed
from left to right. At any time, the array consists of the following parts as shown in figure
13.2 (a):
• The left most cell contains the pivot; By the end of the partition process, the pivot
will be moved to the final proper position;
• A segment contains all elements which are not greater than the pivot. The right
boundary of this segment is marked as ‘left’;
• A segment contains all elements which are greater than the pivot. The right bound-
ary of this segment is marked as ‘right’; It means that elements between ‘left’ and
‘right’ marks are greater than the pivot;
• The rest of elements after ‘right’ mark haven’t been processed yet. They may be
greater than the pivot or not.
At the beginning of the partition, the 'left' mark points to the pivot and the 'right' mark points to the element next to the pivot, as in figure 13.2 (b); then the algorithm repeatedly advances the right mark one element at a time until it passes the end of the array.
In every iteration, the element pointed by the ‘right’ mark is compared with the pivot.
If it is greater than the pivot, it should be among the segment between the ‘left’ and ‘right’
marks, so that the algorithm goes on to advance the ‘right’ mark and examine the next
element. Otherwise, since the element pointed to by the 'right' mark is not greater than the pivot, it should be put before the 'left' mark. To achieve this, the 'left' mark is advanced by one, and then the elements pointed to by the 'left' and 'right' marks are exchanged.
Once the ‘right’ mark passes the last element, it means that all the elements have
been processed. The elements which are greater than the pivot have been moved to the
right hand of ‘left’ mark while the others are to the left hand of this mark. Note that the
pivot should move between the two segments. An extra exchanging between the pivot
and the element pointed by ‘left’ mark makes this final one to the correct location. This
is shown by the swap bi-directional arrow in figure 13.2 (c).
The ‘left’ mark (which points the pivot finally) partitions the whole array into two
parts, it is returned as the result. We typically increase the ‘left’ mark by one, so that it
points to the first element greater than the pivot for convenient. Note that the array is
modified in-place.
The partition algorithm can be described as the following. It takes three arguments,
the array A, the lower and the upper bound to be partitioned 2 .
1: function Partition(A, l, u)
2: p ← A[l] ▷ the pivot
3: L←l ▷ the left mark
4: for R ∈ [l + 1, u] do ▷ iterate on the right mark
5: if ¬(p < A[R]) then ▷ negate of < is enough for strict weak order
6: L←L+1
7: Exchange A[L] ↔ A[R]
8: Exchange A[L] ↔ p
9: return L + 1 ▷ The partition position
² The partition algorithm used here is slightly different from the one in [4]; the latter uses the last element as the pivot.
Below table shows the steps of partitioning the array {3, 2, 5, 4, 0, 1, 6, 7}.
(l) 3 (r) 2 5 4 0 1 6 7 initialize, pivot = 3, l = 1, r = 2
3 (l)(r) 2 5 4 0 1 6 7 2 < 3, advance l, (r = l)
3 (l) 2 (r) 5 4 0 1 6 7 5 > 3, move on
3 (l) 2 5 (r) 4 0 1 6 7 4 > 3, move on
3 (l) 2 5 4 (r) 0 1 6 7 0<3
3 2 (l) 0 4 (r) 5 1 6 7 Advance l, then swap with r
3 2 (l) 0 4 5 (r) 1 6 7 1<3
3 2 0 (l) 1 5 (r) 4 6 7 Advance l, then swap with r
3 2 0 (l) 1 5 4 (r) 6 7 6 > 3, move on
3 2 0 (l) 1 5 4 6 (r) 7 7 > 3, move on
1 2 0 3 (l+1) 5 4 6 7 r passes the end; swap the pivot with x[l]
This version of partition algorithm can be implemented in ANSI C as the following.
int partition(Key∗ xs, int l, int u) {
int pivot, r;
for (pivot = l, r = l + 1; r < u; ++r)
if (!(xs[pivot] < xs[r])) {
++l;
swap(xs[l], xs[r]);
}
swap(xs[pivot], xs[l]);
return l + 1;
}
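For illustration only, here is a small Python sketch (not from the original text) showing how the returned position drives the recursive sort; the range [l, u) is half-open as in the C version, and the names quick_sort/partition are illustrative:

def partition(xs, l, u):
    pivot = l                          # index of the pivot (first element)
    for r in range(l + 1, u):          # advance the right mark
        if not xs[pivot] < xs[r]:      # not greater than the pivot
            l += 1
            xs[l], xs[r] = xs[r], xs[l]
    xs[pivot], xs[l] = xs[l], xs[pivot]   # move the pivot into place
    return l + 1                       # first position greater than the pivot

def quick_sort(xs, l=0, u=None):
    if u is None:
        u = len(xs)
    if u - l > 1:
        m = partition(xs, l, u)
        quick_sort(xs, l, m - 1)       # part not greater than the pivot
        quick_sort(xs, m, u)           # part greater than the pivot

Calling quick_sort(xs) sorts the list in place.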
Where the function f compares the element with the pivot using the predicate p (p is passed to f as a parameter, so f is in curried form; see appendix A for details. Alternatively, f can be a lexical closure defined in the scope of partition, so that it can access the predicate in that scope), and updates the result pair accordingly.
\[ f(p, x, (A, B)) = \begin{cases} (\{x\} \cup A, B) & p(x) \\ (A, \{x\} \cup B) & \text{otherwise } (\neg p(x)) \end{cases} \tag{13.5} \]
Accumulated partition
The partition algorithm realized with folding actually accumulates the result pair of lists (A, B): if an element is not greater than the pivot, it is accumulated to A, otherwise to B. We can make the accumulation explicit, which saves space and is friendly to tail-recursive call optimization (refer to appendix A of this book for details).
\[ partition(p, L, A, B) = \begin{cases} (A, B) & L = \phi \\ partition(p, L', \{l_1\} \cup A, B) & p(l_1) \\ partition(p, L', A, \{l_1\} \cup B) & \text{otherwise} \end{cases} \tag{13.6} \]
Where l1 is the first element in L if L isn't empty, and L′ contains the rest of the elements except l1, that is L′ = {l2, l3, ...}. The quick sort algorithm then uses this accumulated partition function, passing λx · x ≤ pivot as the partition predicate.
\[ sort(L) = \begin{cases} \phi & L = \phi \\ sort(A) \cup \{l_1\} \cup sort(B) & \text{otherwise}, (A, B) = partition(\lambda_x\ x \le l_1, L', \phi, \phi) \end{cases} \tag{13.7} \]
After the partition finishes, the two sub-lists need to be recursively sorted. We can first recursively sort the list containing the elements greater than the pivot, then link the pivot in front of it, and use the result as the accumulator for the next sorting step.
Based on this idea, the '...' part in the above definition can be realized as the following.
\[ sort'(L, S) = \begin{cases} S & L = \phi \\ sort(A, \{l_1\} \cup sort(B, ?)) & \text{otherwise} \end{cases} \]
The problem is what the accumulator should be when sorting B. There is an important invariant: at any time, the accumulator S holds the elements that have been sorted so far. So we should sort B by accumulating to S.
\[ sort'(L, S) = \begin{cases} S & L = \phi \\ sort(A, \{l_1\} \cup sort(B, S)) & \text{otherwise} \end{cases} \tag{13.8} \]
The following Haskell example program implements the accumulated quick sort algorithm.
asort xs = asort' xs []
Exercise 13.1
• Implement the recursive basic quick sort algorithm in your favorite imperative pro-
gramming language.
• Same as the imperative algorithm, one minor improvement is that besides the empty
case, we needn’t sort the singleton list, implement this idea in the functional algo-
rithm as well.
• The accumulated quick sort algorithm developed in this section uses the intermediate variables A and B. They can be eliminated by defining the partition function to call the sort function mutually recursively. Implement this idea in your favorite functional programming language. Please don't refer to the downloadable example program that comes with this book before you try it.
Figure 13.3: In the best case, quick sort divides the sequence into two slices of the same length (n/2, n/4, ..., down to 1 after about lg n levels).
The total time in the third level is also bound to O(n); ... In the last level, there are n
small slices each contains a single element, the time is bound to O(n). Summing all the
time in each level gives the total performance of quick sort in best case as O(n lg n).
However, in the worst case, the partition process unluckily divides the sequence into two slices of very unbalanced lengths most of the time: one slice of length O(1), the other of length O(n). Thus the recursion depth degrades to O(n). If we draw a similar figure, unlike the best case, which forms a balanced binary tree, the worst case degrades into a very unbalanced tree in which every node has only one child while the other is empty. The binary tree turns into a linked list of length O(n). And in every level all the elements are processed, so the total performance in the worst case is O(n²), which is as poor as insertion sort and selection sort.
Let's consider when the worst case happens. One special case is that all the elements (or most of them) are the same; Nico Lomuto's partition method deals with such sequences poorly. We'll see how to solve this problem by introducing another partition algorithm in the next section.
The other two obvious worst cases happen when the sequence is already in ascending or descending order. Partitioning an ascending sequence gives an empty sub-list before the pivot, while the list after the pivot contains all the remaining elements. Partitioning a descending sequence gives the opposite result.
There are other cases in which quick sort performs poorly. There is no completely satisfactory solution that can avoid the worst case. We'll see some engineering practices in the next section that make the worst case very rare.
The key fact is that the performance is proportional to the total number of comparisons performed during quick sort [4]. Unlike selection sort, where every two elements are compared, quick sort avoids many unnecessary comparisons. For example, suppose a partition operation on the list {a1, a2, a3, ..., an} selects a1 as the pivot; the partition builds two sub-lists A = {x1, x2, ..., xk} and B = {y1, y2, ..., yn−k−1}. In the rest of the quick sort, elements in A will never be compared with elements in B.
Denote the final sorted result as {a1, a2, ..., an}. If ai < aj, they will not be compared if and only if some element ak with ai < ak < aj has been selected as a pivot before either ai or aj is selected as the pivot. That is to say, the only chance for ai and aj to be compared is that either ai or aj is chosen as the pivot before any other element in the ordered range ai+1 < ai+2 < ... < aj−1 is selected.
Let P(i, j) represent the probability that ai and aj are compared. We have:
\[ P(i, j) = \frac{2}{j - i + 1} \tag{13.9} \]
\[ C(n) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P(i, j) \tag{13.10} \]
Note that if we have compared ai and aj, we won't compare aj and ai again in the quick sort algorithm, and we never compare ai with itself. That's why the upper bound of i is n − 1 and the lower bound of j is i + 1.
Substituting the probability yields:
\[ C(n) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} = \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1} \tag{13.11} \]
The inner sum is a partial harmonic series, and the harmonic series satisfies
\[ H_n = 1 + \frac{1}{2} + \frac{1}{3} + \dots = \ln n + \gamma + \epsilon_n \]
so every inner sum is bound to O(lg n), which gives
\[ C(n) = \sum_{i=1}^{n-1} O(\lg n) = O(n \lg n) \tag{13.12} \]
The other method to prove the average performance uses the recursive fact that, when sorting a list of length n, the partition splits the list into two sub-lists of length i and n − i − 1. The partition process itself takes cn time because it examines every element against the pivot. So we have the following equation.
\[ T(n) = T(i) + T(n - i - 1) + cn \tag{13.13} \]
Where T(n) is the total time of quick sort on a list of length n. Since i is equally likely to be any of 0, 1, ..., n − 1, taking the mathematical expectation of the equation gives:
\[ T(n) = \frac{2}{n}\sum_{i=0}^{n-1} T(i) + cn \tag{13.14} \]
Multiplying both sides by n gives
\[ nT(n) = 2\sum_{i=0}^{n-1} T(i) + cn^2 \tag{13.15} \]
and substituting n with n − 1 gives
\[ (n-1)T(n-1) = 2\sum_{i=0}^{n-2} T(i) + c(n-1)^2 \tag{13.16} \]
Subtracting equation (13.16) from (13.15) eliminates all the T(i) for 0 ≤ i < n − 1.
nT (n) = (n + 1)T (n − 1) + 2cn − c (13.17)
Since the constant c can be dropped when computing the order of growth, the equation can be transformed one step further as below.
\[ \frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2c}{n+1} \tag{13.18} \]
Next we substitute n with n − 1, n − 2, ..., which gives us n − 1 more equations.
\[ \frac{T(n-1)}{n} = \frac{T(n-2)}{n-1} + \frac{2c}{n} \]
\[ \frac{T(n-2)}{n-1} = \frac{T(n-3)}{n-2} + \frac{2c}{n-1} \]
\[ \dots \]
\[ \frac{T(2)}{3} = \frac{T(1)}{2} + \frac{2c}{3} \]
Summing them all up and cancelling the identical terms on both sides, we obtain a closed form in n.
\[ \frac{T(n)}{n+1} = \frac{T(1)}{2} + 2c \sum_{k=3}^{n+1} \frac{1}{k} \tag{13.19} \]
Using the harmonic series mentioned above, the final result is:
\[ O\!\left(\frac{T(n)}{n+1}\right) = O\!\left(\frac{T(1)}{2} + 2c \ln n + \gamma + \epsilon_n\right) = O(\lg n) \tag{13.20} \]
Thus
\[ O(T(n)) = O(n \lg n) \tag{13.21} \]
Exercise 13.2
• Why does Lomuto's method perform poorly when there are many duplicated elements?
1. The normal basic quick sort: we select an arbitrary element x as the pivot and partition the list into two sub-sequences, one being {x, x, ..., x} with n − 1 elements and the other empty, then recursively sort the first one; this is obviously a quadratic O(n²) solution.
2. The other way is to pick only those elements strictly smaller than x and those strictly greater than x. Such a partition results in two empty sub-sequences and n elements equal to the pivot. Next we recursively sort the sub-sequences containing the smaller and the bigger elements; since both of them are empty, the recursive calls return immediately. The only thing left is to concatenate the sorted results in front of and after the list of elements equal to the pivot.
The latter performs in O(n) time if all elements are equal. This indicates an important improvement for partition: instead of binary partition (splitting into two sub-lists and a pivot), ternary partition (splitting into three sub-lists) handles duplicated elements better.
We can define the ternary quick sort as the following.
\[ sort(L) = \begin{cases} \phi & L = \phi \\ sort(S) \cup sort(E) \cup sort(G) & \text{otherwise} \end{cases} \tag{13.22} \]
Where S, E, and G are sub-lists containing all the elements which are less than, equal to, and greater than the pivot respectively.
S = {x|x ∈ L, x < l1 }
E = {x|x ∈ L, x = l1 }
G = {x|x ∈ L, l1 < x}
The basic ternary quick sort can be implemented in Haskell as the following example
code.
sort [] = []
sort (x:xs) = sort [a | a←xs, a<x] ++
x:[b | b←xs, b==x] ++ sort [c | c←xs, c>x]
Note that the comparison between elements must support abstract 'less-than' and 'equal-to' operations. The basic version of ternary sort takes linear O(n) time to concatenate the three sub-lists. It can be improved by using the standard accumulator technique.
Suppose the function sort′(L, A) is the accumulated ternary quick sort, where L is the sequence to be sorted and the accumulator A contains the intermediate sorted result so far. We initialize the sorting with an empty accumulator: sort(L) = sort′(L, ϕ). It's easy to give the trivial edge case, as below.
\[ sort'(L, A) = \begin{cases} A & L = \phi \\ \dots & \text{otherwise} \end{cases} \]
For the recursive case, as the ternary partition splits the list into three sub-lists S, E, and G, only S and G need recursive sorting; E contains all the elements equal to the pivot, which are already in the correct order and needn't be sorted any more. The idea is to sort G with accumulator A, concatenate the result behind E, use this as the new accumulator, and then start to sort S:
\[ sort'(L, A) = \begin{cases} A & L = \phi \\ sort(S, E \cup sort(G, A)) & \text{otherwise} \end{cases} \tag{13.23} \]
The partition can also be realized with accumulators. It is similar to what has been developed for the basic version of quick sort. Note that we can't just pass one predicate for pivot comparison; it actually needs two, one for less-than and the other for equality testing. For the sake of brevity, we pass the pivot element instead.
\[ partition(p, L, S, E, G) = \begin{cases} (S, E, G) & L = \phi \\ partition(p, L', \{l_1\} \cup S, E, G) & l_1 < p \\ partition(p, L', S, \{l_1\} \cup E, G) & l_1 = p \\ partition(p, L', S, E, \{l_1\} \cup G) & p < l_1 \end{cases} \tag{13.24} \]
Where l1 is the first element in L if L isn't empty, and L′ contains all the rest of the elements except l1. The Haskell program below implements this algorithm. It starts the recursive sorting immediately in the edge case of partition.
sort xs = sort' xs []
sort' [] r = r
sort' (x:xs) r = part xs [] [x] [] r where
part [] as bs cs r = sort' as (bs ++ sort' cs r)
part (x':xs') as bs cs r | x' < x = part xs' (x':as) bs cs r
| x' == x = part xs' as (x':bs) cs r
| x' > x = part xs' as bs (x':cs) r
Richard Bird developed another version in [1]: instead of concatenating the recursively sorted results, it maintains a list of sorted sub-lists and performs the concatenation at the end.
sort xs = concat $ pass xs []
2-way partition
The cases with many duplicated elements can also be handled imperatively. Robert
Sedgewick presented a partition method [69], [4] which holds two pointers. One moves
from left to right, the other moves from right to left. The two pointers are initialized as
the left and right boundaries of the array.
When the partition starts, the left-most element is selected as the pivot. Then the left pointer i keeps advancing to the right until it meets an element which is not less than the pivot; on the other hand³, the right pointer j repeatedly scans to the left until it meets an element which is not greater than the pivot.
At this time, all elements before the left pointer i are strictly less than the pivot, while
all elements after the right pointer j are greater than the pivot. i points to an element
which is either greater than or equal to the pivot; while j points to an element which is
either less than or equal to the pivot, the situation at this stage is illustrated in figure
13.4 (a).
In order to partition all elements less than or equal to the pivot to the left, and the
others to the right, we can exchange the two elements pointed by i, and j. After that the
scan can be resumed. We repeat this process until either i meets j, or they overlap.
At any point during the partition, there is an invariant: all elements before i (including the one pointed to by i) are not greater than the pivot, while all elements after j (including the one pointed to by j) are not less than the pivot. The elements between i and j haven't been examined yet. This invariant is shown in figure 13.4 (b).
Figure 13.4: Partition a range of the array using the left-most element as the pivot ((a) the state when both scans stop; (b) the invariant).
After the left pointer i meets the right pointer j, or they overlap each other, we need
one extra exchanging to move the pivot located at the first position to the correct place
which is pointed by j. Next, the elements between the lower bound and j as well as the
sub slice between i and the upper bound of the array are recursively sorted.
This algorithm can be described as the following.
1: procedure Sort(A, l, u) ▷ sort range [l, u)
2: if u − l > 1 then ▷ More than 1 element for non-trivial case
3: i ← l, j ← u
4: pivot ← A[l]
5: loop
6: repeat
7: i←i+1
8: until A[i] ≥ pivot ▷ Need handle error case that i ≥ u in fact.
9: repeat
10: j ←j−1
11: until A[j] ≤ pivot ▷ Need handle error case that j < l in fact.
3 We don’t use ‘then’ because it’s quite OK to perform the two scans in parallel.
Comparing this algorithm with the basic version based on N. Lomuto's partition method, we find that it swaps fewer elements, because it skips those already on the proper side of the pivot.
3-way partition
Obviously, we should avoid those unnecessary swaps for the duplicated elements. What's more, the algorithm can be developed with the idea of ternary sort (also known as 3-way partition in some materials): all the elements strictly less than the pivot are put into the left sub-slice, those greater than the pivot are put into the right, and the middle part holds all the elements equal to the pivot. With such a ternary partition, we only need to recursively sort the elements that differ from the pivot. Thus in the above extreme case, there aren't any elements that need further sorting, so the overall performance is linear O(n).
The difficulty is how to do the 3-way partition. Jon Bentley and Douglas McIlroy
developed a solution which keeps those elements equal to the pivot at the left most and
right most sides as shown in figure 13.5 (a) [70] [71].
The majority of the scanning process is the same as the one developed by Robert Sedgewick: i and j keep advancing toward each other until i meets an element greater than or equal to the pivot, or j meets one less than or equal to the pivot, respectively. At this time, if i and j don't meet or overlap, the two elements are not only exchanged, but also examined to see whether they are identical to the pivot; if so, the necessary exchanges happen between i and p, as well as between j and q.
By the end of the partition process, the elements equal to the pivot are swapped into the middle part from the left and right ends. The number of such extra exchange operations is proportional to the number of duplicated elements; it is zero if the elements are unique, so there is no overhead in that case. The final partition result is shown in figure 13.5 (b). After that we only need to recursively sort the 'less-than' and 'greater-than' sub-slices.
Figure 13.5: (a) elements equal to the pivot are kept at the left-most and right-most ends during the scan; (b) the final partition result: less-than, equal, greater-than.
It can be seen that the algorithm becomes a bit complex when it evolves to 3-way partition, and there are some tricky edge cases that should be handled with caution. Actually, we just need a ternary partition algorithm. This reminds us of N. Lomuto's method, which is straightforward enough to be a starting point.
The idea is to change the invariant a bit. We still select the first element as the pivot. As shown in figure 13.6, at any time the left-most section contains elements strictly less than the pivot; the next section contains the elements equal to the pivot; the right-most section holds all the elements strictly greater than the pivot. The boundaries of the three sections are marked as i, k, and j respectively. The rest, between k and j, are the elements that haven't been scanned yet.
At the beginning of the algorithm, the 'less-than' section is empty and the 'equal-to' section contains only one element, the pivot; so i is initialized to the lower bound of the array, and k points to the element next to i. The 'greater-than' section is also initialized as empty, thus j is set to the upper bound.
Figure 13.6: The invariant of the one-pass ternary partition: the 'less-than', 'equal-to', unscanned, and 'greater-than' sections, bounded by i, k, and j.
When the partition process starts, the element pointed to by k is examined. If it's equal to the pivot, k just advances to the next one. If it's greater than the pivot, we swap it with the last element in the unknown area, so that the length of the 'greater-than' section increases by one and its boundary j moves to the left; since we don't know whether the element just swapped to position k is still greater than the pivot, it should be examined again. Otherwise, if the element is less than the pivot, we exchange it with the first one in the 'equal-to' section to restore the invariant. The partition algorithm stops when k meets j.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u, k ← l + 1
4: pivot ← A[i]
5: while k < j do
6: while pivot < A[k] do
7: j ←j−1
8: Exchange A[k] ↔ A[j]
9: if A[k] < pivot then
10: Exchange A[k] ↔ A[i]
11: i←i+1
12: k ←k+1
13: Sort(A, l, i)
14: Sort(A, j, u)
Compared with the previous 3-way partition quick sort algorithm, this one is simpler at the cost of more swap operations. The ANSI C program below implements this algorithm.
void qsort(Key∗ xs, int l, int u) {
int i, j, k; Key pivot;
if (l < u - 1) {
i = l; j = u; pivot = xs[l];
for (k = l + 1; k < j; ++k) {
while (pivot < xs[k]) { --j; swap(xs[j], xs[k]); }
if (xs[k] < pivot) { swap(xs[i], xs[k]); ++i; }
}
qsort(xs, l, i);
qsort(xs, j, u);
}
}
Exercise 13.3
• All the quick sort imperative algorithms given in this section use the first element
as the pivot, another method is to choose the last one as the pivot. Realize the
quick sort algorithms, including the basic version, Sedgewick version, and ternary
(3-way partition) version by using this approach.
(a) The partition tree for {x1 < x2 < ... < xn}: there aren't any elements less than or equal to the pivot (the first element) in any partition.
(b) The partition tree for {y1 > y2 > ... > yn}: there aren't any elements greater than or equal to the pivot (the first element) in any partition.
(a) For an input like {xn, x1, xn−1, x2, ...}: except for the first partition, all the others are unbalanced.
Unfortunately, none of the above four worst cases can be well handled by this program, because the sampling is not good: we need a telescope, not a microscope, to profile the whole list to be partitioned. We'll see the functional way to solve the partition problem later.
Besides median-of-three, there is another popular engineering practice to get a good partition result: instead of always taking the first or the last element as the pivot, one alternative is to select one at random, for example as in the following modification.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: Exchange A[l] ↔ A[ Random(l, u) ]
4: (i, j) ← Partition(A, l, u)
5: Sort(A, l, i)
6: Sort(A, j, u)
The function Random(l, u) returns a random integer i between l and u, such that l ≤ i < u. The element at this position is exchanged with the first one, so that it is selected as the pivot for the further partition. This algorithm is called random quick sort [4].
Theoretically, neither median-of-three nor random quick sort can avoid the worst case completely. If the sequence to be sorted is randomly distributed, choosing the first element as the pivot and choosing an arbitrary one are equally effective. Considering that the underlying data structure of the sequence is a singly linked list in the functional setting, it's expensive to strictly apply the idea of random quick sort in a purely functional approach. Even with this bad news, the engineering improvements still make sense in real-world programming.
Exercise 13.4
• Can you figure out more quick sort worst cases besides the four given in this section?
• Implement median-of-three method in your favorite imperative programming lan-
guage.
• Implement random quick sort in your favorite imperative programming language.
• Implement the algorithm which falls back to insertion sort when the length of list
is small in both imperative and functional approach.
Where
\[ \begin{aligned} T_l &= unfold(\{a \mid a \in L', a \le l_1\}) \\ T_r &= unfold(\{a \mid a \in L', l_1 < a\}) \end{aligned} \tag{13.26} \]
The interesting point is that this algorithm creates the tree in a different way from the one introduced in the chapter about binary search trees. If the list to be unfolded is empty, the result is obviously an empty tree; this is the trivial edge case. Otherwise, the algorithm sets the first element l1 of the list as the key of the node, and recursively creates its left and right children, where the elements used to form the left child are those in L′ which are less than or equal to the key, while the rest, which are greater than the key, are used to form the right child.
Recall the algorithm which turns a binary search tree into a list by in-order traversal:
\[ toList(T) = \begin{cases} \phi & T = \phi \\ toList(left(T)) \cup \{key(T)\} \cup toList(right(T)) & \text{otherwise} \end{cases} \tag{13.27} \]
We can define quick sort algorithm by composing these two functions.
quickSort = toList · unf old (13.28)
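As an illustration of equations (13.26)-(13.28) only, here is a Python sketch that builds the throw-away binary search tree as nested (left, key, right) tuples and then flattens it:

def unfold(xs):
    # build the binary search tree from the list
    if xs == []:
        return None
    x, rest = xs[0], xs[1:]
    return (unfold([a for a in rest if a <= x]), x,
            unfold([a for a in rest if x < a]))

def to_list(t):
    # in-order traversal turns the tree back into a sorted list
    if t is None:
        return []
    left, key, right = t
    return to_list(left) + [key] + to_list(right)

def quick_sort(xs):
    # the composition of equation (13.28): quickSort = toList . unfold
    return to_list(unfold(xs))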
The binary search tree built in the first step of applying unf old is the intermediate
result. This result is consumed by toList and dropped after the second step. It’s quite
possible to eliminate this intermediate result, which leads to the basic version of quick
sort.
The elimination of the intermediate binary search tree is called deforestation. This concept is based on Burstall and Darlington's work [9].
Merge
There are two ‘black-boxes’ in the above merge sort definition, one is the splitAt function,
which splits a list at a given position; the other is the merge function, which can merge
two sorted lists into one.
As presented in the appendix of this book, it’s trivial to realize splitAt in imperative
settings by using random access. However, in functional settings, it’s typically realized
as a linear algorithm:
\[ splitAt(n, L) = \begin{cases} (\phi, L) & n = 0 \\ (\{l_1\} \cup A, B) & \text{otherwise}, (A, B) = splitAt(n-1, L') \end{cases} \tag{13.30} \]
Where l1 is the first element of L, and L′ represents the rest of the elements except l1 if L isn't empty.
The idea of merge can be illustrated as in figure 13.9. Consider two lines of kids, who already stand in order of their height: the shortest one stands first, then a taller one, and the tallest one stands at the end of the line.
Now let's ask the kids to pass a door one by one; at any time at most one kid can pass the door, and the kids must pass it in the order of their height. No kid can pass the door before all the kids shorter than him or her.
Since the two lines of kids are already 'sorted', the solution is to ask the first two kids, one from each line, to compare their heights, and let the shorter kid pass the door; they repeat this step until one line is empty, after which all the remaining kids can pass the door one by one.
This idea can be formalized in the following equation.
\[ merge(A, B) = \begin{cases} A & B = \phi \\ B & A = \phi \\ \{a_1\} \cup merge(A', B) & a_1 \le b_1 \\ \{b_1\} \cup merge(A, B') & \text{otherwise} \end{cases} \tag{13.31} \]
Where a1 and b1 are the first elements of lists A and B; A′ and B′ are the remaining elements after removing the first ones, respectively. The first two cases are trivial edge cases: merging a sorted list with an empty list results in the same sorted list. Otherwise, if both lists are non-empty, we take the first elements of the two lists, compare them, use the minimum as the first element of the result, and then recursively merge the rest.
With merge defined, the basic version of merge sort can be implemented like the
following Haskell example code.
msort [] = []
msort [x] = [x]
msort xs = merge (msort as) (msort bs) where
(as, bs) = splitAt (length xs `div` 2) xs
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) | x ≤ y = x : merge xs (y:ys)
| x > y = y : merge (x:xs) ys
Note that the implementation differs from the algorithm definition in that it also treats the singleton list as a trivial edge case.
Merge sort can also be realized imperatively. The basic version can be developed as the algorithm below.
1: procedure Sort(A)
2: if |A| > 1 then
3: m ← ⌊|A|/2⌋
4: X ← Copy-Array(A[1...m])
5: Y ← Copy-Array(A[m + 1...|A|])
6: Sort(X)
7: Sort(Y )
8: Merge(A, X, Y )
When the array to be sorted contains at least two elements, the non-trivial sorting process starts. It first copies the first half to a newly created array X, and the second half to a second new array Y, recursively sorts them, and finally merges the sorted results back into A.
This version uses the same amount of extra space as A, because the Merge algorithm isn't in-place at the moment. We'll introduce the imperative in-place merge sort in a later section.
The merge process almost does the same thing as the functional definition. There is
a verbose version and a simplified version by using sentinel.
The verbose merge algorithm continuously checks the element from the two input
arrays, picks the smaller one and puts it back to the result array A, it then advances
along the arrays respectively until either one input array is exhausted. After that, the
algorithm appends the rest of the elements in the other input array to A.
1: procedure Merge(A, X, Y )
2: i ← 1, j ← 1, k ← 1
3: m ← |X|, n ← |Y |
4: while i ≤ m ∧ j ≤ n do
5: if X[i] < Y [j] then
6: A[k] ← X[i]
7: i←i+1
8: else
9: A[k] ← Y [j]
10: j ←j+1
11: k ←k+1
12: while i ≤ m do
13: A[k] ← X[i]
14: k ←k+1
15: i←i+1
16: while j ≤ n do
17: A[k] ← Y [j]
18: k ←k+1
19: j ←j+1
Although this algorithm is a bit verbose, it can be short in a programming environment with enough tools to manipulate arrays. The following Python program is an example.
def msort(xs):
    n = len(xs)
    if n > 1:
        ys = [x for x in xs[:n//2]]
        zs = [x for x in xs[n//2:]]
        ys = msort(ys)
        zs = msort(zs)
        xs = merge(xs, ys, zs)
    return xs
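The merge(xs, ys, zs) used above follows the verbose Merge procedure: it repeatedly takes the smaller head of ys and zs, then copies whatever remains. A sketch consistent with that pseudocode:

def merge(xs, ys, zs):
    i, j, k = 0, 0, 0
    while i < len(ys) and j < len(zs):
        if ys[i] < zs[j]:
            xs[k] = ys[i]
            i += 1
        else:
            xs[k] = zs[j]
            j += 1
        k += 1
    while i < len(ys):        # copy the rest of ys
        xs[k] = ys[i]
        i += 1
        k += 1
    while j < len(zs):        # copy the rest of zs
        xs[k] = zs[j]
        j += 1
        k += 1
    return xs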
Performance
Before diving into improvements of this basic version, let's analyze the performance of merge sort. The algorithm contains two steps: the divide step and the merge step. In the divide step, the sequence to be sorted is always divided into two sub-sequences of the same length. If we draw a partition tree similar to the one we drew for quick sort, we find this tree is a perfectly balanced binary tree, as shown in figure 13.3; thus the height of this tree is O(lg n), which means the recursion depth of merge sort is bound to O(lg n). Merging happens at every level. Analyzing the merge algorithm is intuitive: it compares elements from the two input sequences in pairs, and after one sequence is fully examined the rest of the other is copied one by one to the result, so it is a linear algorithm, proportional to the length of the sequence. Based on these facts, denoting T(n) the time for sorting a sequence of length n, we can write the recursive time cost as below.
\[ T(n) = T\!\left(\frac{n}{2}\right) + T\!\left(\frac{n}{2}\right) + cn = 2T\!\left(\frac{n}{2}\right) + cn \tag{13.32} \]
It states that the cost consists of three parts: merge sorting the first half takes T(n/2), merge sorting the second half also takes T(n/2), and merging the two results takes cn, where c is some constant. Solving this equation gives O(n lg n).
Note that this performance doesn't vary across cases, as merge sort always divides the input evenly.
Another significant performance indicator is space occupation; however, it varies a lot among different merge sort implementations. The detailed space bound analysis will be explained with each variant later.
For the basic imperative merge sort, observe that in every recursion it demands the same amount of space as the input array, copies the original elements into it for the recursive sort, and releases this space after that level of recursion finishes. So the peak space requirement happens when the recursion enters the deepest level, which is O(n lg n).
The functional merge sort consumes much less than this amount, because the underlying data structure of the sequence is a linked list, so it needn't extra space for merging⁴. The only space requirement is for book-keeping the stack of recursive calls. This can be seen in the later explanation of the even-odd split algorithm.
Minor improvement
We'll next improve the basic merge sort bit by bit, for both the functional and imperative realizations. The first observation is that the imperative merge algorithm is a bit verbose. [4] presents an elegant simplification by using positive ∞ as a sentinel: we append ∞ as the last element to both ordered arrays before merging⁵. Thus we needn't test which array is exhausted. Figure 13.10 illustrates this idea.
1: procedure Merge(A, X, Y )
2: Append(X, ∞)
3: Append(Y, ∞)
4: i ← 1, j ← 1
5: for k ← from 1 to |A| do
6: if X[i] < Y [j] then
7: A[k] ← X[i]
8: i←i+1
9: else
10: A[k] ← Y [j]
11: j ←j+1
The following ANSI C program implements this idea. It embeds the merge inside. INF is defined as a big constant of the same type as Key, where the type can either be defined elsewhere, or we can abstract the type information by passing the comparator as a parameter. We skip these implementation and language details here.
void msort(Key∗ xs, int l, int u) {
int i, j, m;
4 The complex effects caused by lazy evaluation are ignored here; please refer to [72] for detail
5 For sorting in monotonic non-increasing order, −∞ can be used instead
Running this program takes much more time than quick sort. Besides the major reason we'll explain later, one problem is that this version frequently allocates and releases memory for merging, and memory allocation is one of the well-known bottlenecks in the real world, as mentioned by Bentley in [4]. One solution to address this issue is to allocate another array of the same size as the original one as the working area. The recursive sorts of the first and second halves needn't allocate any more extra space, but use the working area when merging. Finally, the algorithm copies the merged result back.
This idea can be expressed as the following modified algorithm.
1: procedure Sort(A)
2: B ← Create-Array(|A|)
3: Sort’(A, B, 1, |A|)
4: procedure Sort’(A, B, l, u)
5: if u − l > 0 then
6: m ← ⌊(l + u)/2⌋
7: Sort’(A, B, l, m)
8: Sort’(A, B, m + 1, u)
9: Merge’(A, B, l, m, u)
This algorithm duplicates another array and passes it, along with the original array to be sorted, to the Sort' algorithm. In a real implementation, this working area should be released, either manually or by some automatic tool such as GC (garbage collection). The modified algorithm Merge' also accepts the working area as a parameter.
1: procedure Merge’(A, B, l, m, u)
2: i ← l, j ← m + 1, k ← l
3: while i ≤ m ∧ j ≤ u do
4: if A[i] < A[j] then
5: B[k] ← A[i]
6: i←i+1
7: else
8: B[k] ← A[j]
9: j ←j+1
10: k ←k+1
11: while i ≤ m do
12: B[k] ← A[i]
13: k ←k+1
14: i←i+1
15: while j ≤ u do
16: B[k] ← A[j]
17: k ← k + 1
18: j ← j + 1
19: for i ← l to u do ▷ Copy the merged result back
20: A[i] ← B[i]
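For reference, the working-area scheme can be sketched in Python roughly as follows (msort2, sort_between, and merge_to are my own illustrative names, not the book's):

def msort2(xs):
    ws = xs[:]                      # one working area, allocated once
    sort_between(xs, ws, 0, len(xs))
    return xs

def sort_between(xs, ws, l, u):
    # sort xs[l:u], using ws[l:u] as the merge working area
    if u - l > 1:
        m = (l + u) // 2
        sort_between(xs, ws, l, m)
        sort_between(xs, ws, m, u)
        merge_to(xs, ws, l, m, u)

def merge_to(xs, ws, l, m, u):
    i, j, k = l, m, l
    while i < m and j < u:
        if xs[i] < xs[j]:
            ws[k] = xs[i]
            i = i + 1
        else:
            ws[k] = xs[j]
            j = j + 1
        k = k + 1
    ws[k:u] = xs[i:m] if i < m else xs[j:u]
    xs[l:u] = ws[l:u]               # copy the merged result back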
This new version runs faster than the previous one. In my test machine, it speeds up
about 20% to 25% when sorting 100,000 randomly generated numbers.
The basic functional merge sort can also be fine tuned. Observe that it splits the list at the middle point. However, as the underlying data structure representing the list is a singly linked-list, random access at a given position is a linear operation (refer to appendix A for detail). Alternatively, one can split the list in an even-odd manner: all elements at even positions are collected in one sub list, while all elements at odd positions are collected in another. For any list, the numbers of elements at even and odd positions are either equal or differ by one, so this divide strategy always splits well, and the performance is ensured to be O(n lg n) in all cases.
The even-odd splitting algorithm can be defined as below.
split(L) = (ϕ, ϕ)                 : L = ϕ
           ({l1}, ϕ)              : |L| = 1                                   (13.33)
           ({l1} ∪ A, {l2} ∪ B)   : otherwise, (A, B) = split(L′′)
When the list is empty, the split result is a pair of empty lists; if there is only one element in the list, we put this single element, which is at position 1, into the odd sub list, and the even sub list is empty; otherwise, there are at least two elements in the list: we put the first one into the odd sub list, the second one into the even sub list, and recursively split the rest.
All the other functions are kept the same; the modified Haskell program is given as follows.
split [] = ([], [])
split [x] = ([x], [])
split (x:y:xs) = (x:xs', y:ys') where (xs', ys') = split xs
13.9 In-place merge sort

(Figure: the array layout during naive in-place merging — a merged prefix, then the sorted sub list A starting at xs[i], then the sorted sub list B starting at xs[j].)

A naive idea is to merge the two sorted sub arrays within the original array by shifting: whenever the head of B is smaller, extract it and shift the unmerged part of A rightward by one cell to make room for it.
1: procedure Merge(A, l, m, u)
2: while l ≤ m ∧ m ≤ u do
3: if A[l] < A[m] then
4: l ←l+1
5: else
6: x ← A[m]
7: for i ← m down-to l + 1 do ▷ Shift
8: A[i] ← A[i − 1]
9: A[l] ← x
However, this naive solution downgrades the overall performance of merge sort to quadratic O(n²)! This is because array shifting is a linear operation, proportional to the number of elements in the first sorted sub array that haven't been compared yet. The ANSI C program based on this algorithm runs very slowly: it is about 12 times slower than the previous version when sorting 10,000 random numbers.
(Figure: the two sorted sub arrays A[i]... and B[j]... and the working area all reuse cells of the original array; merging compares their heads and exchanges elements with the working area.)
In our algorithm, both the two sorted sub arrays and the working area for merging are parts of the original array to be sorted. We need to supply the following arguments when merging: the start and end points of the sorted sub arrays, which can be represented as ranges, and the start point of the working area. The following algorithm, for example, uses [a, b) to indicate the range that includes a and excludes b. It merges the sorted range [i, m) and the range [j, n) into the working area that starts from k.
1: procedure Merge(A, [i, m), [j, n), k)
The working area must satisfy two constraints:
1. The working area should be within the bounds of the array. In other words, it should be big enough to hold the elements exchanged in, without causing any out-of-bound error;
2. The working area may overlap with either of the two sorted arrays; however, we must ensure that no unmerged element is overwritten.
With this merging algorithm defined, it’s easy to imagine a solution, which can sort
half of the array; The next question is, how to deal with the rest of the unsorted part
stored in the working area as shown in figure 13.13?
One intuitive idea is to recursively sort the second half of the working area, so that only 1/4 of the elements remain unsorted, as shown in figure 13.14. The key point at this stage is that, sooner or later, we must merge the sorted 1/4 (B) with the sorted 1/2 (A).
Is the remaining working area, which only holds 1/4 of the elements, big enough for merging A and B? Unfortunately, it isn't in the setting shown in figure 13.14.
However, the second constraint mentioned before gives us a hint: we can exploit it by arranging the working area to overlap with either sub array, as long as we ensure the unmerged elements won't be overwritten under some well designed merging scheme.
(Figure 13.14: the array layout — an unsorted 1/4, the sorted B of 1/4, and the sorted A of 1/2.)
Actually, instead of sorting the second half of the working area, we can sort its first half and put the working area between the two sorted arrays, as shown in figure 13.15 (a). This in effect arranges the working area to overlap with sub array A. This idea is proposed in [74].
(Figure 13.15: (a) the sorted B of 1/4, the working area, then the sorted A of 1/2; (b) after merging, the working area of 1/4 ends up at the left, followed by the merged 3/4.)

Consider the two extreme cases of merging A and B under this layout:
1. All elements in B are less than any element in A. In this case, the merge algorithm eventually moves the whole content of B to the working area, while the cells of B end up holding what was previously stored in the working area; as the size of the working area is the same as B, it's OK to exchange their contents;
2. All elements in A are less than any element in B. In this case, the merge algorithm continuously exchanges elements between A and the working area. After all the cells of the working area (1/4 of the array) are filled with elements from A, the algorithm starts to overwrite the first half of A. Fortunately, the contents being overwritten are not unmerged elements. The working area in effect advances toward the end of the array, and finally moves to the right side; from this point on, the merge algorithm starts exchanging contents of B with the working area. The result is that the working area moves to the left-most side, as shown in figure 13.15 (b).
We can repeat this step: always sort the second half of the unsorted part, and exchange the sorted sub array to the first half, which becomes the new working area. Thus the working area keeps shrinking, from 1/2 of the array, to 1/4, to 1/8, ..., and the scale of the merge problem keeps reducing. When there is only one element left in the working area, we needn't sort it any more, since a singleton array is sorted by nature. Merging a singleton array into the other is equivalent to inserting the element. In practice, the algorithm can finalize the last few elements by switching to insertion sort.
The whole algorithm can be described as the following.
1: procedure Sort(A, l, u)
2: if u − l > 0 then
3: m ← ⌊(l + u)/2⌋
4: w ←l+u−m
5: Sort’(A, l, m, w) ▷ The second half contains sorted elements
6: while w − l > 1 do
7: u′ ← w
8: w ← ⌈(l + u′)/2⌉ ▷ Ensure the working area is big enough
9: Sort’(A, w, u′ , l) ▷ The first half holds the sorted elements
10: Merge(A, [l, l + u′ − w], [u′ , u], w)
11: for i ← w down-to l do ▷ Switch to insertion sort
12: j←i
13: while j ≤ u ∧ A[j] < A[j − 1] do
14: Exchange A[j] ↔ A[j − 1]
15: j ←j+1
Note that, in order to satisfy the first constraint, we must ensure the working area is big enough to hold all elements exchanged in; that's why we round by ceiling when sorting the second half of the working area. Note also that we actually pass ranges including the end points to the algorithm Merge.
Next, we develop the Sort' algorithm, which mutually recursively calls Sort and exchanges the sorted result to the working area.
1: procedure Sort’(A, l, u, w)
2: if u − l > 0 then
3: m ← ⌊(l + u)/2⌋
4: Sort(A, l, m)
5: Sort(A, m + 1, u)
6: Merge(A, [l, m], [m + 1, u], w)
7: else ▷ Exchange all elements to the working area
8: while l ≤ u do
9: Exchange A[l] ↔ A[w]
10: l ←l+1
11: w ←w+1
Different from the naive in-place sort, this algorithm doesn't shift the array during merging. The main algorithm reduces the unsorted part in the sequence n/2, n/4, n/8, ..., so it takes O(lg n) steps to complete sorting. In every step, it recursively sorts half of the remaining elements and performs linear time merging.
Denote the time cost of sorting n elements as T(n); we have the following equation.

T(n) = T(n/2) + c·n/2 + T(n/4) + c·3n/4 + T(n/8) + c·7n/8 + ...        (13.34)

Substituting n with its half gives another one:

T(n/2) = T(n/4) + c·n/4 + T(n/8) + c·3n/8 + T(n/16) + c·7n/16 + ...    (13.35)

Subtracting (13.35) from (13.34) we have:

T(n) − T(n/2) = T(n/2) + cn(1/2 + 1/2 + ...)

There are in total lg n terms of 1/2 added together, therefore the recursion can be expressed as:

T(n) = 2T(n/2) + (1/2)cn lg n

Solving this equation with the telescope method gives the result O(n lg² n).
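To see where the O(n lg² n) bound comes from, the recursion can be unfolded level by level (a sketch, writing c′ = c/2 and assuming n is a power of 2):

T(n) = 2T(n/2) + c′n lg n
     = 4T(n/4) + c′n lg(n/2) + c′n lg n
     = ...
     = nT(1) + c′n(lg n + lg(n/2) + ... + lg 2)
     = nT(1) + c′n · lg n(lg n + 1)/2

which is bound to O(n lg² n).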
The following ANSI C code completes the implementation by using the example
wmerge program given above.
However, this program doesn’t run faster than the version we developed in previous
section, which doubles the array in advance as working area. In my machine, it is about
60% slower when sorting 100,000 random numbers due to many swap operations.
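For reference, the whole in-place scheme can be sketched in Python as follows (half-open ranges; the names wmerge, wsort and imsort, and the exact threshold for switching to insertion sort, are my illustrative choices and may differ from the original C program):

def wmerge(xs, i, m, j, n, w):
    # merge xs[i:m] and xs[j:n] into the working area starting at w, by swapping
    while i < m and j < n:
        if xs[i] < xs[j]:
            xs[w], xs[i] = xs[i], xs[w]
            i = i + 1
        else:
            xs[w], xs[j] = xs[j], xs[w]
            j = j + 1
        w = w + 1
    while i < m:
        xs[w], xs[i] = xs[i], xs[w]
        i, w = i + 1, w + 1
    while j < n:
        xs[w], xs[j] = xs[j], xs[w]
        j, w = j + 1, w + 1

def wsort(xs, l, u, w):
    # sort xs[l:u] and exchange the sorted result into the area starting at w
    if u - l > 1:
        m = (l + u) // 2
        imsort(xs, l, m)
        imsort(xs, m, u)
        wmerge(xs, l, m, m, u, w)
    else:
        while l < u:
            xs[l], xs[w] = xs[w], xs[l]
            l, w = l + 1, w + 1

def imsort(xs, l, u):
    # in-place merge sort of xs[l:u]
    if u - l > 1:
        m = (l + u) // 2
        w = l + u - m
        wsort(xs, l, m, w)                   # the second half now holds sorted elements
        while w - l > 2:
            n, w = w, l + (w - l + 1) // 2   # ceiling keeps the working area big enough
            wsort(xs, w, n, l)               # the first half now holds sorted elements
            wmerge(xs, l, l + n - w, n, u, w)
        for i in range(w, l, -1):            # at most two unsorted cells remain
            j = i
            while j < u and xs[j] < xs[j - 1]:
                xs[j], xs[j - 1] = xs[j - 1], xs[j]
                j = j + 1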
We can define an auxiliary function for linking nodes. Assuming the list to be linked isn't empty, it can be implemented as follows.
struct Node∗ link(struct Node∗ x, struct Node∗ ys) {
x→next = ys;
return x;
}
One method to realize the imperative even-odd splitting is to initialize two empty sub lists, then iterate over the list to be split. Each time, we link the current node in front of the first sub list, then exchange the two sub lists, so that in the next iteration the node is linked to the other sub list. This idea can be illustrated as below.
1: function Split(L)
2: (A, B) ← (ϕ, ϕ)
3: while L ≠ ϕ do
4: p←L
5: L ← Next(L)
6: A ← Link(p, A)
7: Exchange A ↔ B
8: return (A, B)
The following example ANSI C program implements this splitting algorithm embed-
ded.
struct Node∗ msort(struct Node∗ xs) {
struct Node ∗p, ∗as, ∗bs;
if (!xs || !xs→next) return xs;
as = bs = NULL;
while(xs) {
p = xs;
xs = xs→next;
as = link(p, as);
swap(as, bs);
}
as = msort(as);
bs = msort(bs);
return merge(as, bs);
}
The only thing left is to develop the imperative merging algorithm for linked-lists. The idea is quite similar to the array merging version. As long as neither of the sub lists is exhausted, we pick the smaller head and append it to the result list. After that, we just link the non-empty list to the tail of the result; no copy loop is needed. Some care is needed to initialize the result list, as its head node is the smaller of the two sub list heads. One simple method is to use a dummy sentinel head and drop it before returning. This implementation detail can be given as follows.
struct Node∗ merge(struct Node∗ as, struct Node∗ bs) {
struct Node s, ∗p;
p = &s;
while (as && bs) {
if (as→key < bs→key) {
link(p, as);
as = as→next;
}
else {
link(p, bs);
bs = bs→next;
}
p = p→next;
}
if (as)
link(p, as);
if (bs)
link(p, bs);
return s.next;
}
Exercise 13.5

13.10 Nature merge sort
For any given sequence, we can always find a non-decreasing sub sequence starting at any position. One particular case is that we can find such a sub sequence from the left-most position. The following table lists some examples; the non-decreasing sub sequences are in bold font.
15 , 0, 4, 3, 5, 2, 7, 1, 12, 14, 13, 8, 9, 6, 10, 11
8, 12, 14 , 0, 1, 4, 11, 2, 3, 5, 9, 13, 10, 6, 15, 7
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
The first row in the table illustrates the worst case: the second element is less than the first one, so the non-decreasing sub sequence is a singleton list containing only the first element. The last row shows the best case: the sequence is ordered, and the non-decreasing sub sequence is the whole sequence. The second row shows the average case.
Symmetrically, we can always find a non-decreasing sub sequence from the end of the sequence toward the left. This suggests that we can merge the two non-decreasing sub sequences, one from the beginning and the other from the end, into a longer sorted sequence. The advantage of this idea is that we utilize the naturally ordered sub sequences, so no recursive sorting is needed at all.
Figure 13.17 illustrates this idea. We start the algorithm by scanning from both ends, finding the longest non-decreasing sub sequences respectively. After that, these two sub sequences are merged into the working area. The merged result is placed at the beginning of the working area. Next
we repeat this step, which goes on scanning toward the center of the original sequence.
This time we merge the two ordered sub sequences to the right hand of the working area
toward the left. Such setup is easy for the next round of scanning. When all the elements
in the original sequence have been scanned and merged to the target, we switch to use
the elements stored in the working area for sorting, and use the previous sequence as new
working area. Such switching happens repeatedly in each round. Finally, we copy all
elements from the working area to the original array if necessary.
The only question left is when this algorithm stops. The answer is that when we start
a new round of scanning, and find that the longest non-decreasing sub list spans to the
end, which means the whole list is ordered, the sorting is done.
Because this kind of merge sort processes the sequence from both directions and uses the natural ordering of sub sequences, it's named nature two-way merge sort. Some care must be taken to realize it. Figure 13.18 shows the invariant during the nature merge sort. At any time, all elements before marker a and after marker d have already been scanned and merged. We try to span the non-decreasing sub sequence [a, b) as long as possible; at the same time, we span the sub sequence [c, d) from right to left as long as possible as well. The invariant for the working area is shown in the second row: all elements before f and after r have already been processed (note that they may contain several ordered sub sequences). For the odd rounds (1, 3, 5, ...), we merge [a, b) and [c, d) from f toward the right; for the even rounds (2, 4, 6, ...), we merge the two sorted sub sequences from r toward the left.
(Figure 13.18: the invariant — in the original array, elements before a and after d have been scanned; [a, b) and [c, d) are the sub sequences being spanned, with unknown elements in between. In the working area, elements before f and after r have been merged; the cells between f and r are unused.)
For the imperative realization, the sequence is represented by an array. Before sorting starts, we duplicate the array to create the working area. The pointers a, b are initialized to the left-most position, while c, d point to the right-most position; f starts at the front of the working area, and r points to its rear.
1: function Sort(A)
2: if |A| > 1 then
3: n ← |A|
4: B ← Create-Array(n) ▷ Create the working area
5: loop
6: [a, b) ← [1, 1)
7: [c, d) ← [n + 1, n + 1)
8: f ← 1, r ← n ▷ front and rear pointers to the working area
9: t ← False ▷ merge to front or rear
10: while b < c do ▷ There are still elements for scan
11: repeat ▷ Span [a, b)
12: b←b+1
13: until b ≥ c ∨ A[b] < A[b − 1]
14: repeat ▷ Span [c, d)
15: c←c−1
16: until c ≤ b ∨ A[c − 1] < A[c]
17: if c < b then ▷ Avoid overlap
18: c←b
19: if b − a ≥ n then ▷ Done if [a, b) spans to the whole array
20: return A
21: if t then ▷ merge to front
22: f ← Merge(A, [a, b), [c, d), B, f, 1)
23: else ▷ merge to rear
24: r ← Merge(A, [a, b), [c, d), B, r, −1)
25: a ← b, d ← c
26: t ← ¬t ▷ Switch the merge direction
27: Exchange A ↔ B ▷ Switch working area
28: return A
The merge algorithm is almost the same as before, except that we need to pass a parameter indicating the merge direction.
1: function Merge(A, [a, b), [c, d), B, w, ∆)
2: while a < b ∧ c < d do
3: if A[a] < A[d − 1] then
4: B[w] ← A[a]
5: a←a+1
6: else
7: B[w] ← A[d − 1]
8: d←d−1
9: w ←w+∆
10: while a < b do
11: B[w] ← A[a]
12: a←a+1
13: w ←w+∆
14: while c < d do
15: B[w] ← A[d − 1]
16: d←d−1
17: w ←w+∆
18: return w
The following ANSI C program implements this two-way nature merge sort algorithm.
Note that it doesn’t release the allocated working area explicitly.
int merge(Key∗ xs, int a, int b, int c, int d, Key∗ ys, int k, int delta) {
for(; a < b && c < d; k += delta )
ys[k] = xs[a] < xs[d-1] ? xs[a++] : xs[--d];
for(; a < b; k += delta)
ys[k] = xs[a++];
for(; c < d; k += delta)
ys[k] = xs[--d];
return k;
}
The performance of nature merge sort depends on the actual ordering of the sub arrays. However, it in fact performs well even in the worst case. Suppose we are unlucky when scanning the array, so that the lengths of the non-decreasing sub arrays are always 1 during the first round. This leaves the working area filled with merged ordered sub arrays of length 2. Suppose we are unlucky again in the second round; the previous result nevertheless ensures that the non-decreasing sub arrays in this round are no shorter than 2, so this time the working area is filled with merged ordered sub arrays of length 4, and so on. The lengths of the non-decreasing sub arrays double in every round, so there are at most O(lg n) rounds, and in every round we scan all the elements. The overall performance in this worst case is therefore bound to O(n lg n). We'll come back to this interesting phenomenon in the next section about bottom-up merge sort.
In purely functional settings, however, it's not sensible to scan the list from both ends, since the underlying data structure is a singly linked-list. The nature merge sort can be realized in another way.
Observe that the list to be sorted consists of several non-decreasing sub lists; we can pick every two such sub lists and merge them into a bigger one. We repeatedly pick and merge, so that the number of non-decreasing sub lists keeps halving, and finally there is only one such list, which is the sorted result. This idea can be formalized in the following equation.
sort(L) = sort′ (group(L)) (13.36)
Where function group(L) groups the elements in the list into non-decreasing sub lists.
This function can be described as below; the first two are trivial edge cases.
• If the list is empty, the result is a list containing an empty list;
• If there is only one element in the list, the result is a list containing a singleton list;
• Otherwise, the first two elements are compared: if the first one is less than or equal to the second, it is linked in front of the first sub list of the recursive grouping result; otherwise a singleton list containing the first element is placed as the first sub list before the recursive result.
group(L) = {L}                      : |L| ≤ 1
           {{l1} ∪ L1, L2, ...}     : l1 ≤ l2, {L1, L2, ...} = group(L′)      (13.37)
           {{l1}, L1, L2, ...}      : otherwise
It's quite possible to abstract the grouping criterion as a parameter to develop a generic grouping function, for instance, as the following Haskell code 6.
groupBy' :: (a→a→Bool) →[a] →[[a]]
groupBy' _ [] = [[]]
groupBy' _ [x] = [[x]]
groupBy' f (x:xs@(x':_)) | f x x' = (x:ys):yss
| otherwise = [x]:r
where
r@(ys:yss) = groupBy' f xs
Different from the sort function, which sorts a list of elements, function sort′ accepts
a list of sub lists which is the result of grouping.
sort′(L) = ϕ                      : L = ϕ
           L1                     : L = {L1}                                  (13.38)
           sort′(mergePairs(L))   : otherwise
The first two are the trivial edge cases. If the list to be sorted is empty, the result is
obviously empty; If it contains only one sub list, then we are done. We need just extract
this single sub list as result; For the recursive case, we call a function mergeP airs to
merge every two sub lists, then recursively call sort′ .
The next undefined function is mergePairs; as the name indicates, it repeatedly merges pairs of non-decreasing sub lists into bigger ones.
mergePairs(L) = L                                   : |L| ≤ 1                 (13.39)
                {merge(L1, L2)} ∪ mergePairs(L′′)   : otherwise
When there are fewer than two sub lists, we are done; otherwise, we merge the first two sub lists L1 and L2, and recursively merge the remaining pairs in L′′. The result of mergePairs is a list of lists; however, it is finally flattened by the sort′ function.
The merge function is as same as before. The complete example Haskell program is
given as below.
mergesort = sort' ◦ groupBy' ( ≤ )
sort' [] = []
sort' [xs] = xs
sort' xss = sort' (mergePairs xss) where
mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
mergePairs xss = xss
6 There is a 'groupBy' function provided in the Haskell standard library 'Data.List'. However, it doesn't fit here, because it accepts an equality testing function as parameter, which must be reflexive, transitive, and symmetric, while the less-than-or-equal-to relation we use here isn't symmetric. Refer to appendix A of this book for detail.
Alternatively, observe that we can first pick two sub lists and merge them into an intermediate result, then repeatedly pick the next sub list and merge it into the ordered result we have so far, until all the remaining sub lists are merged (see the sketch below). This is a typical folding algorithm, as introduced in appendix A.
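As a small illustration of the folding approach (a Python sketch of my own; the helper names group and merge are assumptions, not the book's):

from functools import reduce

def group(xs):
    # group xs into non-decreasing runs, e.g. [3, 4, 1, 2] gives [[3, 4], [1, 2]]
    runs = []
    for x in xs:
        if runs and runs[-1][-1] <= x:
            runs[-1].append(x)
        else:
            runs.append([x])
    return runs

def merge(xs, ys):
    # standard merge of two sorted lists
    zs = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            zs.append(xs[i]); i += 1
        else:
            zs.append(ys[j]); j += 1
    return zs + xs[i:] + ys[j:]

def nmsort(xs):
    # fold the merge over the natural runs
    return reduce(merge, group(xs), [])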
Exercise 13.6
• Is the nature merge sort realized by folding equivalent to the one using mergePairs in terms of performance? If yes, prove it; if not, which one is faster?
13.11 Bottom-up merge sort
Different from the basic version and the even-odd version, we needn't explicitly split the list to be sorted in every recursion. The whole list is split into n singletons at the very beginning, and we merge these sub lists in the rest of the algorithm.
We reuse the functions sort′ and mergePairs defined in the section about nature merge sort; they repeatedly merge pairs of sub lists until only one is left.
Implementing this version in Haskell gives the following example code.
sort = sort' ◦ map (λx→[x])
This version is based on what Okasaki presented in [3]. It is quite similar to the nature merge sort and differs only in the way of grouping. Actually, it can be deduced as a special case (the worst case) of nature merge sort: instead of spanning a non-decreasing sub list as long as possible, the predicate always evaluates to false, so every sub list spans only one element.
Similar to nature merge sort, bottom-up merge sort can also be defined by folding. The detailed implementation is left as an exercise to the reader.
Observing the bottom-up sort, we find it is tail-recursive, so it's quite easy to translate into a purely iterative algorithm without any recursion.
1: function Sort(A)
2: B ← ϕ
3: for ∀a ∈ A do
4: B ← Append(B, {a})
5: N ← |B|
6: while N > 1 do
7: for i ← from 1 to ⌊N/2⌋ do
8: B[i] ← Merge(B[2i − 1], B[2i])
9: if Odd(N ) then
10: B[⌈N/2⌉] ← B[N ]
11: N ← ⌈N/2⌉
12: if B = ϕ then
13: return ϕ
14: return B[1]
The following example Python program implements the purely iterative bottom-up merge sort.

def mergesort(xs):
    ys = [[x] for x in xs]
    while len(ys) > 1:
        ys.append(merge(ys.pop(0), ys.pop(0)))
    return [] if ys == [] else ys.pop()

def merge(xs, ys):
    zs = []
    while xs != [] and ys != []:
        zs.append(xs.pop(0) if xs[0] < ys[0] else ys.pop(0))
    return zs + (xs if xs != [] else ys)
Exercise 13.7
13.12 Parallelism
We mentioned in the basic version of quick sort that the two sub sequences can be sorted in parallel after the divide phase finishes. This strategy is also applicable to merge sort. Actually, practical parallel versions of quick sort and merge sort do not merely distribute the recursive sub-sequence sorting to two parallel processes, but divide the sequence into p sub sequences, where p is the number of processors. Ideally, if we can achieve sorting in T′ time with parallelism such that O(n lg n) = pT′, we say it is a linear speed up, and the algorithm is parallel optimal.
However, a straightforward parallel extension of the sequential quick sort algorithm, which samples several pivots, divides into p sub sequences, and sorts them independently in parallel, isn't optimal. The bottleneck is in the divide phase, which only achieves O(n) time in the average case.
The straightforward parallel extension of merge sort, on the other hand, is blocked at the merge phase. Both parallel merge sort and parallel quick sort need careful design in practice to achieve optimal speed up. Still, the divide and conquer nature makes merge sort and quick sort relatively easy to parallelize. Richard Cole found an O(lg n) parallel merge sort algorithm with n processors in 1986 [76].
Parallelism is a big and complex topic which is out of the scope of this elementary book. Readers can refer to [76] and [77] for details.
In purely functional settings, swapping isn't efficient, because the underlying data structure is a singly linked-list rather than a vectorized array. Merge sort, on the other hand, is friendly in such an environment: it costs constant extra space, and its performance is guaranteed even in quick sort's worst case, where the latter degrades to quadratic time. However, merge sort doesn't perform as well as quick sort in purely imperative settings with arrays: it either needs extra space for merging, which is sometimes unreasonable, for example in embedded systems with limited memory, or causes many overhead swaps with the in-place workaround. In-place merging is still an active research area.
Although the title of this chapter is ‘quick sort vs. merge sort’, it’s not the case
that one algorithm has nothing to do with the other. Quick sort can be viewed as the
optimized version of tree sort as explained in this chapter. Similarly, merge sort can also
be deduced from tree sort as shown in [75].
There are many ways to categorize sorting algorithms, such as in [51]. One way is from the point of view of easy/hard partition and easy/hard merge [72].
Quick sort, for example, is quite easy for merging, because all the elements in the sub
sequence before the pivot are no greater than any one after the pivot. The merging for
quick sort is actually trivial sequence concatenation.
Merge sort, on the other hand, is more complex in merging than quick sort. However, it's quite easy to divide, no matter which concrete divide method is taken: simple division at the middle point, even-odd splitting, nature splitting, or bottom-up straight splitting. Compared to merge sort, it's more difficult for quick sort to achieve a perfect division. We showed that, in theory, the worst case can't be completely avoided, no matter what engineering practice is taken: median-of-three, random quick sort, 3-way partition, etc.
Up to this chapter, we have shown some elementary sorting algorithms in this book, including insertion sort, tree sort, selection sort, heap sort, quick sort, and merge sort. Sorting is still a hot research area in computer science. At the time this chapter was written, people were challenged by the buzz word 'big data': the traditional convenient methods can't handle ever larger data within reasonable time and resources. Sorting a sequence of hundreds of gigabytes has become routine in some fields.
Exercise 13.8
• Design an algorithm to create binary search tree by using merge sort strategy.
Bibliography
[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. ISBN:0262032937. The MIT Press. 2001
[5] Jon Bentley, Douglas McIlroy. “Engineering a sort function”. Software Practice and
experience VOL. 23(11), 1249-1265 1993.
[7] Richard Bird. “Pearls of functional algorithm design”. Cambridge University Press.
2010. ISBN, 1139490605, 9781139490603
[8] Fethi Rabhi, Guy Lapalme. “Algorithms: a functional programming approach”. Sec-
ond edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[10] Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. “Practical in-place mergesort”.
Nordic Journal of Computing, 1996.
[11] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[12] Josè Bacelar Almeida and Jorge Sousa Pinto. “Deriving Sorting Algorithms”. Tech-
nical report, Data structures and Algorithms. 2008.
[13] Richard Cole. “Parallel merge sort”. SIAM J. Comput. 17 (4): 770–785. doi:10.1137/0217049. August 1988.
[14] Powers, David M. W. “Parallelized Quicksort and Radixsort with Optimal Speedup”,
Proceedings of International Conference on Parallel Computing Technologies. Novosi-
birsk. 1991.
Searching
14.1 Introduction
Searching is quite a big and important area. Computers make many hard searching problems feasible which are almost impossible for human beings. A modern industrial robot can search for and pick the correct gadget from the pipeline for assembly; a GPS car navigator can search the map for the best route to a specific place. The modern mobile phone is not only equipped with such a map navigator, but can also search for the best price when shopping on the Internet.
This chapter just scratches the surface of elementary searching. One good thing that computers offer is brute-force scanning for a certain result in a large sequence. The divide and conquer search strategy will be introduced with two problems: one is to find the k-th smallest element among a list of unsorted elements; the other is the popular binary search in a list of sorted elements. We'll also introduce the extension of binary search to multi-dimensional data.
Text matching is also very important in our daily life. Two well-known searching algorithms, Knuth-Morris-Pratt (KMP) and Boyer-Moore, will be introduced. They set good examples for another searching strategy: information reusing.
Besides sequence search, some elementary methods for searching solutions to some interesting problems will be introduced. They were mostly well studied in the early phase of AI (artificial intelligence), including the basic DFS (depth first search) and BFS (breadth first search).
Finally, dynamic programming will be briefed for searching optimal solutions, and we'll also introduce the greedy algorithm which is applicable in some special cases.
All algorithms will be realized in both imperative and functional approaches.
14.2 Sequence search
k-selection problem
Consider the problem of finding the k-th smallest element among n elements. The most straightforward idea is to find the minimum first, then drop it and find the minimum among the rest. Repeating this minimum-finding-and-dropping k times gives the k-th smallest element. Finding the minimum among n elements costs linear O(n) time, so this method performs in O(kn) time, acceptable if k is much smaller than n.
Another method is to use the 'heap' data structure we've introduced. No matter what concrete heap is used, e.g. a binary heap with an implicit array, a Fibonacci heap or others, accessing the top element followed by popping is typically bound to O(lg n) time. Thus this method, as formalized in equations (14.1) and (14.2), performs in O(k lg n) time, if k is much smaller than n.
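For instance, the heap method can be sketched with Python's standard heapq module (the function name kth_smallest is mine):

import heapq

def kth_smallest(xs, k):
    # build a heap in O(n), then pop k - 1 times; each pop is O(lg n)
    h = list(xs)
    heapq.heapify(h)
    for _ in range(k - 1):
        heapq.heappop(h)
    return h[0]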
However, heap adds some complexity to the solution. Is there any simple, fast method
to find the k-th element?
The divide and conquer strategy can help us. If we can divide all the elements into two sub lists A and B, and ensure that every element in A is not greater than any element in B, we can scale down the problem by following this method1:
2. If k < |A|, the k-th smallest element must be contained in A; we can drop B and further search in A;
3. If |A| < k, the k-th smallest element must be contained in B; we can drop A and further search for the (k − |A|)-th smallest element in B.
Note that the italic font emphasizes the fact of recursion. The ideal case always divides
the list into two equally big sub lists A and B, so that we can halve the problem each
time. Such ideal case leads to a performance of O(n) linear time.
Thus the key problem is how to realize the division, which collects the smallest m elements in one sub list and puts the rest in another.
This reminds us of the partition algorithm in quick sort, which moves all the elements smaller than the pivot in front of it, and moves those greater than the pivot behind it. Based on this idea, we can develop a divide and conquer k-selection algorithm, called the quick selection algorithm.
2. Move all elements which aren't greater than the pivot into a sub list A, and move the rest into sub list B;
3. Compare the length of A with k; if |A| = k − 1, then the pivot is the k-th smallest element;
This algorithm can be formalized in the equation below. Suppose 0 < k ≤ |L|, where L is a non-empty list of elements. Denote the first element of L as l1; it is chosen as the pivot. L′ contains the rest of the elements except for l1. (A, B) = partition(λx · x ≤ l1, L′) partitions L′ using the same algorithm defined in the chapter about quick sort.
top(k, L) = l1                    : |A| = k − 1
            top(k − 1 − |A|, B)   : |A| < k − 1                               (14.3)
            top(k, A)             : otherwise

partition(p, L) = (ϕ, ϕ)          : L = ϕ
                  ({l1} ∪ A, B)   : p(l1), (A, B) = partition(p, L′)          (14.4)
                  (A, {l1} ∪ B)   : ¬p(l1)
The partition function is provided in the Haskell standard library; for the detailed implementation, refer to the previous chapter about quick sort.
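A direct Python transcription of equations (14.3) and (14.4) might look like this (a sketch of mine; top is 1-based and assumes 0 < k ≤ |L|, and the partition is done with two list comprehensions for brevity):

def top(k, xs):
    # k-th smallest element of xs, quick-selection style
    pivot, rest = xs[0], xs[1:]
    a = [x for x in rest if x <= pivot]
    b = [x for x in rest if x > pivot]
    if len(a) == k - 1:
        return pivot
    elif len(a) < k - 1:
        return top(k - 1 - len(a), b)
    else:
        return top(k, a)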
The lucky case is that the k-th smallest element happens to be selected as the pivot at the very beginning. The partition function examines the whole list and finds that there are k − 1 elements not greater than the pivot; we are done in just O(n) time. The worst case is that either the maximum or the minimum element is selected as the pivot every time, so the partition always produces an empty sub list (either A or B is empty). If we always pick the minimum as the pivot, the performance is bound to O(kn); if we always pick the maximum, it is O((n − k)n).
The best case (not the lucky case) is that the pivot always partitions the list perfectly: the length of A is nearly the same as the length of B, so the list is halved every time. It needs about O(lg n) partitions, each taking linear time proportional to the length of the halved list. This can be expressed as O(n + n/2 + n/4 + ... + n/2^m), where m is the smallest number satisfying n/2^m < k. Summing the series gives O(n).
The average case analysis needs the tool of mathematical expectation. It's quite similar to the proof given in the previous chapter about quick sort, and is left as an exercise to the reader.
Like quick sort, this divide and conquer selection algorithm performs well most of the time in practice. We can take the same engineering measures, such as median-of-three or randomly selecting the pivot, as we did for quick sort. Below is an imperative realization for example.
1: function Top(k, A, l, u)
2: Exchange A[l] ↔ A[ Random(l, u) ] ▷ Randomly select in [l, u]
3: p ← Partition(A, l, u)
4: if p − l + 1 = k then
5: return A[p]
6: if k < p − l + 1 then
7: return Top(k, A, l, p − 1)
8: return Top(k − p + l − 1, A, p + 1, u)
This algorithm searches for the k-th smallest element in the range [l, u] of array A; the boundaries are inclusive. It first randomly selects a position and swaps its element with the first one. This element is then chosen as the pivot for partitioning. The partition algorithm moves elements in-place and returns the position to which the pivot is moved. If the pivot ends up exactly at the k-th position, we are done; if there are more than k − 1 elements not greater than the pivot, the algorithm recursively searches for the k-th smallest element in [l, p − 1]; otherwise, k is reduced by the number of elements in [l, p], and we recursively search the range after the pivot, [p + 1, u].
There are many ways to realize the partition algorithm; the one below is based on N. Lomuto's method. Other realizations are left as exercises to the reader.
1: function Partition(A, l, u)
2: p ← A[l]
3: L←l
4: for R ← l + 1 to u do
5: if ¬(p < A[R]) then
6: L←L+1
7: Exchange A[L] ↔ A[R]
8: Exchange A[L] ↔ A[l]
9: return L
The ANSI C example program below implements this algorithm. Note that it handles the special case that either the array is empty or k is out of the boundaries of the array; it returns -1 to indicate search failure.
int partition(Key∗ xs, int l, int u) {
int r, p = l;
for (r = l + 1; r < u; ++r)
if (!(xs[p] < xs[r]))
swap(xs, ++l, r);
swap(xs, p, l);
return l;
}
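The driver that puts the random pivot and the partition together might be sketched in Python as follows (my transcription of the pseudocode, returning None rather than -1 on failure):

import random

def partition(xs, l, u):
    # N. Lomuto's partition over the inclusive range [l, u]
    p = l
    for r in range(l + 1, u + 1):
        if not (xs[p] < xs[r]):
            l += 1
            xs[l], xs[r] = xs[r], xs[l]
    xs[p], xs[l] = xs[l], xs[p]
    return l

def top(k, xs, l, u):
    # k-th smallest element of xs[l..u] (inclusive); 1-based k
    if not xs or k < 1 or k > u - l + 1:
        return None
    r = random.randint(l, u)
    xs[l], xs[r] = xs[r], xs[l]      # random pivot guards against the worst case
    p = partition(xs, l, u)
    if p - l + 1 == k:
        return xs[p]
    if k < p - l + 1:
        return top(k, xs, l, p - 1)
    return top(k - (p - l + 1), xs, p + 1, u)

For example, top(3, [3, 1, 4, 1, 5], 0, 4) returns 3, the third smallest element.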
There is a method proposed by Blum, Floyd, Pratt, Rivest and Tarjan in 1973 which ensures the worst case performance is bound to O(n) [4], [81]. It divides the list into small groups, each containing no more than 5 elements. The median of each group of 5 elements can be identified quickly, which yields n/5 median elements. We repeat this step, dividing them again into groups of 5 and recursively selecting the median of medians. Obviously the final 'true' median can be found in O(lg n) time. This is the best pivot for partitioning the list. Next, we halve the list by this pivot and recursively search for the k-th smallest element. The performance can be calculated as follows.
T(n) = c1 lg n + c2 n + T(n/2)                                            (14.5)
Where c1 and c2 are the constant factors for the median-of-medians and the partition computations respectively. Solving this equation with the telescope method or the master theorem gives the O(n) result.
binary search
Another popular divide and conquer algorithm is binary search. We've shown it in the chapter about insertion sort. When I was in school, my math teacher played a guessing game with me. He asked me to think of a natural number less than 1000, then he asked some questions that I could only answer with 'yes' or 'no', and finally he guessed my number. He typically asked questions like the following:
• Is it an even number?
• Is it a prime number?
• Can it be divided by 3?
• ...
Most of the time he guessed the number within 10 questions. My classmates and I all thought it was unbelievable.
This game would not be so interesting if it were downgraded to a popular TV program: the price of a product is hidden, and you must figure out the exact price in 30 seconds. The host of the program tells you whether your guess is higher or lower than the actual price. If you win, the product is yours. The best strategy is a similar divide and conquer approach, performing a binary search. So it's common to hear a conversation like the following between the player and the host:
• P: 1000;
• H: High;
• P: 500;
• H: Low;
• P: 750;
• H: Low;
• P: 890;
• H: Low;
• P: 990;
• H: Bingo.
My math teacher told us that, because the number we thought of was within 1000, if he could halve the candidate numbers every time by asking good questions, the number would be found within 10 questions, because 2^10 = 1024 > 1000. However, it would be boring to just ask whether it is greater than 500, less than 250, ... Actually, the question 'is it even' is very good, because it always halves the candidates2.
Come back to the binary search algorithm. It is only applicable to an ordered sequence. I've seen programmers try to apply it to unsorted arrays and take hours to figure out why it doesn't work. The idea is quite straightforward: in order to find a number x in an ordered sequence A, we first check the middle element and compare it with x; if they are equal, we are done; if x is smaller, since A is ordered, we need only recursively search the first half; otherwise we search the second half. Once A becomes empty and we haven't found x yet, it means x doesn't exist.
Before formalizing this algorithm, there is a surprising fact that needs to be noted. Donald Knuth stated that 'Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky'. Jon Bentley pointed out that most binary search implementations contain errors, and even the one given by him in the first version of 'Programming pearls' contained an error that went undetected for over twenty years [4].
There are two kinds of realization: one is recursive, the other is iterative. The recursive solution is the same as what we described. Suppose the lower and upper boundaries of the array are l and u inclusive.
1: function Binary-Search(x, A, l, u)
2: if u < l then
3: Not found error
4: else
5: m ← l + ⌊(u − l)/2⌋ ▷ avoid the overflow of ⌊(l + u)/2⌋
6: if A[m] = x then
7: return m
8: if x < A[m] then
9: return Binary-Search(x, A, l, m - 1)
10: else
11: return Binary-Search(x, A, m + 1, u)
As the comment highlights, if integers are represented with machine words of limited width, we can't merely use ⌊(l + u)/2⌋, because it may overflow when l and u are big.
Binary search can also be realized in iterative manner, that we keep updating the
boundaries according to the middle point comparison result.
1: function Binary-Search(x, A, l, u)
2: while l < u do
3: m ← l + ⌊(u − l)/2⌋
4: if A[m] = x then
5: return m
2 When the author revised this chapter, Microsoft released a game on social networks. The user thinks of a person's name, then the AI robot asks up to 16 yes/no questions, and tells you who that person is. Can you figure out how the robot works?
Here b1 is the first element of B if B isn't empty, and B′ holds the rest except for b1. The splitAt function takes O(n) time to divide the list into the two sub lists A and B (see appendix A and the chapter about merge sort for detail). If B isn't empty and x is equal to b1, the search returns; otherwise, if x is less than b1, as the list is sorted, we recursively search in A; otherwise, we search in B. If the list is empty, we raise an error to indicate search failure.
As we always split the list at the middle point, the number of elements halves in each recursion. In every recursive call, we take linear time for splitting; the splitting function only traverses the first half of the linked-list. Thus the total time can be expressed as

T(n) = c·n/2 + c·n/4 + c·n/8 + ...
This results in O(n) time, which is the same as brute force search from head to tail:

search(x, L) = Err             : L = ϕ
               l1              : x = l1
               search(x, L′)   : otherwise
As we mentioned in the chapter about insertion sort, the functional approach to binary search is through the binary search tree: the ordered sequence is represented as a tree (a self balanced tree if necessary), which offers logarithmic time searching 3.
Although it doesn't make sense to apply divide and conquer binary search on a linked-list, binary search can still be very useful in purely functional settings. Consider solving the equation a^x = y for given natural numbers a and y, where a ≤ y. We want to find an integer solution for x if there is one. Of course brute-force naive searching can solve it: we examine the numbers one by one from 0, computing a^0, a^1, a^2, ..., stopping if a^i = y, or reporting that there is no solution if a^i < y < a^(i+1) for some i. We initialize the solution domain as X = {0, 1, 2, ...} and call the exhaustive searching function solve(a, y, X) below.
solve(a, y, X) = x1                : a^x1 = y
                 solve(a, y, X′)   : a^x1 < y
                 Err               : otherwise
3 Some readers may argue that an array should be used instead of a linked-list, for example in Haskell. This book only deals with purely functional sequences such as the finger-tree. Different from the Haskell array, it can't support constant time random access.
This function examines the solution domain in monotonically increasing order. It takes the first candidate element x1 from X and compares a^x1 with y; if they are equal, then x1 is the solution and we are done; if it is less than y, then x1 is dropped and we search among the rest of the elements, represented as X′; otherwise, since f(x) = a^x is a non-decreasing function when a is a natural number, the remaining elements only make f(x) bigger and bigger, so there is no integer solution and the function returns an error.
Computing a^x is expensive for big a and x if precision must be kept4. Can it be improved so that we compute it as few times as possible? The divide and conquer binary search can help. Actually, we can estimate an upper limit of the solution domain: since y ≤ a^y, we can search in the range {0, 1, ..., y}. As the function f(x) = a^x is non-decreasing in its argument x, we first check the middle-point candidate x_m = ⌊(0 + y)/2⌋; if a^x_m = y, the solution is found; if it is less than y, we can drop all candidate solutions before x_m; otherwise we drop all candidates after it. Either way the solution domain is halved. We repeat this approach until either the solution is found or the solution domain becomes empty, which indicates there is no integer solution.
The binary search method can be formalized as the following equation. The non-decreasing function is abstracted as a parameter. To solve our problem, we just call it as bsearch(f, y, 0, y), where f(x) = a^x.
bsearch(f, y, l, u) = Err                       : u < l
                      m                         : f(m) = y, m = ⌊(l + u)/2⌋
                      bsearch(f, y, l, m − 1)   : f(m) > y                    (14.7)
                      bsearch(f, y, m + 1, u)   : f(m) < y
As we halve the solution domain in every recursion, this method computes f(x) only O(lg y) times. It is much faster than the brute-force search.
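Equation (14.7) translates almost literally into Python (a sketch; bsearch returns None where the equation returns Err):

def bsearch(f, y, l, u):
    # binary search for m in [l, u] with f(m) == y; f must be non-decreasing
    if u < l:
        return None
    m = (l + u) // 2
    if f(m) == y:
        return m
    elif f(m) > y:
        return bsearch(f, y, l, m - 1)
    else:
        return bsearch(f, y, m + 1, u)

# solve 3^x = 729: bsearch(lambda x: 3 ** x, 729, 0, 729) gives 6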
2 dimensions search
It's quite natural to think that the idea of binary search can be extended to 2 dimensions or, more generally, to multi-dimensional domains. However, it is not so easy. Consider the example of an m × n matrix M, where the elements in each row and each column are in strictly increasing order. Figure 14.1 shows such a matrix for example.
1 2 3 4 ...
2 4 5 6 ...
3 5 7 8 ...
4 6 8 9 ...
...
Figure 14.1: A matrix in strict increasing order for each row and column.
Given a value x, how can we locate all elements equal to x in the matrix quickly? We need to develop an algorithm which returns a list of locations (i, j) such that Mi,j = x.
Richard Bird in [1] mentioned that he used this problem to interview candidates for entry to Oxford. The interesting observation was that those who had some computer background at school tended to use binary search, but it's easy to get stuck.
The usual way to follow the binary search idea is to examine the element at position (m/2, n/2). If it is less than x, we can only drop the elements in the top-left area; if it is greater than x, only
the bottom-right area can be dropped. Both cases are illustrated in figure 14.2; the gray areas indicate the elements that can be dropped.
4 One alternative is to reuse the result of a^n when computing a^(n+1) = a·a^n. Here we consider the general case.
Figure 14.2: Left: the middle point element is smaller than x. All elements in the gray
area are less than x; Right: the middle point element is greater than x. All elements in
the gray area are greater than x.
The problem is that the solution domain changes from a rectangle to an 'L' shape in both cases, so we can't simply apply the search recursively. In order to solve this problem systematically, we define the problem more generally, using brute-force search as a starting point, and keep improving it bit by bit.
Consider a function f(x, y) which is strictly increasing in both of its arguments, for instance f(x, y) = a^x + b^y, where a and b are natural numbers. Given a value z, which is a natural number too, we want to solve the equation f(x, y) = z by finding all non-negative integer candidate pairs (x, y).
With this definition, the matrix search problem can be specialized by the function below.

f(x, y) = Mx,y   : 1 ≤ x ≤ m, 1 ≤ y ≤ n
          −1     : otherwise
Brute-force 2D search
As all solutions of f(x, y) = z should be found, one can immediately give a brute force solution with two nested loops.
1: function Solve(f, z)
2: A←ϕ
3: for x ∈ {0, 1, 2, ..., z} do
4: for y ∈ {0, 1, 2, ..., z} do
5: if f (x, y) = z then
6: A ← A ∪ {(x, y)}
7: return A
This calculates f exactly (z + 1)² times. It can be formalized as in (14.8).
solve(f, z) = {(x, y)|x ∈ {0, 1, ..., z}, y ∈ {0, 1, ..., z}, f (x, y) = z} (14.8)
Saddleback search
We haven't yet utilized the fact that f(x, y) is strictly increasing. Dijkstra pointed out in [82] that, instead of searching from the bottom-left corner, starting from the top-left leads to an effective solution. As illustrated in figure 14.3, the search starts from (0, z); for every point (p, q), we compare f(p, q) with z:
• If f (p, q) < z, since f is strict increasing, for all 0 ≤ y < q, we have f (p, y) < z. We
can drop all points in the vertical line section (in red color);
• If f (p, q) > z, then f (x, q) > z for all p < x ≤ z. We can drop all points in the
horizontal line section (in blue color);
• Otherwise if f (p, q) = z, we mark (p, q) as one solution, then both line sections can
be dropped.
This is a systematic way to scale down the solution domain rectangle: we keep dropping a row, a column, or both. The method can be formalized as a function search(f, z, p, q), which searches for solutions of f(x, y) = z in the rectangle with top-left corner (p, q) and bottom-right corner (z, 0). We start the search by initializing (p, q) = (0, z), i.e. solve(f, z) = search(f, z, 0, z).
search(f, z, p, q) = ϕ                                      : p > z ∨ q < 0
                     search(f, z, p + 1, q)                 : f(p, q) < z
                     search(f, z, p, q − 1)                 : f(p, q) > z     (14.9)
                     {(p, q)} ∪ search(f, z, p + 1, q − 1)  : otherwise
The first clause is the edge case, there is no solution if (p, q) isn’t top-left to (z, 0).
The following example Haskell program implements this algorithm.
solve f z = search 0 z where
search p q | p > z || q < 0 = []
| z' < z = search (p + 1) q
| z' > z = search p (q - 1)
| otherwise = (p, q) : search (p + 1) (q - 1)
where z' = f p q
Considering the calculation of f may be expensive, this program stores the result of
f (p, q) to variable z ′ . This algorithm can also be implemented in iterative manner, that
the boundaries of solution domain keeps being updated in a loop.
1: function Solve(f, z)
2: p ← 0, q ← z
3: S←ϕ
4: while p ≤ z ∧ q ≥ 0 do
5: z ′ ← f (p, q)
6: if z ′ < z then
7: p←p+1
8: else if z ′ > z then
9: q ←q−1
10: else
11: S ← S ∪ {(p, q)}
12: p ← p + 1, q ← q − 1
13: return S
It’s intuitive to translate this imperative algorithm to real program, as the following
example Python code.
def solve(f, z):
    (p, q) = (0, z)
    res = []
    while p <= z and q >= 0:
        z1 = f(p, q)
        if z1 < z:
            p = p + 1
        elif z1 > z:
            q = q - 1
        else:
            res.append((p, q))
            (p, q) = (p + 1, q - 1)
    return res
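For instance, with f(x, y) = x + y and z = 3, solve(lambda x, y: x + y, 3) walks the diagonal and returns [(0, 3), (1, 2), (2, 1), (3, 0)].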
It is clear that in every iteration at least one of p and q advances toward the bottom-right corner by one. Thus it takes at most 2(z + 1) steps to complete the search; this is the worst case. There are three best cases. The first happens when both p and q advance by one in every iteration, so that only z + 1 steps are needed; the second keeps advancing horizontally to the right and ends when p exceeds z; the last is similar, moving down vertically until q becomes negative.
Figure 14.4 illustrates the best cases and the worst cases respectively. Figure 14.4 (a)
is the case that every point (x, z − x) in diagonal satisfies f (x, z − x) = z, it uses z + 1
steps to arrive at (z, 0); (b) is the case that every point (x, z) along the top horizontal
line gives the result f (x, z) < z, the algorithm takes z + 1 steps to finish; (c) is the case
that every point (0, x) along the left vertical line gives the result f (0, x) > z, thus the
algorithm takes z + 1 steps to finish; (d) is the worst case. If we project all the horizontal
sections along the search path to x axis, and all the vertical sections to y axis, it gives
the total steps of 2(z + 1).
Compared to the quadratic brute-force method (O(z²)), this is a linear algorithm bound to O(z).
Bird imagined that the name ‘saddleback’ is because the 3D plot of f with the smallest
bottom-left and the largest top-right and two wings looks like a saddle as shown in figure
14.5
0 ≤ n ≤ z, along the x axis, which satisfies f (n, 0) ≤ z; And the solution domain shrinks
from (0, z) − (z, 0) to (0, m) − (n, 0) as shown in figure 14.6.
And the improved saddleback search shrinks to this new search domain solve(f, z) =
search(f, z, 0, m):
search(f, z, p, q) = ϕ                                      : p > n ∨ q < 0
                     search(f, z, p + 1, q)                 : f(p, q) < z
                     search(f, z, p, q − 1)                 : f(p, q) > z     (14.13)
                     {(p, q)} ∪ search(f, z, p + 1, q − 1)  : otherwise
It's almost the same as the basic saddleback version, except that it stops when p exceeds n, rather than z. In a real implementation, the result of f(p, q) can be calculated once and stored in a variable, as shown in the following Haskell example.
solve' f z = search 0 m where
search p q | p > n || q < 0 = []
| z' < z = search (p + 1) q
| z' > z = search p (q - 1)
| otherwise = (p, q) : search (p + 1) (q - 1)
where z' = f p q
m = bsearch (f 0) z (0, z)
n = bsearch (λx→f x 0) z (0, z)
This improved saddleback search first performs two rounds of binary search to find the proper m and n. Each round is bound to O(lg z) evaluations of f; after that, the search takes O(m + n) time in the worst case, and O(min(m, n)) time in the best case. The overall performance is given in the following table.
             | times of evaluation of f
worst case   | 2 log z + m + n
best case    | 2 log z + min(m, n)
For functions like f(x, y) = a^x + b^y with positive integers a and b, m and n will be relatively small, so the performance is close to O(lg z).
This algorithm can also be realized in imperative approach. Firstly, the binary search
should be modified.
1: function Binary-Search(f, y, (l, u))
2: while l < u do
3: m ← ⌊(l + u)/2⌋
4: if f (m) ≤ y then
5: if y < f (m + 1) then
6: return m
7: l ←m+1
8: else
9: u←m
10: return l
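In Python, this boundary-finding binary search might be sketched as follows (it returns the largest x in [l, u] with f(x) ≤ y, assuming f is non-decreasing and f(l) ≤ y):

def bsearch_bound(f, y, l, u):
    while l < u:
        m = (l + u) // 2
        if f(m) <= y:
            if y < f(m + 1):
                return m
            l = m + 1
        else:
            u = m
    return l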
Using this algorithm, the boundaries m and n can be found before performing the saddleback search.
1: function Solve(f, z)
2: m ← Binary-Search(λy · f (0, y), z, (0, z))
3: n ← Binary-Search(λx · f (x, 0), z, (0, z))
4: p ← 0, q ← m
5: S←ϕ
6: while p ≤ n ∧ q ≥ 0 do
7: z ′ ← f (p, q)
8: if z ′ < z then
9: p←p+1
10: else if z ′ > z then
11: q ←q−1
12: else
13: S ← S ∪ {(p, q)}
14: p ← p + 1, q ← q − 1
15: return S
The implementation is left as exercise to the reader.
Suppose we are searching in a rectangle from the upper-left corner (a, b) to the lower-right corner (c, d). If (p, q) isn't the middle point and f(p, q) ≠ z, we can't ensure the dropped area is always 1/4. However, if f(p, q) = z, since f is strictly increasing, we can not only drop the lower-left and upper-right sub areas, but also all the other points in column p and row q. The problem then scales down quickly, because only 1/2 of the area is left.
This suggests that, instead of jumping to the middle point to start searching, a more efficient way is to find a point which evaluates exactly to the target value. One straightforward way to find such a point is to perform binary search along the center horizontal line or the center vertical line of the rectangle.
The performance of binary search along a line is logarithmic in the length of that line. A good idea is to always pick the shorter center line, as shown in figure 14.8: if the height of the rectangle is greater than its width, we perform binary search along the horizontal center line; otherwise we choose the vertical center line.
However, what if we can't find a point (p, q) on the center line that satisfies f(p, q) = z?
Take the center horizontal line for example: even in such a case, we can still find a point
such that f(p, q) < z < f(p + 1, q). The only difference is that we can't drop the points
in column p and row q completely.
Combining these conditions, the binary search along the horizontal line looks for a p
satisfying f(p, q) ≤ z < f(p + 1, q); while the vertical line search condition is f(p, q) ≤ z <
f(p, q + 1).
The modified binary search ensures that if all points on the line segment give f(p, q) < z,
the upper bound will be found; and the lower bound will be found if they are all greater
than z. In such cases we can drop the whole area on one side of the center line.
Summing up all these ideas, we can develop an efficient improved saddleback search as
follows.
1. Perform binary search along the y axis and x axis to find the tight boundaries from
(0, m) to (n, 0);
2. Denote the candidate rectangle as (a, b) − (c, d), if the candidate rectangle is empty,
the solution is empty;
3. If the height of the rectangle is longer than the width, perform binary search along
the center horizontal line; otherwise, perform binary search along the center vertical
line; denote the search result as (p, q);
4. If f (p, q) = z, record (p, q) as a solution, and recursively search two sub rectangles
(a, b) − (p − 1, q + 1) and (p + 1, q − 1) − (c, d);
5. Otherwise f(p, q) ≠ z; recursively search the same two sub-rectangles plus a line
section. The line section is either (p, q + 1) − (p, b) as shown in figure 14.9 (a), or
(p + 1, q) − (c, q) as shown in figure 14.9 (b).
Figure 14.9: Recursively search the gray areas; the bold line should be included if f(p, q) ≠ z.
This algorithm can be formalized as follows. Equations (14.11) and (14.12) are the same
as before. A new search function should be defined: let search_{(a,b),(c,d)} be the function
that searches the rectangle with top-left corner (a, b) and bottom-right corner (c, d).
\[
search_{(a,b),(c,d)} = \begin{cases}
\phi & : c < a \lor d < b \\
csearch & : c - a < b - d \\
rsearch & : \text{otherwise}
\end{cases} \tag{14.14}
\]
Function csearch performs binary search on the center horizontal line to find a point
(p, q) such that f(p, q) ≤ z < f(p + 1, q). This is shown in figure 14.9 (a). There is a special
edge case, that all points on the line evaluate to values greater than z. The general binary
search will return the lower bound as the result, so that (p, q) = (a, ⌊(b + d)/2⌋). The whole
upper side, including the center line, can be dropped, as shown in figure 14.10 (a).
Figure 14.10: Edge cases when performing binary search in the center line.
\[
csearch = \begin{cases}
search_{(p,q-1),(c,d)} & : z < f(p, q) \\
search_{(a,b),(p-1,q+1)} \cup \{(p, q)\} \cup search_{(p+1,q-1),(c,d)} & : f(p, q) = z \\
search_{(a,b),(p,q+1)} \cup search_{(p+1,q-1),(c,d)} & : \text{otherwise}
\end{cases} \tag{14.15}
\]
Where
\[
q = \lfloor (b + d)/2 \rfloor, \quad p = bsearch(\lambda x \cdot f(x, q), z, (a, c))
\]
Function rsearch is quite similar, except that it searches along the center vertical line.
\[
rsearch = \begin{cases}
search_{(a,b),(p-1,q)} & : z < f(p, q) \\
search_{(a,b),(p-1,q+1)} \cup \{(p, q)\} \cup search_{(p+1,q-1),(c,d)} & : f(p, q) = z \\
search_{(a,b),(p-1,q+1)} \cup search_{(p+1,q),(c,d)} & : \text{otherwise}
\end{cases} \tag{14.16}
\]
Where
\[
p = \lfloor (a + c)/2 \rfloor, \quad q = bsearch(\lambda y \cdot f(p, y), z, (d, b))
\]
The following Haskell program implements this algorithm.
search f z (a, b) (c, d) | c < a || b < d = []
| c - a < b - d = let q = (b + d) `div` 2 in
csearch (bsearch (λx → f x q) z (a, c), q)
| otherwise = let p = (a + c) `div` 2 in
rsearch (p, bsearch (f p) z (d, b))
where
csearch (p, q) | z < f p q = search f z (p, q - 1) (c, d)
| f p q == z = search f z (a, b) (p - 1, q + 1) ++
(p, q) : search f z (p + 1, q - 1) (c, d)
| otherwise = search f z (a, b) (p, q + 1) ++
search f z (p + 1, q - 1) (c, d)
rsearch (p, q) | z < f p q = search f z (a, b) (p - 1, q)
| f p q == z = search f z (a, b) (p - 1, q + 1) ++
(p, q) : search f z (p + 1, q - 1) (c, d)
| otherwise = search f z (a, b) (p - 1, q + 1) ++
search f z (p + 1, q) (c, d)
And the main program calls this function after performing binary search in X and Y
axes.
solve f z = search f z (0, m) (n, 0) where
m = bsearch (f 0) z (0, z)
n = bsearch (λx → f x 0) z (0, z)
Since we drop half of the area in every recursion, it takes O(log(mn)) rounds of search.
However, in order to locate the point (p, q) which halves the problem, we must perform
binary search along the center line, which calls f about O(log(min(m, n))) times. Denote
the time of searching an m × n rectangle as T(m, n); the recursion relationship can be
represented as the following.
\[
T(m, n) = \log(\min(m, n)) + 2T\left(\frac{m}{2}, \frac{n}{2}\right) \tag{14.17}
\]
Suppose m ≤ n; using the telescoping method, for m = 2^i and n = 2^j, we have:
\[
\begin{aligned}
T(2^i, 2^j) &= j + 2T(2^{i-1}, 2^{j-1}) \\
&= \sum_{k=0}^{i-1} 2^k (j - k) \\
&= O(2^i (j - i)) \\
&= O(m \log (n/m))
\end{aligned} \tag{14.18}
\]
Richard Bird proved that this is asymptotically optimal, by giving a lower bound for
searching a given value in an m × n rectangle [1].
The imperative algorithm is almost the same as the functional version; we skip it for
the sake of brevity.
Exercise 14.1
• Prove that the average case for the divide and conquer solution to the k-selection
problem is O(n). Please refer to the previous chapter about quick sort.
• Implement the imperative k-selection problem with 2-way partition and median-of-three
pivot selection.
• The tops(k, L) algorithm uses list concatenation like A ∪ {l1} ∪ tops(k − |A| − 1, B).
This is a linear operation, proportional to the length of the list to be concatenated.
Modify the algorithm so that the sub-lists are concatenated in one pass.
• The author considered another divide and conquer solution for the k-selection problem.
It finds the maximum of the first k elements and the minimum of the rest. Denote
them as x and y. If x is smaller than y, it means that all the first k elements are
smaller than the rest, so they are exactly the top k smallest; otherwise, some elements
in the first k should be swapped.
1: procedure Tops(k, A)
2: l←1
3: u ← |A|
4: loop
5: i ← Max-At(A[l..k])
6: j ← Min-At(A[k + 1..u])
7: if A[i] < A[j] then
8: break
9: Exchange A[l] ↔ A[j]
10: Exchange A[k + 1] ↔ A[i]
11: l ← Partition(A, l, k)
12: u ← Partition(A, k + 1, u)
• Implement the binary search algorithm in both recursive and iterative manner, and
try to verify your version automatically. You can either generate randomized data
and test your program with the binary search invariant, or compare it with the
built-in binary search tool in your standard library.
• Find a solution to calculate the median of two sorted arrays A and B. The time
should be bound to O(lg(|A| + |B|)).
• Realize the improved 2D search, by performing binary search along the shorter
center line, in your favorite imperative programming language.
• Someone considers that the 2D search can be designed as follows. When searching
a rectangle, since the minimum value is at the bottom-left and the maximum at the
top-right, if the target value is less than the minimum or greater than the maximum,
then there is no solution; otherwise, the rectangle is divided into 4 sub-rectangles
at the center point, and the search proceeds recursively.
1: procedure Search(f, z, a, b, c, d) ▷ (a, b): bottom-left (c, d): top-right
2: if z ≤ f (a, b) ∨ f (c, d) ≤ z then
3: if z = f (a, b) then
4: record (a, b) as a solution
5: if z = f (c, d) then
6: record (c, d) as a solution
7: return
8: p ← ⌊(a + c)/2⌋
9: q ← ⌊(b + d)/2⌋
10: Search(f, z, a, q, p, d)
11: Search(f, z, p, q, c, d)
12: Search(f, z, a, b, p, q)
13: Search(f, z, p, b, c, q)
sketch’[84].
#include <map>
using namespace std;

template<typename T>
T majority(const T* xs, int n, T fail) {
    map<T, int> m;
    int i, max = 0;
    T r;
    for (i = 0; i < n; ++i)
        ++m[xs[i]];
    for (typename map<T, int>::iterator it = m.begin(); it != m.end(); ++it)
        if (it->second > max) {
            max = it->second;
            r = it->first;
        }
    return max * 2 > n ? r : fail;
}
This program first scans the votes and accumulates the number of votes for each
individual with a map. After that, it traverses the map to find the one with the most
votes. If this number is greater than half, the winner is found; otherwise, it returns a
special value to indicate failure.
The following pseudo code describes this algorithm.
1: function Majority(A)
2: M ← empty map
3: for ∀a ∈ A do
4: Put(M , a, 1+ Get(M, a))
5: max ← 0, m ← N IL
6: for ∀(k, v) ∈ M do
7: if max < v then
8: max ← v, m ← k
9: if max > |A|/2 then
10: return m
11: else
12: fail
For m individuals and n votes, this program takes about O(n log m) time to build the
map if the map is implemented as a self-balanced tree (a red-black tree for instance), or
about O(n) time if the map is based on a hash table. However, the hash table needs more
space. Next, the program takes O(m) time to traverse the map and find the majority
vote. The following table lists the time and space performance for different maps.

map                    time          space
self-balanced tree     O(n log m)    O(m)
hash table             O(n)          O(m) at least
Boyer and Moore invented a clever algorithm in 1980, which can pick the majority
element with only one scan, if one exists. Their algorithm needs only O(1) space [83].
The idea is to record the first candidate as the winner so far, and mark him with 1
vote. During the scan process, if the winner being selected gets another vote, we just
increase the vote counter; otherwise, it means somebody votes against this candidate, so
the vote counter should be decreased by one. If the vote counter becomes zero, it means
this candidate has been voted out; we select the next candidate as the new winner and
repeat the above scanning process.
Suppose there is a series of votes: A, B, C, B, B, C, A, B, A, B, B, D, B. Below table
illustrates the steps of this processing.
We also need to define a function which can verify the result. The idea is that if the
list of votes is empty, the final result is a failure; otherwise, we run the Boyer-Moore
algorithm to find a candidate c, then we scan the list again to count the total votes c
wins, and verify that this number is more than half.
\[
majority(L) = \begin{cases}
fail & : L = \phi \\
c & : c = maj(l_1, 1, L'),\ |\{x \mid x \in L, x = c\}| > 50\%|L| \\
fail & : \text{otherwise}
\end{cases} \tag{14.20}
\]
maj c n [] = c
maj c n (x:xs) | c == x = maj c (n+1) xs
| n == 0 = maj x 1 xs
| otherwise = maj c (n-1) xs
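A compact Python sketch of the same idea, combining the one-pass voting with the verification scan described above, might look like the following (the function name majority and the fail default are illustrative):

def majority(xs, fail=None):
    # one scan: keep a candidate and a vote counter
    c, cnt = None, 0
    for x in xs:
        if cnt == 0:
            c, cnt = x, 1
        elif x == c:
            cnt = cnt + 1
        else:
            cnt = cnt - 1
    # second scan: verify the candidate really wins more than half of the votes
    if c is not None and 2 * xs.count(c) > len(xs):
        return c
    return fail

For the vote series A, B, C, B, B, C, A, B, A, B, B, D, B given above, majority(list('ABCBBCABABBDB')) returns 'B'.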
6 We actually use the ANSI C style; the C++ template is only used to generalize the type of the
element.
At any time when we scan to the i-th position, the max sum found so far is recorded
as A. At the same time, we also record the biggest sum ending at i as B. Note that A
and B may not be the same; in fact, we always maintain B ≤ A. When B becomes
greater than A by adding the next element, we update A with this new value. When B
becomes negative, which happens when the next element is a negative number, we reset
it to 0. The following table illustrates the steps when we scan the example vector
{3, −13, 19, −12, 1, 9, 18, −16, 15, −15}.
max sum    max sum ending at i    list to be scanned
0 0 {3, −13, 19, −12, 1, 9, 18, −16, 15, −15}
3 3 {−13, 19, −12, 1, 9, 18, −16, 15, −15}
3 0 {19, −12, 1, 9, 18, −16, 15, −15}
19 19 {−12, 1, 9, 18, −16, 15, −15}
19 7 {1, 9, 18, −16, 15, −15}
19 8 {9, 18, −16, 15, −15}
19 17 {18, −16, 15, −15}
35 35 {−16, 15, −15}
35 19 {15, −15}
35 34 {−15}
35 19 {}
This algorithm can be described as below.
1: function Max-Sum(V )
2: A ← 0, B ← 0
3: for i ← 1 to |V | do
4: B ← Max(B + V [i], 0)
5: A ← Max(A, B)
It is trivial to implement this linear time algorithm, so we skip the details here.
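For completeness, a minimal Python sketch of the procedure above could be (the name max_sum is mine):

def max_sum(xs):
    a = 0  # the maximum sum found so far
    b = 0  # the maximum sum ending at the current position
    for x in xs:
        b = max(b + x, 0)
        a = max(a, b)
    return a

For the example vector above, max_sum([3, -13, 19, -12, 1, 9, 18, -16, 15, -15]) gives 35, matching the table.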
This algorithm can also be defined in a functional approach. Instead of mutating
variables, we use accumulators to record A and B. In order to search the maximum sum
of list L, we call the below function with maxsum(0, 0, L).
\[
maxsum(A, B, L) = \begin{cases}
A & : L = \phi \\
maxsum(A', B', L') & : \text{otherwise}
\end{cases} \tag{14.21}
\]
Where
\[
B' = \max(l_1 + B, 0), \quad A' = \max(A, B')
\]
Below Haskell example code implements this algorithm.
maxsum = msum 0 0 where
msum a _ [] = a
msum a b (x:xs) = let b' = max (x+b) 0
a' = max a b'
in msum a' b' xs
KMP
String matching is another important type of searching. Almost all software editors
are equipped with tools to find a string in the text. In the chapters about Trie, Patricia,
and suffix tree, we introduced some powerful data structures which can help to search
strings. In this section, we introduce another two string matching algorithms, both based
on information reuse.
Some programming environments provide built-in string search tools; however, most
of them are brute-force solutions, including the 'strstr' function in the ANSI C standard
library, 'find' in the C++ standard template library, 'indexOf' in the Java Development
Kit, etc. Figure 14.12 illustrates how such character-by-character comparison works.
Suppose we search a pattern P in text T. As shown in figure 14.12 (a), at offset s = 4,
the process examines every character in P and T to check if they are the same. It successfully
matches the first 4 characters 'anan'. However, the 5th character in the pattern string is
'y'. It doesn't match the corresponding character in the text, which is 't'.
At this stage, the brute-force solution terminates the attempt, increases s by one to 5,
and restarts the comparison between 'ananym' and 'nantho...'. Actually, we can increase
s by more than one. This is because we already know that the first four characters
'anan' have been matched, and the failure happens at the 5th position. Observe that the
two-letter prefix 'an' of the pattern string is also a suffix of 'anan', which we have matched
so far. A more effective way is to shift s by two instead of one, which is shown in figure 14.12
(b). By this means, we reuse the information that 4 characters have been matched. This
helps us to skip as many invalid positions as possible.
Knuth, Morris and Pratt presented this idea in [85] and developed a novel string
matching algorithm. The algorithm was later called 'KMP', after the three authors' initials.
For the sake of brevity, we denote the first k characters of text T as T_k, i.e., T_k is
the k-character prefix of T.
(Figure 14.12: matching the pattern P = 'ananym' against the text T = 'anyananthousananymflower' at offset s, after q characters have been matched.)
The key point to shift s effectively is to find a function of q, where q is the number
of characters matched successfully. For instance, q is 4 in figure 14.12 (a), as the 5th
character doesn't match.
Consider in what situation we can shift s by more than 1. As shown in figure 14.13, if we
can shift the pattern P ahead, there must exist k such that the first k characters are the
same as the last k characters of P_q. In other words, the prefix P_k is a suffix of P_q.
(Figure 14.13: the pattern P aligned with the text T at offset s; the first q characters of P match T[i..i+q−1].)
It's possible that there is no such prefix that is also a suffix. If we treat the empty string
as both a prefix and a suffix of any string, there is at least the solution k = 0. It's also
quite possible that multiple values of k satisfy this. To avoid missing any possible matching
positions, we have to find the biggest k. We can define a prefix function π(q), which tells
us where we can fall back if the (q + 1)-th character does not match [4]:
\[
\pi(q) = \max \{k \mid k < q,\ P_k \sqsupset P_q\} \tag{14.22}
\]
Where ⊐ is read as 'is a suffix of'; for instance, A ⊐ B means A is a suffix of B. This
function is used as follows. When we match pattern P against text T from offset s, if it
fails after matching q characters, we next look up π(q) to get a fallback q′, and retry by
comparing P[q′ + 1] with the previously unmatched character. Based on this idea, the
core algorithm of KMP can be described as the following.
1: function KMP(T, P )
2: n ← |T |, m ← |P |
3: build prefix function π from P
4: q←0 ▷ How many characters have been matched so far.
5: for i ← 1 to n do
6: while q > 0 ∧ P [q + 1] ≠ T [i] do
7: q ← π(q)
8: if P [q + 1] = T [i] then
9: q ←q+1
10: if q = m then
11: found one solution at i − m
12: q ← π(q) ▷ look for next solution
Although the definition of the prefix function π(q) is given in equation (14.22), realizing
it blindly by finding the longest suffix isn't efficient. Actually, we can use the idea of
information reuse again to build the prefix function.
The trivial edge case is that the first character doesn't match. In this case the longest
prefix which is also a suffix is definitely empty, so π(1) = k = 0. We record the longest
such prefix as P_k; in this edge case P_k = P_0 is the empty string.
After that, when we scan the q-th character in the pattern string P, we hold the
invariant that the prefix function values π(i) for i in {1, 2, ..., q − 1} have already been
recorded, and P_k is the longest prefix which is also a suffix of P_{q−1}. As shown in figure
14.14, if P[q] = P[k + 1], a bigger k than before is found, and we can increase the maximum
of k by one; otherwise, if they are not the same, we can use π(k) to fall back to a shorter
prefix P_{k′} where k′ = π(k), and check whether the character right after this new prefix is
the same as the q-th character. We repeat this step until either k becomes zero (which
means only the empty string satisfies), or the q-th character matches.
q Pq k Pk
1 a 0 “”
2 an 0 “”
3 ana 1 a
4 anan 2 an
5 anany 0 “”
6 ananym 0 “”
Translating the KMP algorithm to Python gives the below example code.
def kmp_match(w, p):
    n = len(w)
    m = len(p)
    fallback = fprefix(p)
    k = 0  # how many characters have been matched so far
    res = []
    for i in range(n):
        while k > 0 and p[k] != w[i]:
            k = fallback[k]  # fall back
        if p[k] == w[i]:
            k = k + 1
        if k == m:
            res.append(i + 1 - m)
            k = fallback[k]  # look for the next (possibly overlapping) match
    return res
def fprefix(p):
    m = len(p)
    t = [0] * (m + 1)  # fallback table: t[q] is the length of the longest
                       # proper prefix of p[:q] which is also its suffix
    k = 0
    for i in range(2, m + 1):
        while k > 0 and p[i-1] != p[k]:
            k = t[k]  # fall back to a shorter border
        if p[i-1] == p[k]:
            k = k + 1
        t[i] = k
    return t
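As a quick check (these concrete values are my own, not from the original text), the two functions above behave as expected on the running example:

# assuming kmp_match and fprefix are defined as above
print(fprefix("ananym"))
# [0, 0, 0, 1, 2, 0, 0] -- t[1..6] match the table of pi values given earlier
print(kmp_match("anyananthousananymflower", "ananym"))
# [12] -- 'ananym' starts at (0-based) offset 12 of the text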
The KMP algorithm builds the prefix function for the pattern string as a kind of
pre-processing before the search. Because of this, it can reuse as much information from
the previous matching as possible.
The amortized cost of building the prefix function is O(m). This can be proved by
using the potential method as in [4]. With a similar method, it can be proved that the
matching algorithm itself is also linear. Thus the total performance is O(m + n), at the
expense of O(m) space to record the prefix function table.
It may seem that different pattern strings would affect the performance of KMP. Consider
the case where we are finding the pattern 'aaa...a' of length m in a text 'aaa...a' of length n.
All the characters are the same; when the last character in the pattern is examined, we
can only fall back by 1, and this one-character fallback repeats until it falls back to zero.
Even in this extreme case, the KMP algorithm still holds its linear performance (why?).
Please try to consider more cases, such as P = aaaa...b, T = aaaa...a, and so on.
It is not easy to realize the KMP matching algorithm in a purely functional manner. The
imperative algorithm presented so far intensively uses an array to record prefix function
values. Although it is possible to utilize a sequence-like structure in purely functional
settings, it is typically implemented with a finger tree. Unlike native arrays, finger trees
do not support constant-time random access.
Figure 14.15: The first j characters in P are matched, next compare P [j +1] with T [i+1].
Denote the first i characters of T as Tp (the prefix of T), and the rest of the characters
as Ts (the suffix); similarly, the first j characters of P as Pp, and the rest as Ps. Denote
the first character of Ts as t, and the first character of Ps as p. We have the following
'cons' relationships.
Ts = cons(t, Ts′ )
Ps = cons(p, Ps′ )
Tp′ = Tp ∪ {t}
Pp′ = Pp ∪ {p}
We’ve introduced a method in the chapter about purely functional queue, which can
solve this problem. By using a pair of front and rear list, we can turn the linear time
appending to constant time linking. The key point is to represent the prefix part in
reverse order.
\[
\begin{aligned}
T &= T_p \cup T_s = reverse(reverse(T_p)) \cup T_s = reverse(\overleftarrow{T_p}) \cup T_s \\
P &= P_p \cup P_s = reverse(reverse(P_p)) \cup P_s = reverse(\overleftarrow{P_p}) \cup P_s
\end{aligned} \tag{14.23}
\]
Where \(\overleftarrow{T_p} = reverse(T_p)\) and \(\overleftarrow{P_p} = reverse(P_p)\). The idea is to use the pairs \((\overleftarrow{T_p}, T_s)\) and \((\overleftarrow{P_p}, P_s)\) instead. With this change, if t = p, we can update the prefix part in constant time.
\[
\begin{aligned}
\overleftarrow{T_p'} &= cons(t, \overleftarrow{T_p}) \\
\overleftarrow{P_p'} &= cons(p, \overleftarrow{P_p})
\end{aligned} \tag{14.24}
\]
The KMP matching algorithm starts by initializing the successfully matched prefix parts
to empty strings, as the following.
Where π is the prefix function we explained before. The core part of KMP algorithm,
except for the prefix function building, can be defined as below.
\[
kmp(\pi, (\overleftarrow{P_p}, P_s), (\overleftarrow{T_p}, T_s)) = \begin{cases}
\{|\overleftarrow{T_p}|\} & : P_s = \phi \land T_s = \phi \\
\phi & : P_s \neq \phi \land T_s = \phi \\
\{|\overleftarrow{T_p}|\} \cup kmp(\pi, \pi(\overleftarrow{P_p}, P_s), (\overleftarrow{T_p}, T_s)) & : P_s = \phi \land T_s \neq \phi \\
kmp(\pi, (\overleftarrow{P_p'}, P_s'), (\overleftarrow{T_p'}, T_s')) & : t = p \\
kmp(\pi, \pi(\overleftarrow{P_p}, P_s), (\overleftarrow{T_p'}, T_s')) & : t \neq p \land P_p = \phi \\
kmp(\pi, \pi(\overleftarrow{P_p}, P_s), (\overleftarrow{T_p}, T_s)) & : t \neq p \land P_p \neq \phi
\end{cases} \tag{14.26}
\]
The first clause states that if the scan successfully reaches the end of both the pattern
and the text strings, we get a solution and the algorithm terminates. Note that we use the
right position in the text string as the matching point. It's easy to obtain the left position
by subtracting the length of the pattern string. For the sake of brevity, we use the right
position in the functional solutions.
The second clause states that if the scan arrives at the end of the text string while
there are still characters in the pattern string that haven't been matched, there is no
solution and the algorithm terminates.
The third clause states that if all the characters in the pattern string have been
successfully matched while there are still characters in the text that haven't been examined,
we get a solution, and we fall back by calling the prefix function π to go on searching for
other solutions.
The fourth clause deals with the case that the next characters in the pattern string
and in the text are the same. In this case, the algorithm advances one character ahead
and recursively performs the search.
If the next characters are not the same and this is the first character in the pattern
string, we just advance to the next character in the text and try again. Otherwise, if
this isn't the first character in the pattern, we call the prefix function π to fall back, and
try again.
The brute-force way to build the prefix function is just to follow the definition in
equation (14.22).
\[
\pi(\overleftarrow{P_p}, P_s) = (\overleftarrow{P_p'}, P_s') \tag{14.27}
\]
where
\[
P_p' = longest(\{s \mid s \in prefixes(P_p),\ s \sqsupset P_p\}), \quad P_s' = P - P_p'
\]
Every time we calculate the fallback position, the algorithm naively enumerates all
prefixes of Pp, checks whether each is also a suffix of Pp, and then picks the longest one
as the result. Note that we reuse the subtraction symbol here for the list difference
operation.
There is a tricky case which should be avoided: any string is both a prefix and a suffix
of itself, i.e., Pp ⊏ Pp and Pp ⊐ Pp, so we shouldn't enumerate Pp itself as a candidate
prefix. One such prefix enumeration can be realized as the following.
\[
prefixes(L) = \begin{cases}
\{\phi\} & : L = \phi \lor |L| = 1 \\
cons(\phi, map(\lambda s \cdot cons(l_1, s), prefixes(L'))) & : \text{otherwise}
\end{cases} \tag{14.28}
\]
Below Haskell example program implements this version of string matching algorithm.
kmpSearch1 ptn text = kmpSearch' next ([], ptn) ([], text)
inits [] = [[]]
inits [_] = [[]]
inits (x:xs) = [] : (map (x:) $ inits xs)
This version not only performs poorly, but is also complex. We can simplify it a bit.
Observing that the KMP matching is a scan process from left to right over the text, it
can be represented with folding (refer to Appendix A for details). First, we can augment
each character with an index for folding, like below.
Zipping the text string with the infinite natural numbers gives a list of pairs. For example,
the text string 'The quick brown fox jumps over the lazy dog' turns into (T, 1), (h, 2), (e, 3),
... (o, 42), (g, 43).
The initial state for folding contains two parts. One is the pair of pattern parts (Pp, Ps),
with the prefix starting from empty and the suffix being the whole pattern string: (ϕ, P).
For illustration purposes only, we revert to the normal pair instead of the \((\overleftarrow{P_p}, P_s)\) notation;
it can easily be replaced with the reversed form in the finalized version, which is left as an
exercise to the reader. The other part is a list of positions where successful matches are
found. It starts from the empty list; after the folding finishes, this list contains all the
solutions. What we need is to extract this list from the final state. The core KMP search
algorithm is simplified like this.
The only ‘black box’ is the search function, which takes a state, and a pair of character
and index, and it returns a new state as result. Denote the first character in Ps as p and
the rest characters as Ps′ (Ps = cons(p, Ps′ )), we have the following definition.
\[
search(((P_p, P_s), L), (c, i)) = \begin{cases}
((P_p \cup \{p\}, P_s'), L \cup \{i\}) & : p = c \land P_s' = \phi \\
((P_p \cup \{p\}, P_s'), L) & : p = c \land P_s' \neq \phi \\
((P_p, P_s), L) & : P_p = \phi \\
search((\pi(P_p, P_s), L), (c, i)) & : \text{otherwise}
\end{cases} \tag{14.31}
\]
If the first character in Ps matches the current character c during the scan, we further
check whether all the characters in the pattern have been examined. If so, we have
successfully found a solution, and the position i is recorded in the list L; otherwise, we
advance one character ahead and go on. If p does not match c, we need to fall back for
a further retry. However, there is an edge case where we can't fall back any more: Pp is
empty. In that case we do nothing but keep the current state.
The prefix function π developed so far can also be improved a bit. Since we want
to find the longest prefix of Pp which is also its suffix, we can scan from right to left
instead. For any non-empty list L, denote the first element as l1 and all the rest except
the first one as L′; define a function init(L), which returns all the elements except the
last one, as below.
\[
init(L) = \begin{cases}
\phi & : |L| = 1 \\
cons(l_1, init(L')) & : \text{otherwise}
\end{cases} \tag{14.32}
\]
Note that this function can not handle the empty list. The idea of scanning Pp from
right to left is to first check whether init(Pp) ⊐ Pp; if yes, then we are done; otherwise,
we examine whether init(init(Pp)) works, and repeat this towards the left-most position.
Based on this idea, the prefix function can be modified as the following.
\[
\pi(P_p, P_s) = \begin{cases}
(P_p, P_s) & : P_p = \phi \\
fallback(init(P_p), cons(last(P_p), P_s)) & : \text{otherwise}
\end{cases} \tag{14.33}
\]
Where
\[
fallback(A, B) = \begin{cases}
(A, B) & : A \sqsupset P_p \\
(init(A), cons(last(A), B)) & : \text{otherwise}
\end{cases} \tag{14.34}
\]
Note that fallback always terminates because the empty string is a suffix of any string.
The last(L) function returns the last element of a list; it is also a linear time operation
(refer to Appendix A for details), although it becomes a constant time operation if we use
the \(\overleftarrow{P_p}\) approach. This improved prefix function is bound to linear time. It is still quite a
bit slower than the imperative algorithm, which can look up the prefix function in constant
O(1) time. The following Haskell example program implements this minor improvement.
import Data.List (isSuffixOf)

failure ([], ys) = ([], ys)
failure (xs, ys) = fallback (init xs) (last xs:ys) where
fallback as bs | as `isSuffixOf` xs = (as, bs)
| otherwise = fallback (init as) (last as:bs)
kmpSearch ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where
f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys==[] then ((xs++[y], ys), ns++[n])
else ((xs++[y], ys), ns)
| xs == [] = (p, ns)
| otherwise = f (failure p, ns) (x, n)
f (p, ns) e = f (failure p, ns) e
The bottleneck is that we can not use a native array to record the prefix function in
purely functional settings. In fact, the prefix function can be understood as a state
transform function: it transfers from one state to another according to whether the
matching succeeds or fails. We can abstract such state changes as a tree. In an environment
supporting algebraic data types, Haskell for example, such a state tree can be defined like below.
data State a = E | S a (State a) (State a)
A state is either empty, or contains three parts: the current state, the new state if the
match fails, and the new state if the match succeeds. Such a definition is quite similar to
a binary tree; we can call it a 'left-fail, right-success' tree. The state we are using here is
(Pp, Ps).
Similar to the imperative KMP algorithm, which builds the prefix function from the
pattern string, the state transform tree can also be built from the pattern. The idea is
to build the tree from the very beginning state (ϕ, P), with both its children empty. We
replace the left child with a new state by calling the π function defined above, and replace
the right child by advancing one character ahead. There is an edge case: when the state
transfers to (P, ϕ), we can not advance any more in the success case, so such a node only
contains a child for the failure case. The build function is defined as the following.
\[
build((P_p, P_s), \phi, \phi) = \begin{cases}
((P_p, P_s),\ build(\pi(P_p, P_s), \phi, \phi),\ \phi) & : P_s = \phi \\
((P_p, P_s),\ L,\ R) & : \text{otherwise}
\end{cases} \tag{14.35}
\]
Where
\[
L = build(\pi(P_p, P_s), \phi, \phi), \quad R = build((P_p \cup \{p\}, P_s'), \phi, \phi)
\]
The meanings of p and Ps′ are the same as before: p is the first character in Ps, and Ps′
is the rest of the characters. The most interesting point is that the build function never
stops; it endlessly builds an infinite tree. In a strict programming environment, calling
this function will freeze. However, in environments supporting lazy evaluation, only the
nodes that have to be used will be created. For example, both Haskell and Scheme/Lisp
are capable of constructing such an infinite state tree. In imperative settings, it is typically
realized by using pointers which link to an ancestor of a node.
Figure 14.16 illustrates such an infinite state tree for the pattern string 'ananym'. Note
that the right-most edge represents the case that the matching succeeds continuously for
all characters. After that, since we can't match any more, the right sub-tree is empty.
Based on this fact, we can define an auxiliary function to test whether a state indicates
that the whole pattern has been successfully matched.
\[
match((P_p, P_s), L, R) = \begin{cases}
True & : P_s = \phi \\
False & : \text{otherwise}
\end{cases} \tag{14.36}
\]
With the help of the state transform tree, we can realize the KMP algorithm in an
automaton manner, where the tree Tr = build((ϕ, P), ϕ, ϕ) is the infinite state transform
tree. Function search utilizes this tree to transform the state according to match or fail.
Denote the first character in Ps as p, the rest of the characters as Ps′, and the matched
positions found so far as A.
\[
search((((P_p, P_s), L, R), A), (c, i)) = \begin{cases}
(R, A \cup \{i\}) & : p = c \land match(R) \\
(R, A) & : p = c \land \lnot match(R) \\
(((P_p, P_s), L, R), A) & : P_p = \phi \\
search((L, A), (c, i)) & : \text{otherwise}
\end{cases} \tag{14.38}
\]
The following Haskell example program implements this algorithm.
data State a = E | S a (State a) (State a) -- state, fail-state, ok-state
deriving (Eq, Show)
The bottleneck is that the state tree building function calls π to fall back, while the
current definition of π isn't efficient enough, because it enumerates all candidates from
right to left every time.
Since the state tree is infinite, we can adopt some common treatment for infinite
structures. One good example is the Fibonacci series. The first two Fibonacci numbers
are defined as 0 and 1; the rest Fibonacci numbers can be obtained by adding the previous
two numbers.
\[
F_0 = 0, \quad F_1 = 1, \quad F_n = F_{n-1} + F_{n-2} \tag{14.39}
\]
Thus the Fibonacci numbers can be listed one by one as the following:
\[
F_0 = 0, \quad F_1 = 1, \quad F_2 = F_1 + F_0, \quad F_3 = F_2 + F_1, \quad ... \tag{14.40}
\]
We can collect all the numbers on both sides, and define F = {0, 1, F_1, F_2, ...}. Thus we
have the following equation.
\[
\begin{aligned}
F &= \{0, 1, F_1 + F_0, F_2 + F_1, ...\} \\
&= \{0, 1\} \cup \{x + y \mid x \in \{F_0, F_1, F_2, ...\},\ y \in \{F_1, F_2, F_3, ...\}\} \\
&= \{0, 1\} \cup \{x + y \mid x \in F,\ y \in F'\}
\end{aligned} \tag{14.41}
\]
Where F′ = tail(F) is all the Fibonacci numbers except for the first one. In environments
supporting lazy evaluation, Haskell for instance, this definition can be expressed like below.
The recursive definition for the infinite Fibonacci series suggests an idea which can be
used to get rid of the fallback function π. Denote the state transfer tree as T; we can
define the transfer function for matching a character on this tree as the following.
\[
trans(T, c) = \begin{cases}
root & : T = \phi \\
R & : T = ((P_p, P_s), L, R),\ c = p \\
trans(L, c) & : \text{otherwise}
\end{cases} \tag{14.42}
\]
If we match a character against an empty node, we transfer to the root of the tree; we'll
define the root soon. Otherwise, we compare whether the character c is the same as the
first character p in Ps. If they match, we transfer to the right sub-tree for this success
case; otherwise, we transfer to the left sub-tree for the fail case.
With the transfer function defined, we can modify the previous tree building function
accordingly. This is quite similar to the previous Fibonacci series definition.
The right hand side of this equation contains three parts. The first one is the state that
we are matching, (Pp, Ps). If the match fails, since T itself can handle any fail case, we use
it directly as the left sub-tree; otherwise we recursively build the right sub-tree for the
success case by advancing one character ahead and calling the transfer function defined
above. However, there is an edge case which has to be handled specially: if Ps is empty,
which indicates a successful match, there isn't a right sub-tree any more. Combining
these cases gives the final building function.
\[
build(T, (P_p, P_s)) = \begin{cases}
((P_p, P_s), T, \phi) & : P_s = \phi \\
((P_p, P_s), T, build(trans(T, p), (P_p \cup \{p\}, P_s'))) & : \text{otherwise}
\end{cases} \tag{14.43}
\]
The last brick is to define the root of the infinite state transfer tree, which initializes
the building.
And the new KMP matching algorithm is modified with this root.
Figure 14.17 shows the first 4 steps when searching 'ananym' in the text 'anal'. Since
the first 3 steps all succeed, the left sub-trees of these 3 states are not actually constructed;
they are marked as '?'. In the fourth step, the match fails, thus the right sub-tree needn't
be built. On the other hand, we must construct the left sub-tree, which is on top of the
result of trans(right(right(right(T))), n), where the function right(T) returns the right sub-
(States along the scan: ('', ananym) → (a, nanym) → (an, anym) → (ana, nym); the failed fourth step falls back to (a, nanym). Lazily unevaluated sub-trees are marked '?'.)
Figure 14.17: On demand construct the state transform tree when searching ‘ananym’ in
text ‘anal’.
tree of T. This can be further expanded according to the definitions of the building and
state transform functions, till we get the concrete state ((a, nanym), L, R). The detailed
deduction process is left as an exercise to the reader.
This algorithm depends critically on lazy evaluation. All the states to be transferred
are built on demand, so the building process is amortized O(m), and the total performance
is amortized O(n + m). Readers can refer to [1] for a detailed proof.
It's worth comparing the final purely functional and the imperative algorithms. In
many cases we have expressive functional realizations; however, for the KMP matching
algorithm the imperative approach is much simpler and more intuitive. This is because
we have to mimic the raw array with an infinite state transfer tree.
Boyer-Moore
The Boyer-Moore string matching algorithm is another effective solution, invented in 1977 [86].
The idea of the Boyer-Moore algorithm comes from the following observation.
Figure 14.18: Since character ‘h’ doesn’t appear in the pattern, we wouldn’t find a match
if we slide the pattern down less than the length of the pattern.
This leads to the bad-character rule. We can do a pre-processing of the pattern. If
the character set of the text is already known, we can find all characters which don't
appear in the pattern string. During the later scan process, as long as we meet such a bad
character, we can immediately slide the pattern down by its length. The question is what
to do if the unmatched character does appear in the pattern. In that case, in order not
to miss any potential matches, we have to slide the pattern down less and check again.
This is shown in figure 14.19.
(a) The last character in the pattern, 'e', doesn't match 'p'; however, 'p' appears in the pattern.
Figure 14.19: Slide the pattern if the unmatched character appears in the pattern.
It's quite possible that the unmatched character appears in the pattern at more than
one position. Denote the length of the pattern as |P|, and suppose the character appears
at positions p1, p2, ..., pi. In such a case, we take the right-most one to avoid missing any
matches.
\[
s = |P| - p_i \tag{14.46}
\]
Note that the shifting length is 0 for the last position in the pattern according to the
above equation, so we can skip it in the realization. Another important point is that,
since the shifting length is calculated against the position aligned with the last character
of the pattern string (we deduce it from |P|), no matter where the mismatch happens
when we scan from right to left, we slide the pattern down by looking up the bad-character
table with the text character aligned with the last character of the pattern. This is shown
in figure 14.20.
Figure 14.20: Even if the mismatch happens in the middle, between characters 'i' and 'a', we
look up the shifting value with character 'e', which is 6 (calculated from the first 'e'; the
second 'e' is skipped to avoid a zero shift).
A good result in practice is that using only the bad-character rule already leads to a
simple and fast string matching algorithm, called the Boyer-Moore-Horspool algorithm [87].
1: procedure Boyer-Moore-Horspool(T, P )
2: for ∀c ∈ Σ do
3: π[c] ← |P |
4: for i ← 1 to |P | − 1 do ▷ Skip the last position
5: π[P [i]] ← |P | − i
6: s←0
7: while s + |P | ≤ |T | do
8: i ← |P |
9: while i ≥ 1 ∧ P [i] = T [s + i] do ▷ scan from right
10: i←i−1
11: if i < 1 then
12: found one solution at s
13: s←s+1 ▷ go on finding the next
14: else
15: s ← s + π[T [s + |P |]]
The character set is denoted as Σ. We first initialize all the values of the sliding table π
to the length of the pattern string |P|. After that we process the pattern from left to
right, updating the sliding values. If a character appears multiple times in the pattern, the
latter value, which is on the right hand side, overwrites the previous value. We start the
matching scan process by aligning the pattern and the text string at the very left.
However, for every alignment s, we scan from right to left until either there is an
unmatched character or all the characters in the pattern have been examined. The latter
case indicates that we've found a match; in the former case, we look up π to slide the
pattern down to the right.
The following example Python code implements this algorithm accordingly.
def bmh_match(w, p):
    n = len(w)
    m = len(p)
    tab = [m for _ in range(256)]  # table to hold the bad character rule.
    for i in range(m-1):
        tab[ord(p[i])] = m - 1 - i
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + tab[ord(w[offset + m - 1])]
    return res
The algorithm first takes about O(|Σ| + |P|) time to build the sliding table. If the
character set is small, the performance is dominated by the pattern and the text. There
is a definite worst case in which all the characters in the pattern and the text are the same,
e.g. searching 'aa...a' (m 'a's, denoted as a^m) in the text 'aa......a' (n 'a's, denoted as a^n).
The performance in the worst case is O(mn). This algorithm performs well if the pattern
is long and there is only a constant number of matches; the result is then bound to linear
time. This is the same as the best case of the full Boyer-Moore algorithm, which will be
explained next.
Figure 14.21: According to the bad-character rule, the pattern is slid by 2, so that the
next 'b' is aligned.
Actually, we can do better than this. Observe that before the unmatched point, we
have already successfully matched 6 characters 'bbabab' from right to left. Since 'ab',
which is a prefix of the pattern, is also a suffix of what we have matched so far, we can
directly slide the pattern down to align this suffix, as shown in figure 14.22.
This is quite similar to the pre-processing of the KMP algorithm. However, we can't
always skip so many characters. Consider the example shown in figure 14.23. We have
matched the characters 'bab' when the mismatch happens. Although the prefix 'ab' of
the pattern is also a suffix of 'bab', we can't slide the pattern that far, because 'bab'
appears somewhere else in the pattern, starting from the 3rd character. In order not to
miss any potential matches, we can only slide the pattern by two.
The above situation forms the two cases of the good-suffix rule, as shown in figure
14.24.
Figure 14.22: As the prefix ‘ab’ is also the suffix of what we’ve matched, we can slide
down the pattern to a position so that ‘ab’ are aligned.
Figure 14.23: We’ve matched ‘bab’, which appears somewhere else in the pattern (from
the 3rd to the 5th character). We can only slide down the pattern by 2 to avoid missing
any potential matching.
Both cases of the good-suffix rule handle the situation where some characters have been
matched from the right. We can slide the pattern to the right if either of the following
happens.
• Case 1 states that if a part of the matching suffix occurs as a prefix of the pattern,
and the matching suffix doesn't appear anywhere else in the pattern, we can slide
the pattern to the right to make this prefix aligned;
• Case 2 states that if the matching suffix occurs somewhere else in the pattern, we
can slide the pattern to make the right-most such occurrence aligned.
Note that in the scan process, we should apply case 2 first whenever possible, and
then examine case 1 if the whole matched suffix does not appear elsewhere in the pattern.
Observe that both cases of the good-suffix rule depend only on the pattern string, so a
table can be built by pre-processing the pattern for later look-up.
For the sake of brevity, we denote the suffix string starting from the i-th character of P
as Pi; that is, Pi is the sub-string P[i]P[i + 1]...P[m].
For case 1, we can check every suffix of P, which includes Pm, Pm−1, Pm−2, ..., P2, to
examine whether it is a prefix of P. This can be achieved by one round of scanning from
right to left.
For case 2, we can check every one of P1, P2, ..., Pm−1 to examine whether its longest
suffix that matches the pattern's tail is also a suffix of P. This can be achieved by another
round of scanning from left to right.
1: function Good-Suffix(P )
2: m ← |P |
3: πs ← {0, 0, ..., 0} ▷ Initialize the table of length m
4: l←0 ▷ The last suffix which is also prefix of P
5: for i ← m − 1 down-to 1 do ▷ First loop for case 1
6: if Pi ⊏ P then ▷ ⊏ means ‘is prefix of’
7: l←i
8: πs [i] ← l
(a) Case 1: only a part of the matching suffix occurs as a prefix of the pattern.
(b) Case 2: the matching suffix occurs somewhere else in the pattern.
Figure 14.24: The light gray section in the text represents the characters that have been
matched; the dark gray parts indicate the same content in the pattern.
It's quite possible that both the bad-character rule and the good-suffix rule can be
applied when a mismatch happens. The Boyer-Moore algorithm compares the two and
picks the bigger shift, so that it can find the solution as quickly as possible. The
bad-character rule table can be explicitly built as below.
1: function Bad-Character(P )
2: for ∀c ∈ Σ do
3: πb [c] ← |P |
4: for i ← 1 to |P | − 1 do
5: πb [P [i]] ← |P | − i
6: return πb
The following Python program implements the bad-character rule accordingly.
def bad_char(p):
    m = len(p)
    tab = [m for _ in range(256)]
    for i in range(m-1):
        tab[ord(p[i])] = m - 1 - i
    return tab
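The second half of the Good-Suffix pre-processing is not shown in this part of the text. As a placeholder, here is a brute-force Python sketch (my own, not the book's) of a good_suffix table compatible with the bm_match program below: for each mismatch position i it computes the smallest safe shift that keeps the already matched suffix p[i+1..] consistent, which covers both cases of the (weak) good-suffix rule. More efficient O(m) constructions exist but are longer.

def good_suffix(p):
    # tab[i]: shift to apply when p[i] mismatches and p[i+1:] has been matched
    m = len(p)
    tab = [1] * m
    for i in range(m):
        d = 1
        while d < m:
            # the shift d is safe if every still-overlapping matched position agrees
            if all(p[j - d] == p[j] for j in range(max(i + 1, d), m)):
                break
            d = d + 1
        tab[i] = d
    return tab

For the pattern 'abbabab' used in the figures, this gives a shift of 2 when 'bab' has been matched (figure 14.23), and a shift of 5 when 'bbabab' has been matched (figure 14.22).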
The final Boyer-Moore algorithm firstly builds the two rules from the pattern, then
aligns the pattern to the beginning of the text and scans from right to the left for every
alignment. If any unmatch happens, it tries both rules, and slides the pattern with the
bigger shift.
1: function Boyer-Moore(T, P )
2: n ← |T |, m ← |P |
3: πb ← Bad-Character(P )
4: πs ← Good-Suffix(P )
5: s←0
6: while s + m ≤ n do
7: i←m
8: while i ≥ 1 ∧ P [i] = T [s + i] do
9: i←i−1
10: if i < 1 then
11: found one solution at s
12: s←s+1 ▷ go on finding the next
13: else
14: s ← s + max(πb [T [s + m]], πs [i])
Here is the example implementation of Boyer-Moore algorithm in Python.
def bm_match(w, p):
    n = len(w)
    m = len(p)
    tab1 = bad_char(p)
    tab2 = good_suffix(p)
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + max(tab1[ord(w[offset + m - 1])], tab2[i])
    return res
this fact in 1977 [88]. However, when the pattern appears in the text, as we have shown
above, Boyer-Moore performs in O(nm) time in the worst case.
Richard Bird shows a purely functional realization of the Boyer-Moore algorithm in
chapter 16 of [1]. We skip it in this book.
Exercise 14.2
• Implement the purely functional KMP algorithm by using reversed Pp to avoid the
linear time appending operation.
• Deduce the state of the tree left(right(right(right(T)))) when searching 'ananym'
in the text 'anal'.
Maze
A maze is a classic and popular puzzle, amazing to both kids and adults. Figure 14.26
shows an example maze. Real maze gardens can also be found in parks for fun. In the
late 1990s, maze-solving games were quite often held in robot mouse competitions all
over the world.
There are multiple methods to solve the maze puzzle. We'll introduce an effective, yet
not the best, one in this section. There are some well-known sayings about how to find
the way out of a maze, but not all of them are true.
For example, one method states that, wherever you have multiple ways, always turn
right. This doesn't work, as shown in figure 14.27. The obvious solution is first to go
along the top horizontal line, then turn right, and keep going ahead at the 'T' section.
However, if we always turn right, we'll loop endlessly around the inner big block.
This example tells us that the decision made when there are multiple choices matters
for the solution. Like in the fairy tale we read in our childhood, we can take some bread
crumbs into a maze. When there are multiple ways, we simply select one, leaving a piece
of bread crumb to mark this attempt. If we enter a dead end, we go back to the last place
where we made a decision by back-tracking the bread crumbs; then we can try another way.
At any time, if we find that bread crumbs have already been left, it means we have
entered a loop, and we must go back and try a different way. Repeating these try-and-check
steps, we either find the way out, or establish the 'no solution' fact; in the latter case, we
back-track to the start point.
One easy way to describe a maze is with an m × n matrix, where each element is either
0 or 1, indicating whether there is a way at this cell. The maze illustrated in figure 14.27
can be defined as the following matrix.
0 0 0 0 0 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 0 0 0 0 0
1 1 1 1 1 0
Given a start point s = (i, j) and a goal e = (p, q), we need to find all solutions, that is,
all the paths from s to e.
There is an obvious recursive exhaustive search method: in order to find all paths from
s to e, we check all points connected to s; for every such point k, we recursively find all
paths from k to e. This method can be illustrated as follows.
• Trivial case: if the start point s is the same as the target point e, we are done;
• Otherwise, for every point k connected to s, recursively find the paths from k to e;
if e can be reached via k, put the section s-k in front of each path between k and e.
However, we have to leave 'bread crumbs' to avoid repeatedly trying the same attempts.
Otherwise, in the recursive case, we start from s and find a connected point k, then we
further try to find paths from k to e. Since s is connected to k as well, in the next
recursion we'll try to find paths from s to e again. It turns out to be the very same original
problem, and we are trapped in infinite recursion.
Our solution is to initialize an empty list and use it to record all the points we've visited
so far. For every connected point, we look up the list to examine whether it has already
been visited. We skip all the visited candidates and only try the new ones. The corresponding
algorithm can be defined like this, where m is the matrix which defines the maze, s is the
start point, and e is the end point. Function solve is defined in the context of solveMaze,
so that the maze and the end point can be accessed. It can be realized recursively like
what we described above⁸.
\[
solve(s, P) = \begin{cases}
\{\{s\} \cup p \mid p \in P\} & : s = e \\
concat(\{solve(s', \{\{s\} \cup p \mid p \in P\}) \mid s' \in adj(s), \lnot visited(s')\}) & : \text{otherwise}
\end{cases} \tag{14.48}
\]
Note that P also serves as an accumulator: every connected point is recorded in all the
possible paths to the current position. But they are stored in reversed order, that is,
the newly visited point is put at the head of each list, and the starting point is the last
one. This is because the appending operation is linear (O(n), where n is the number of
elements stored in the list), while linking to the head takes constant time. We can output
the result in the correct order by reversing all possible solutions in equation (14.47)⁹:
We need to define the functions adj(p) and visited(p), which find all the points connected
to p, and test whether point p has been visited, respectively. Two points are connected
if and only if they are neighbor cells horizontally or vertically in the maze matrix, and
both have value zero.
\[
adj((x, y)) = \{(x', y') \mid (x', y') \in \{(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)\},\ 1 \le x' \le M,\ 1 \le y' \le N,\ m_{x'y'} = 0\} \tag{14.50}
\]
8 Function concat can flatten a list of lists. For example. concat({{a, b, c}, {x, y, z}}) = {a, b, c, x, y, z}.
For a maze defined as a matrix like the example below, all the solutions can be given
by this program.
mz = [[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 0]]
Figure 14.28: The stack is initialized with a singleton list of the starting point s. s is
connected with point a and b. Paths {a, s} and {b, s} are pushed back. In some step,
the path ended with point p is popped. p is connected with points i, j, and k. These 3
points are expanded as different options and pushed back to the stack. The candidate
path ended with q won’t be examined unless all the options above fail.
The stack can be realized with a list. The latest option is picked from the head, and
the new candidates are also added to the head. The maze puzzle can be solved by using
such a list of paths:
As we are searching the first, but not all the solutions, map isn’t used here. When
the stack is empty, it means that we’ve tried all the options and failed to find a way out.
There is no solution; otherwise, the top option is popped, expanded with all the adjacent
points which haven't been visited before, and the results are pushed back onto the stack.
Denote the stack as S; if it isn't empty, the top element is s1, and the new stack after the
top has been popped is S′. s1 is a list of points representing a path P; denote the first
point in this path as p1, and the rest as P′. The solution can be formalized as the following.
\[
solve'(S) = \begin{cases}
\phi & : S = \phi \\
s_1 & : p_1 = e \\
solve'(S') & : C = \{c \mid c \in adj(p_1), c \notin P'\} = \phi \\
solve'(\{\{p\} \cup P \mid p \in C\} \cup S') & : \text{otherwise},\ C \neq \phi
\end{cases} \tag{14.53}
\]
Where the adj function is defined above. This updated maze solution can be implemented
with the below example Haskell program¹⁰.
dfsSolve m from to = reverse $ solve [[from]] where
solve [] = []
solve (c@(p:path):cs)
| p == to = c -- stop at the first solution
| otherwise = let os = filter (`notElem` path) (adjacent p) in
if os == []
then solve cs
else solve ((map (:c) os) ++ cs)
It's quite easy to modify this algorithm to find all solutions. When we find a path in
the second clause, instead of returning it immediately, we record it and go on checking the
remaining options in the stack until the stack becomes empty. We leave this as an exercise
to the reader.
The same idea can also be realized imperatively. We maintain a stack to store all
candidate paths from the starting point. In each iteration, the top path is popped; if its
farthest position is the end point, a solution is found; otherwise, all the adjacent, not yet
visited points are used to extend the path, and the new paths are pushed back onto the
stack. This is repeated till all the candidate paths in the stack are checked.
We use the same notation to represent the stack S. But the paths are stored as arrays
instead of lists in the imperative settings, as the former is more efficient. Because of this,
the starting point is the first element in the path array, while the farthest reached place
is the right-most element. We use pn to represent Last(P) for a path P. The imperative
algorithm can be given as below.
1: function Solve-Maze(m, s, e)
2: S←ϕ
3: Push(S, {s})
4: L←ϕ ▷ the result list
5: while S ≠ ϕ do
6: P ← Pop(S)
7: if e = pn then
8: Add(L, P )
9: else
10: for ∀p ∈ Adjacent(m, pn ) do
11: if p ∉ P then
12: Push(S, P ∪ {p})
13: return L
The following example Python program implements this maze solving algorithm.
def solve(m, src, dst):
    stack = [[src]]
    s = []
    while stack != []:
        path = stack.pop()
        if path[-1] == dst:
            s.append(path)
        else:
            for p in adjacent(m, path[-1]):
                if not p in path:
                    stack.append(path + [p])
    return s
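The adjacent function used above is not shown in this part of the text. A minimal sketch following definition (14.50), assuming points are 0-based (row, column) pairs and 0 marks an open cell, could be:

def adjacent(m, p):
    # neighbor cells (up, down, left, right) inside the maze that are open (0)
    x, y = p
    cells = []
    for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        i, j = x + dx, y + dy
        if 0 <= i < len(m) and 0 <= j < len(m[i]) and m[i][j] == 0:
            cells.append((i, j))
    return cells

With it, calling solve(mz, (0, 0), (5, 5)) on the matrix below enumerates the paths between the two corners.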
And the same maze example given above can be solved by this program like the
following.
mz = [[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 0]]
It seems that, in the worst case, there are 4 options (up, down, left, and right) at each
step; each option is pushed onto the stack and eventually examined during backtracking,
so the complexity looks bound to O(4^n). The actual time won't be so large, because we
filter out the places which have been visited before. In the worst case, all the reachable
points are visited exactly once, so the time is bound to O(n), where n is the total number
of connected points. As a stack is used to store candidate solutions, the space complexity
is O(n²).
queens can be in the same row, and each queen must be put in one column between 1
and 8. Thus we can represent an arrangement as a permutation of {1, 2, 3, 4, 5, 6, 7, 8}.
For instance, the arrangement {6, 2, 7, 1, 3, 5, 8, 4} means: we put the first queen at row 1,
column 6, the second queen at row 2, column 2, ..., and the last queen at row 8, column 4.
By this means, we need only examine 8! = 40320 possibilities.
We can find better solutions than this. Similar to the maze puzzle, we put queens one
by one from the first row. For the first queen, there are 8 options: we can put it in any of
the eight columns. Then for the next queen, we again examine the 8 candidate columns;
some of them are not valid because those positions would be attacked by the first queen.
We repeat this process: for the i-th queen, we examine the 8 columns of row i to find
which ones are safe. If no column is valid, it means all the columns in this row would be
attacked by some queen we've previously placed, and we have to backtrack, as we did in
the maze puzzle. When all 8 queens have been successfully put on the board, we have
found a solution. In order to find all the possible solutions, we record it and go on
examining other candidate columns, backtracking when necessary. The process terminates
when all the columns in the first row have been examined. The below equation starts the
search.
starts the search.
solve({ϕ}, ϕ) (14.54)
In order to manage the candidate attempts, a stack S is used just as in the maze puzzle. The stack is initialized with one empty element. A list L is used to record all possible solutions. Denote the top element of the stack as s1; it is actually an intermediate state of assignment, a partial permutation of 1 to 8. After popping s1, the stack becomes S′. The solve function can be defined as the following.
solve(S, L) =
    L : S = ϕ
    solve(S′, {s1} ∪ L) : |s1| = 8
    solve({{i} ∪ s1 | i ∈ [1, 8], i ∉ s1, safe(i, s1)} ∪ S′, L) : otherwise    (14.55)
If the stack is empty, all the possible candidates have been examined and it's not possible to backtrack any more. L has accumulated all the found solutions and is returned as the result. Otherwise, if the length of the top element of the stack is 8, a valid solution is found; we add it to L, and go on finding other solutions. If the length is less than 8, we need to try to place the next queen. Among all the columns from 1 to 8, we pick those not already occupied by previous queens (through the i ∉ s1 clause) and not attacked along a diagonal (through the safe predicate). The valid assignments are pushed to the stack for further searching.
Function safe(x, C) detects whether assigning a queen to column x will be attacked by other queens in C along a diagonal. There are 2 possible cases: the 45° and 135° directions. Since the row of this new queen is y = 1 + |C|, where |C| is the length of C, the safe function can be defined as the following.

safe(x, C) = ∀(c, r) ∈ zip(reverse(C), {1, 2, ...}), |x − c| ≠ |y − r|    (14.56)
Where zip takes two lists and pairs their elements into a new list. Thus if C = {ci−1, ci−2, ..., c2, c1} represents the columns of the first i − 1 queens that have been assigned, the above function checks whether any pair in {(c1, 1), (c2, 2), ..., (ci−1, i − 1)} forms a diagonal line with position (x, y).
Translating this algorithm into Haskell gives the below example program.
solve = dfsSolve [[]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| length c == 8 = dfsSolve cs (c:s)
| otherwise = dfsSolve ([(x:c) | x ← [1..8] \\ c,
not $ attack x c] ++ cs) s
attack x cs = let y = 1 + length cs in
any (λ(c, r) → abs(x - c) == abs(y - r)) $
zip (reverse cs) [1..]
Observing that the algorithm is tail recursive, it's easy to transform it into an imperative realization. Instead of a list, we use an array to represent the queens assignment. Denote the stack as S, the intermediate assignment as A, and the list of found solutions as L. The imperative algorithm can be described as the following.
1: function Solve-Queens
2: S ← {ϕ}
3: L←ϕ ▷ The result list
4: while S ≠ ϕ do
5: A ← Pop(S) ▷ A is an intermediate assignment
6: if |A| = 8 then
7: Add(L, A)
8: else
9: for i ← 1 to 8 do
10: if Valid(i, A) then
11: Push(S, A ∪ {i})
12: return L
The stack is initialized with the empty assignment. The main loop repeatedly pops the top candidate from the stack. If there are still queens left to place, the algorithm examines the possible columns in the next row from 1 to 8. If a column is safe, i.e., it won't be attacked by any previously placed queen, this column is appended to the assignment and pushed back to the stack. Unlike the functional approach, since an array rather than a list is used, we needn't reverse the solution assignment any more.
Function Valid checks whether column x is safe with respect to the previous queens placed in A. It filters out the columns that have already been occupied, and checks whether any diagonal line is formed with an existing queen.
1: function Valid(x, A)
2: y ← 1 + |A|
3: for i ← 1 to |A| do
4: if x = A[i] ∨ |y − i| = |x − A[i]| then
5: return False
6: return True
The following Python example program implements this imperative algorithm.
def solve():
    stack = [[]]
    s = []
    while stack != []:
        a = stack.pop()
        if len(a) == 8:
            s.append(a)
        else:
            for i in range(1, 9):
                if valid(i, a):
                    stack.append(a + [i])
    return s
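The valid function isn't shown in the program; a minimal sketch following the Valid pseudocode above could be:

def valid(x, a):
    # column x is safe for the next row iff it is not occupied and not on a
    # diagonal with any previously placed queen in the partial assignment a
    y = 1 + len(a)
    for i, c in enumerate(a, start=1):
        if x == c or abs(y - i) == abs(x - c):
            return False
    return True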
Although there are 8 candidate columns for each queen, not all of them are valid and thus further expanded. Only those columns that haven't been occupied by previous queens are tried. The algorithm examines only 15720 possibilities, which is far less than 8⁸ = 16777216 [89].
It's quite easy to extend the algorithm, so that it can solve the n queens puzzle, where n ≥ 4. However, the time cost increases quickly. The backtracking algorithm is just slightly better than the one permuting the sequence of 1 to 8 (which is bound to O(n!)). Another extension to the algorithm is based on the fact that the chess board is square, and thus symmetric both vertically and horizontally: a solution can generate other solutions by rotating and flipping. These aspects are left as exercises to the reader.
Peg puzzle
I once received a puzzle of the leap frogs. It is said to be homework for 2nd grade students in China. As illustrated in figure 14.30, there are 6 frogs on 7 stones. Each frog can either hop to the next stone if it is not occupied, or leap over one frog to another empty stone. The frogs on the left side can only move to the right, while the ones on the right side can only move to the left. These rules are described in figure 14.31.
The goal of this puzzle is to arrange the frogs to jump according to the rules, so that the positions of the 3 frogs on the left are finally exchanged with the ones on the right. If we denote a frog on the left as 'A', one on the right as 'B', and the empty stone as 'O', the puzzle is to find a solution to transform from 'AAAOBBB' to 'BBBOAAA'.
Figure 14.31: (a) Jump to the next stone; (b) Jump over to the right; (c) Jump over to the left.
This puzzle is just a special form of the peg puzzles. The number of pegs is not limited to 6. It can be 8 or other bigger even numbers. Figure 14.32 shows some variants.
We can solve this puzzle by programming. The idea is similar to the 8 queens puzzle. Denote the positions from the leftmost stone as 1, 2, ..., 7. In the ideal case, there are 4 options to arrange a move. For example at the start, the frog on the 3rd stone can hop right to the empty stone; symmetrically, the frog on the 5th stone can hop left; alternatively, the frog on the 2nd stone can leap right, while the frog on the 6th stone can leap left. We can record the state and try one of these 4 options at every step. Of course not all of them are possible at any time. If we get stuck, we can backtrack and try other options.
As we restrict the left side frogs to only moving right, and the right frogs to only moving left, the moves are not reversible. There won't be any repetition cases like the ones we had to deal with in the maze puzzle. However, we still need to record the steps in order to print them out at the end.
In order to enforce these restrictions, let A, O, B in the representation 'AAAOBBB' be -1, 0, and 1 respectively. A state L is a list of elements, each element being one of these 3 values. It starts from {−1, −1, −1, 0, 1, 1, 1}. L[i] accesses the i-th element; its value indicates whether the i-th stone is empty, occupied by a frog from the left side, or occupied by a frog from the right side. Denote the position of the vacant stone as p. The 4 moving options can be stated as below.
• Leap left: p < 6 and L[p + 2] > 0, swap L[p] ↔ L[p + 2];
• Hop left: p < 7 and L[p + 1] > 0, swap L[p] ↔ L[p + 1];
• Leap right: p > 2 and L[p − 2] < 0, swap L[p] ↔ L[p − 2];
• Hop right: p > 1 and L[p − 1] < 0, swap L[p] ↔ L[p − 1].
Four functions leapl(L), hopl(L), leapr(L), and hopr(L) are defined accordingly. If the state L does not satisfy the move restriction, these functions return L unchanged; otherwise, the changed state L′ is returned.
We can also explicitly maintain a stack S to hold the attempts as well as the historic movements. The stack is initialized with a singleton list of the starting state. The solutions are accumulated in a list M, which is empty at the beginning:
As long as the stack isn't empty, we pop one intermediate attempt. If its latest state equals {1, 1, 1, 0, −1, −1, −1}, a solution is found; we append the series of moves leading to this state to the result list M. Otherwise, we expand to the next possible states by trying all four possible moves, and push them back to the stack for further search. Denote the top element of the stack S as s1, and the latest state in s1 as L. The algorithm can be defined as the following.
solve(S, M) =
    M : S = ϕ
    solve(S′, {reverse(s1)} ∪ M) : L = {1, 1, 1, 0, −1, −1, −1}
    solve(P ∪ S′, M) : otherwise    (14.58)

Where P are the new attempts obtained by prepending each possible next state to s1:

P = {{L′} ∪ s1 | L′ ∈ {leapl(L), hopl(L), leapr(L), hopr(L)}, L′ ≠ L}
Note that the starting state is stored as the last element, while the final state is the first. That is the reason why we reverse it when adding it to the solution list.
Translating this algorithm to Haskell gives the following example program.
solve = dfsSolve [[[-1, -1, -1, 0, 1, 1, 1]]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| head c == [1, 1, 1, 0, -1, -1, -1] = dfsSolve cs (reverse c:s)
| otherwise = dfsSolve ((map (:c) $ moves $ head c) ++ cs) s
Running this program finds 2 symmetric solutions, each taking 15 steps. One solution is listed in the table below.
step -1 -1 -1 0 1 1 1
1 -1 -1 0 -1 1 1 1
2 -1 -1 1 -1 0 1 1
3 -1 -1 1 -1 1 0 1
4 -1 -1 1 0 1 -1 1
5 -1 0 1 -1 1 -1 1
6 0 -1 1 -1 1 -1 1
7 1 -1 0 -1 1 -1 1
8 1 -1 1 -1 0 -1 1
9 1 -1 1 -1 1 -1 0
10 1 -1 1 -1 1 0 -1
11 1 -1 1 0 1 -1 -1
12 1 0 1 -1 1 -1 -1
13 1 1 0 -1 1 -1 -1
14 1 1 1 -1 0 -1 -1
15 1 1 1 0 -1 -1 -1
Observe that the algorithm is in tail recursive manner, so it can also be realized imperatively. The algorithm can be further generalized to solve the puzzle with n frogs on each side. We represent the start state {-1, -1, ..., -1, 0, 1, 1, ..., 1} as s, and the mirrored end state as e.
1: function Solve(s, e)
2: S ← {{s}}
3: M ←ϕ
4: while S ≠ ϕ do
5: s1 ← Pop(S)
6: if s1 [1] = e then
7: Add(M , Reverse(s1 ))
8: else
9: for ∀m ∈ Moves(s1 [1]) do
10: Push(S, {m} ∪ s1 )
11: return M
The possible moves can also be generalized with procedure Moves to handle an arbitrary number of frogs. The following Python program implements this solution.
def solve(start, end):
    stack = [[start]]
    s = []
    while stack != []:
        c = stack.pop()
        if c[0] == end:
            s.append(list(reversed(c)))
        else:
            for m in moves(c[0]):
                stack.append([m] + c)
    return s
def moves(s):
    ms = []
    n = len(s)
    p = s.index(0)
    if p < n - 2 and s[p+2] > 0:
        ms.append(swap(s, p, p+2))
    if p < n - 1 and s[p+1] > 0:
        ms.append(swap(s, p, p+1))
    if p > 1 and s[p-2] < 0:
        ms.append(swap(s, p, p-2))
    if p > 0 and s[p-1] < 0:
        ms.append(swap(s, p, p-1))
    return ms
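The swap helper used by moves isn't shown; a minimal sketch (returning a copy of the state with two positions exchanged) could be:

def swap(s, i, j):
    # copy the state and exchange the contents of positions i and j
    s1 = s[:]
    s1[i], s1[j] = s1[j], s1[i]
    return s1

# hypothetical usage for 3 frogs on each side:
# solve([-1, -1, -1, 0, 1, 1, 1], [1, 1, 1, 0, -1, -1, -1])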
For 3 frogs on each side, we know that it takes 15 steps to exchange them. It's interesting to examine how many steps are needed as the number of frogs on each side grows. Our program gives the following result.
number of frogs 1 2 3 4 5 ...
number of steps 3 8 15 24 35 ...
It seems that the number of steps is always a square number minus one. It's natural to guess that the number of steps for n frogs on each side is (n + 1)² − 1. Actually we can prove it is true.
Comparing the final state with the start state, each frog moves ahead n + 1 stones in its own direction. Thus the 2n frogs move 2n(n + 1) stones in total. Another important fact is that each frog on the left has to meet every frog on the right exactly once, and a leap happens at every such meeting. Since a leap moves a frog two stones ahead, and there are n² meetings in total, these leaps account for 2n² stones of movement. The remaining moves are not leaps but hops; the number of hops is 2n(n + 1) − 2n² = 2n. Summing up the n² leaps and 2n hops, the total number of steps is n² + 2n = (n + 1)² − 1.
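A quick numeric check of this argument (not in the original text), assuming the solve and moves functions given above, compares a found solution's length with the formula:

for n in range(1, 6):
    start = [-1] * n + [0] + [1] * n
    end   = [1] * n + [0] + [-1] * n
    steps = len(solve(start, end)[0]) - 1   # number of moves in one found solution
    assert steps == (n + 1) ** 2 - 1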
Summary of DFS
Observe the above three puzzles: although they vary in many aspects, their solutions show a quite similar common structure. They all have some starting state. The maze starts from the entrance point; the 8 queens puzzle starts from the empty board; the leap frogs start from the state 'AAAOBBB'. The solution is a kind of search; at each attempt, there are several possible ways to go. For the maze puzzle, there are four different directions to try; for the 8 queens puzzle, there are eight columns to choose; for the leap frogs puzzle, there are four movements of leap or hop. We don't know how far we can go when making a decision, although the final state is clear. For the maze, it's the exit point; for the 8 queens puzzle, we are done when all 8 queens have been placed on the board; for the leap frogs puzzle, the final state is that all frogs have exchanged sides.
We use a common approach to solve them. We repeatedly select one possible candidate to try and record what we've reached; if we get stuck, we backtrack and try other options. We are sure that by using this strategy we can either find a solution or tell that the problem is unsolvable.
Of course there can be some variation: we can stop when we find one answer, or go on searching for all the solutions.
If we draw a tree rooted at the starting state and expand it so that every branch stands for a different attempt, our searching process proceeds by going deeper and deeper. We won't consider any other options at the same depth unless the search fails, so that we have to backtrack to an upper level of the tree. Figure 14.33 illustrates the order in which we search a state tree. The arrows indicate how we go down and backtrack up; the numbers of the nodes show the order we visit them.
Where the predicate c(X, Y) means place X is connected with Y. Note that this is a directed predicate; we can make Y connected with X as well by either adding a symmetric rule or creating an undirected predicate. Figure 14.34 shows such a directed graph. Given two places X and Y, Prolog can tell if they are connected by the following program.
Figure 14.34: A directed graph over the places a, b, d, e, f, g, and h.
go(X, X).
go(X, Y) :- c(X, Z), go(Z, Y).
This program says that a place is connected with itself. Given two different places X and Y, if X is connected with Z, and Z is connected with Y, then X is connected with Y. Note that there might be multiple choices for Z. Prolog selects a candidate and goes on searching. It only tries other candidates if the recursive search fails; in that case, Prolog backtracks and tries other alternatives. This is exactly what DFS does.
DFS is quite straightforward when we only need a solution and don't care whether the solution takes the fewest steps. For example, the solution it gives may not be the shortest path through the maze. We'll see some more puzzles next. They demand finding the solution with the minimum number of attempts.
if he does not win, he then stands at the tail of the queue to wait for the second try. This
queue helps to ensure our rule.
We can use much the same idea to solve our puzzle. The two banks of the river can be represented as two sets A and B. A contains the wolf, the goat, the cabbage and the farmer, while B is empty. Each time we take an element along with the farmer from one set to the other. The two sets can't hold conflicting things if the farmer is absent. The goal is to exchange the contents of A and B in the fewest steps.
We initialize a queue with the state A = {w, g, c, p}, B = ϕ as the only element. As long as the queue isn't empty, we pick the first element from the head, expand it with all possible options, and put these newly expanded candidates to the tail of the queue. If the element at the head is the final goal, that is A = ϕ, B = {w, g, c, p}, we are done. Figure 14.37 illustrates the idea of this search order. Note that as all possibilities at the same level are examined, there is no need for backtracking.
There is a simple way to treat the sets. A four-bit binary number can be used, where each bit stands for a thing: for example, the wolf w = 1, the goat g = 2, the cabbage c = 4, and the farmer p = 8. Then 0 stands for the empty set and 15 stands for a full set. Value 3 means there are a wolf and a goat alone on a river bank; in this case, the wolf will eat the goat. Similarly, value 6 stands for another conflicting case. Every time, we move the highest bit (which is 8), either alone or together with one of the other bits (4, 2, or 1), from one number to the other. The possible moves can be defined as below.
mv(A, B) =
    {(A − 8 − i, B + 8 + i) | i ∈ {0, 1, 2, 4}, i = 0 ∨ A ∧ i ≠ 0} : B < 8
    {(A + 8 + i, B − 8 − i) | i ∈ {0, 1, 2, 4}, i = 0 ∨ B ∧ i ≠ 0} : otherwise    (14.59)
Where ∧ is the bitwise-and operation.
The solution can be given by reusing the queue defined in the previous chapter. Denote the queue as Q, which is initialized with the singleton list {(15, 0)}. If Q is not empty, function DeQ(Q) extracts the head element M, and the updated queue becomes Q′. M is a list of pairs standing for a series of movements between the river banks. The first element m1 = (A′, B′) is the latest state. Function EnQ′(Q, L) is a slightly different enqueue operation: it pushes all the possible moving sequences in L to the tail of the queue one by one and returns the updated queue. With these notations, the solution function can be defined similarly to the previous puzzles.
Figure 14.36: A lucky-draw game, the i-th person goes from the queue, pick a ball, then
join the queue at tail if he fails to pick the black ball.
Figure 14.37: Start from state 1, check all possible options 2, 3, and 4 for next step; then
all nodes in level 3, ...
The following example Haskell program implements this solution. Note that it uses a
plain list to represent the queue for illustration purpose.
import Data.Bits

moves (a, b) = if b < 8 then trans a b else map swap (trans b a) where
    trans x y = [(x - 8 - i, y + 8 + i)
                 | i ← [0, 1, 2, 4], i == 0 || (x .&. i) /= 0]
swap (x, y) = (y, x)
This algorithm can easily be modified to find all the possible solutions, rather than stopping after the first one is found. This is left as an exercise to the reader. The following shows the two best solutions to this puzzle.
Solution 1:

Left bank                   | Right bank
wolf, goat, cabbage, farmer |
wolf, cabbage               | goat, farmer
wolf, cabbage, farmer       | goat
cabbage                     | wolf, goat, farmer
goat, cabbage, farmer       | wolf
goat                        | wolf, cabbage, farmer
goat, farmer                | wolf, cabbage
                            | wolf, goat, cabbage, farmer

Solution 2:

Left bank                   | Right bank
wolf, goat, cabbage, farmer |
wolf, cabbage               | goat, farmer
wolf, cabbage, farmer       | goat
wolf                        | goat, cabbage, farmer
wolf, goat, farmer          | cabbage
goat                        | wolf, cabbage, farmer
goat, farmer                | wolf, cabbage
                            | wolf, goat, cabbage, farmer
This algorithm can also be realized imperatively. Observing that our solution is in tail recursive manner, we can translate it directly to a loop. We use a list S to hold all the solutions found. The singleton list {(15, 0)} is pushed to the queue when initializing. As long as the queue isn't empty, we extract the head C from the queue by calling the DeQ procedure and examine whether it reaches the final goal; if not, we expand all the possible moves and push them to the tail of the queue for further searching.
1: function Solve
2: S←ϕ
3: Q←ϕ
4: EnQ(Q, {(15, 0)})
5: while Q ≠ ϕ do
6: C ← DeQ(Q)
7: if c1 = (0, 15) then
8: Add(S, Reverse(C))
9: else
10: for ∀m ∈ Moves(C) do
11: if Valid(m, C) then
12: EnQ(Q, {m} ∪ C)
13: return S
Where the Moves and Valid procedures are the same as before. The following Python example program implements this imperative algorithm.
def solve():
    s = []
    queue = [[(0xf, 0)]]
    while queue != []:
        cur = queue.pop(0)
        if cur[0] == (0, 0xf):
            s.append(list(reversed(cur)))
        else:
            for m in moves(cur):
                queue.append([m] + cur)
    return s

def moves(s):
    (a, b) = s[0]
    return valid(s, trans(a, b) if b < 8 else swaps(trans(b, a)))

def swaps(s):
    return [(b, a) for (a, b) in s]
There is a minor difference between the program and the pseudo code: the function that generates the candidate moving options filters out the invalid cases inside itself.
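The trans and valid functions are not shown in the program above; a minimal sketch under the same bit encoding (wolf = 1, goat = 2, cabbage = 4, farmer = 8) could be:

def trans(a, b):
    # move the farmer (8), alone or together with one item i, from bank a to bank b
    return [(a - 8 - i, b + 8 + i)
            for i in [0, 1, 2, 4] if i == 0 or (a & i) != 0]

def valid(s, ms):
    # a bank without the farmer must not hold wolf+goat (3) or goat+cabbage (6);
    # also drop moves leading back to a state already in the sequence s
    def safe(v):
        return v >= 8 or ((v & 3) != 3 and (v & 6) != 6)
    return [m for m in ms if safe(m[0]) and safe(m[1]) and m not in s]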
Every time, no matter which direction the farmer drives the boat, there are m options for him to choose, where m is the number of objects on the river bank he departs from. m is never more than 4, so the algorithm won't examine more than 4ⁿ candidates within n steps. The actual time is far less than this estimation, because we avoid trying invalid cases. Our solution examines all the possible movements in the worst case; because we check the recorded steps to avoid repeated attempts, the algorithm takes about O(n²) time to search n possible steps.
Instead of thinking forward from the starting state as shown in figure 14.38, Pólya pointed out that there will be 6 quarts of water in the bigger jug at the final stage. This indicates the second-to-last step: we can fill the 9-quart jug, then pour out 3 quarts from it. In order to achieve this, there should be 1 quart of water left in the smaller jug, as shown in figure 14.39.
It's easy to see that filling the 9-quart jug, then pouring into the 4-quart jug twice, leaves 1 quart of water, as shown in figure 14.40. At this stage, we've found the solution. By reversing our findings, we can give the correct steps to bring exactly 6 quarts of water.
Pólya's methodology is general, but it's still hard to solve such puzzles without a concrete algorithm. For instance, how do we bring up 2 gallons of water with jugs of 899 and 1147 gallons?
There are 6 ways to deal with the 2 jugs in total. Denote the smaller jug as A and the bigger jug as B.
Figure 14.40: Fill the bigger jugs, and pour to the smaller one twice.
• Fill jug A;
• Fill jug B;
• Empty jug A;
• Empty jug B;
• Pour water from jug A into B;
• Pour water from jug B into A.
The following sequence shows an example. Note that in this example, we assume that
a < b < 2a.
A B operation
0 0 start
a 0 fill A
0 a pour A into B
a a fill A
2a - b b pour A into B
2a - b 0 empty B
0 2a - b pour A into B
a 2a - b fill A
3a - 2b b pour A into B
... ... ...
No matter which of the above operations are taken, the amount of water in each jug can always be expressed as xa + yb for some integers x and y, where a and b are the volumes of the jugs. All the amounts of water we can get are linear combinations of a and b. So given two jugs, we can immediately tell whether a goal g is solvable or not.
For instance, we can't bring 5 gallons of water with two jugs of volume 4 and 6 gallons. Number theory ensures that the 2 water jugs puzzle can be solved if and only if g can be divided by the greatest common divisor of a and b. Written as:

gcd(a, b) | g

Where m|n means n can be divided by m. What's more, if a and b are relatively prime, which means gcd(a, b) = 1, it's possible to bring up any quantity g of water.
Although gcd(a, b) enables us to determine whether the puzzle is solvable, it doesn't give us the detailed pouring sequence. If we can find integers x and y such that g = xa + yb, we can arrange a sequence of operations (even though it may not be the best solution) to solve it.
The idea is that, without loss of generality, suppose x > 0 and y < 0; then we need to fill jug A x times and empty jug B |y| times in total.
Let's take a = 3, b = 5, and g = 4 for example. Since 4 = 3 × 3 − 5, we can arrange a sequence like the following.
A B operation
0 0 start
3 0 fill A
0 3 pour A into B
3 3 fill A
1 5 pour A into B
1 0 empty B
0 1 pour A into B
3 1 fill A
0 4 pour A into B
In this sequence, we fill A 3 times and empty B once. The procedure can be described as the following:
Repeat x times:
1. Fill jug A;
2. Pour the water from A into B; whenever B becomes full, empty it.
So the only problem left is to find x and y. There is a powerful tool in number theory called the extended Euclid algorithm, which can achieve this. Compared to the classic Euclid GCD algorithm, which only gives the greatest common divisor, the extended Euclid algorithm also gives a pair x, y such that d = gcd(a, b) = xa + yb. Dividing b by a, with quotient q and remainder r, gives:

b = aq + r    (14.64)
Since d is the common divisor, it divides both a and b, thus d divides r as well. Because r is less than a, we can scale down the problem by finding the GCD of r and a. Suppose the recursive result gives

d = x′r + y′a    (14.65)

Substituting r = b − aq yields:

d = x′(b − aq) + y′a = (y′ − x′q)a + x′b    (14.66)
Note that this is a typical recursive relationship. The edge case happens when a = 0.
gcd(0, b) = b = 0a + 1b (14.68)
Summarizing the above results, the extended Euclid algorithm can be defined as the following:

gcdext(a, b) =
    (b, 0, 1) : a = 0
    (d, y′ − x′⌊b/a⌋, x′) : otherwise    (14.69)

Where d, x′, y′ are given by the recursive call gcdext(b mod a, a), as in equation (14.65).
The 2 water jugs puzzle is almost solved, but there are still two details that need to be tackled. First, the extended Euclid algorithm gives the linear combination for the greatest common divisor d, while the target volume of water g isn't necessarily equal to d. This is easily solved by multiplying x and y by m, where m = g/gcd(a, b). Second, we assumed x > 0 in order to form a procedure that fills jug A x times; however, the extended Euclid algorithm doesn't ensure x is positive. For instance gcdext(4, 9) = (1, −2, 1). Whenever we get a negative x, since d = xa + yb, we can repeatedly add b to x and decrease y by a till x is greater than zero.
At this stage, we are able to give the complete solution to the 2 water jugs puzzle.
Below is an example Haskell program.
extGcd 0 b = (b, 0, 1)
extGcd a b = let (d, x', y') = extGcd (b `mod` a) a in
(d, y' - x' ∗ (b `div` a), x')
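To make the procedure concrete, here is a rough Python sketch (the names ext_gcd and jug_steps are ours, not the book's): it scales the extended Euclid result to obtain the number x of 'fill A' operations, then simulates the fill/pour/empty procedure. For a = 3, b = 5, g = 4 it reproduces the 23-step sequence discussed below.

def ext_gcd(a, b):
    if a == 0:
        return (b, 0, 1)
    d, x, y = ext_gcd(b % a, a)
    return (d, y - x * (b // a), x)

def jug_steps(a, b, g):
    # assumes a < b and 0 < g < b, with gcd(a, b) dividing g
    d, x, y = ext_gcd(a, b)
    x, y = x * (g // d), y * (g // d)
    while x <= 0:                       # make the number of 'fill A' positive
        x, y = x + b // d, y - a // d
    steps, (p, q), fills = [(0, 0)], (0, 0), 0
    while fills < x or p > 0:
        if p == 0:
            p, fills = a, fills + 1     # fill A
        elif q == b:
            q = 0                       # empty B
        else:
            m = min(p, b - q)           # pour A into B
            p, q = p - m, q + m
        steps.append((p, q))
    return steps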
Although we can solve the 2 water jugs puzzle with the extended Euclid algorithm, the solution may not be the best. For instance, when we are going to bring 4 gallons of water from 3- and 5-gallon jugs, the extended Euclid algorithm produces the following sequence:
[(0,0),(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),
(0,4),(3,4),(2,5),(2,0),(0,2),(3,2),(0,5),(3,5),
(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),(0,4)]
It takes 23 steps to achieve the goal, while the best solution needs only 6 steps:
[(0,0),(0,5),(3,2),(0,2),(2,0),(2,5),(3,4)]
Observing the 23 steps, we find that jug B already contains 4 gallons of water at the 8th step, but the algorithm ignores this fact and goes on executing the remaining 15 steps. The reason is that the x and y found by the extended Euclid algorithm are not the only numbers satisfying g = xa + yb. Among all such pairs, the smaller |x| + |y| is, the fewer steps are needed. There is an exercise addressing this problem in this section.
An interesting question is how to find the best solution. We have two approaches: one is to find x and y minimizing |x| + |y|; the other is to adopt much the same idea as in the wolf-goat-cabbage puzzle. We focus on the latter in this section. Since there are at most 6 possible options (fill A, fill B, pour A into B, pour B into A, empty A, and empty B), we can try them 'in parallel' and check which decision leads to the best solution. We need to record all the states we've reached to avoid any potential repetition. In order to realize this parallel approach with reasonable resources, a queue can be used to arrange our attempts. The elements stored in this queue are sequences of pairs (p, q), where p and q represent the volumes of water contained in the two jugs. These pairs record the sequence of our operations from the beginning to the latest. We initialize the queue with the singleton list containing the starting state {(0, 0)}.
Whenever the queue isn't empty, we pick a sequence from the head of the queue. If this sequence ends with a pair containing the target volume g, we have found a solution; we can print this sequence by reversing it. Otherwise, we expand the latest pair by trying all the possible 6 options, remove any duplicated states, and add the new sequences to the tail of the queue. Denote the queue as Q, the first sequence at the head of the queue as S, the latest pair in S as (p, q), and the rest of the pairs as S′. After popping the head element, the queue becomes Q′. This algorithm can be defined like below:
solve′(Q) =
    ϕ : Q = ϕ
    reverse(S) : p = g ∨ q = g
    solve′(EnQ′(Q′, {{s′} ∪ S | s′ ∈ try(S)})) : otherwise    (14.71)
Where function EnQ′ pushes a list of sequences to the queue one by one. Function try(S) tries all the possible 6 options to generate new pairs of water volumes:

try(S) = {s′ | s′ ∈ {fillA(p, q), fillB(p, q), pourA(p, q), pourB(p, q), emptyA(p, q), emptyB(p, q)}, s′ ∉ S′}    (14.72)
It's intuitive to define the 6 options. For the fill operations, the result is that the filled jug is full; for the empty operations, the resulting volume is zero; for the pour operations, we need to test whether the receiving jug is big enough to hold all the water.
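A minimal Python sketch of the six options (the function names and the explicit jug-volume parameters a and b are ours, not the book's) could be:

def fill_a(p, q, a, b):  return (a, q)     # fill jug A to its volume a
def fill_b(p, q, a, b):  return (p, b)     # fill jug B to its volume b
def empty_a(p, q, a, b): return (0, q)
def empty_b(p, q, a, b): return (p, 0)
def pour_a(p, q, a, b):                    # pour A into B until A is empty or B is full
    m = min(p, b - q)
    return (p - m, q + m)
def pour_b(p, q, a, b):                    # pour B into A until B is empty or A is full
    m = min(q, a - p)
    return (p + m, q - m)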
This method always returns the solution with the fewest steps. It can also be realized in an imperative approach. Instead of storing the complete sequence of operations in every element of the queue, we can store each unique state in a global history list and use links to track the operation sequence; this saves space.
Figure 14.41: The state tree rooted at (0, 0): 'fill A' leads to (3, 0) and 'fill B' leads to (0, 5); further expansion gives (3, 5), (0, 3), (3, 2), ...; repeated states (shown in gray) are ignored.
The idea is illustrated in figure 14.41. The initial state is (0, 0). Only 'fill A' and 'fill B' are possible. They are tried and added to the record list. Next we can try and record 'fill B' on top of (3, 0), which yields the new state (3, 5). However, when trying 'empty A' from state (3, 0), we would return to the start state (0, 0); as this previous state has been recorded, it is ignored. All the repeated states are shown in gray in the figure.
With such settings, we needn’t remember the operation sequence in each element in
the queue explicitly. We can add a ‘parent’ link to each node in figure 14.41, and use it to
back-traverse to the starting point from any state. The following example ANSI C code
shows such a definition.
struct Step {
int p, q;
struct Step∗ parent;
};
Where p, q are the volumes of water in the 2 jugs. For any state s, define functions p(s) and q(s) that return these 2 values. The imperative algorithm can be realized based on this idea as below.
1: function Solve(a, b, g)
2: Q←ϕ
3: Push-and-record(Q, (0, 0))
4: while Q ≠ ϕ do
5: s ← Pop(Q)
6: if p(s) = g ∨ q(s) = g then
7: return s
8: else
9: C ← Expand(s)
10: for ∀c ∈ C do
11: if c ≠ s ∧ ¬ Visited(c) then
12: Push-and-record(Q, c)
13: return NIL
Where Push-and-record not only pushes an element to the queue, but also records this element as visited, so that we can later check whether an element has been visited. This can be implemented with a list. All push operations append the new elements to the tail. For the pop operation, instead of removing the element pointed to by head, the head pointer simply advances to the next one. This list contains historic data which has to be reset explicitly. The following ANSI C code illustrates this idea.
struct Step ∗steps[1000], ∗∗head, ∗∗tail = steps;
void reset() {
struct Step ∗∗p;
for (p = steps; p != tail; ++p)
free(∗p);
head = tail = steps;
}
In order to test whether a state has been visited, we can traverse the list and compare p and q.
int eq(struct Step∗ a, struct Step∗ b) {
return a→p == b→p && a→q == b→q;
}
The result steps are back-tracked in reverse order; they can be output with a recursive function:
void print(struct Step∗ s) {
if (s) {
print(s→parent);
printf("%d, %d\n", s→p, s→q);
}
}
Kloski
Kloski is a block sliding puzzle. It appears in many countries, with different sizes and layouts. Figure 14.42 illustrates a traditional Kloski game in China.
In this puzzle, there are 10 blocks, each labeled with text or an icon. The smallest block has a size of 1 unit square; the biggest one is 2 × 2 units. Note there is a slot of 2 units wide at the middle-bottom of the board. The biggest block represents a king in ancient times, while the others are enemies. The goal is to move the biggest block to the slot, so that the king can escape. This game is named 'Huarong Dao', or 'Huarong Escape', in China. Figure 14.43 shows a similar Kloski puzzle in Japan, where the biggest block represents a daughter and the others are her family members. This game is named 'Daughter in the box' in Japan (Japanese name: hakoiri musume).
In this section, we want to find a solution which slides the blocks from the initial state to the final state with the minimum number of moves.
The intuitive idea to model this puzzle is to use a 5 × 4 matrix to represent the board. All pieces are labeled with a number. The following matrix M, for example, shows the initial state of the puzzle.
M =
    1 10 10  2
    1 10 10  2
    3  4  4  5
    3  7  8  5
    6  0  0  9
In this matrix, a cell of value i means the i-th piece covers this cell. The special value 0 represents a free cell. By using the sequence 1, 2, ... to identify pieces, a layout can be further simplified as an array L. Each element is the list of cells covered by the piece indexed by this element. For example, L[4] = {(3, 2), (3, 3)} means the 4-th piece covers the cells at positions (3, 2) and (3, 3), where (i, j) means the cell at row i and column j.
The starting layout can be written as the following array.
{{(1, 1), (2, 1)}, {(1, 4), (2, 4)}, {(3, 1), (4, 1)}, {(3, 2), (3, 3)}, {(3, 4), (4, 4)},
{(5, 1)}, {(4, 2)}, {(4, 3)}, {(5, 4)}, {(1, 2), (1, 3), (2, 2), (2, 3)}}
When moving the Kloski blocks, we need to examine all the 10 blocks, checking whether each block can move up, down, left and right. It seems that this approach would lead to a huge number of possibilities: since each step might have 10 × 4 options, there would be about 40ⁿ cases by the n-th step.
Actually, there won't be so many options. For example, in the first step, there are only 4 valid moves: the 6-th piece moves right; the 7-th and 8-th move down; and the 9-th moves left.
All other moves are invalid. Figure 14.44 shows how to test whether a move is possible.
The left example illustrates sliding the block labeled 1 down. There are two cells covered by this block. The upper 1 moves to a cell previously occupied by this same block, which is also labeled 1; the lower 1 moves to a free cell, which is labeled 0.
The right example, on the other hand, illustrates an invalid slide. In this case, the upper cell could move to the cell occupied by the same block; however, the lower cell labeled 1 can't move to the cell occupied by another block, which is labeled 2.
To test whether a move is valid, we need to examine all the cells the block will cover. If every one of them is labeled with 0 or with the same number as this block, the move is valid; otherwise it conflicts with some other block. For a layout L with corresponding matrix M, suppose we want to move the k-th block by (∆x, ∆y), where |∆x| ≤ 1, |∆y| ≤ 1. The move is valid when every cell it reaches stays on the board and is labeled 0 or k:

∀(i, j) ∈ L[k] ⇒ 1 ≤ i + ∆y ≤ 5, 1 ≤ j + ∆x ≤ 4, M[i + ∆y][j + ∆x] ∈ {k, 0}
Figure 14.44: Left: both the upper and the lower 1 are OK; Right: the upper 1 is OK,
the lower 1 conflicts with 2.
Another important point in solving the Kloski puzzle is how to avoid repeated attempts. The obvious case is that, after a series of slides, we end up with a matrix that we have met before. However, it is not enough to only avoid identical matrices. Consider the following two matrices: although M1 ≠ M2, we should drop the moves leading to M2, because the two are essentially the same.
M1 =            M2 =
 1 10 10  2      2 10 10  1
 1 10 10  2      2 10 10  1
 3  4  4  5      3  4  4  5
 3  7  8  5      3  7  6  5
 6  0  0  9      8  0  0  9
This fact tells us that we should compare layouts, not merely matrices, to avoid repetition. Denote the corresponding layouts as L1 and L2 respectively; it's easy to verify that ||L1|| = ||L2||, where ||L|| is the normalized layout, defined as below:
||L|| = sort({sort(li )|∀li ∈ L}) (14.75)
In other words, a normalized layout is ordered for all its elements, and every element
is also ordered. The ordering can be defined as that (a, b) ≤ (c, d) ⇔ an + b ≤ cn + d,
where n is the width of the matrix.
Observing that the Kloski board is symmetric, a layout can be the mirror of another one. A mirrored layout is also a kind of repetition, which should be avoided. The following M1 and M2 show such an example.
M1 =            M2 =
10 10  1  2      3  1 10 10
10 10  1  2      3  1 10 10
 3  5  4  4      4  4  2  5
 3  5  8  9      7  6  2  5
 6  7  0  0      0  0  9  8
Note that, the normalized layouts are symmetric to each other. It’s easy to get a
mirrored layout like this:
mirror(L) = {{(i, n − j + 1)|∀(i, j) ∈ l}|∀l ∈ L} (14.76)
We find that the matrix representation is useful for validating moves, while the layout is handy for modeling moves and avoiding repeated attempts. We can use a similar approach to the previous puzzles to solve Kloski. We need a queue, where every element contains two parts: a series of moves and the latest layout they lead to. Each move is in the form (k, (∆y, ∆x)), which means moving the k-th block by ∆y rows and ∆x columns on the board.
The queue contains the starting layout when initialized. Whenever the queue isn't empty, we pick the first element from the head, checking whether the biggest block is on target, that is L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}. If yes, then we are done; otherwise, we try to move every block in 4 directions: left, right, up, and down, and store all the possible, unique new layouts at the tail of the queue. During this search, we need to record all the normalized layouts we've ever found to avoid any duplication.
Denote the queue as Q, the historic layouts as H, the first layout at the head of the queue as L, its corresponding matrix as M, and the moving sequence leading to this layout as S. The algorithm can be defined as the following.
solve(Q, H) =
    ϕ : Q = ϕ
    reverse(S) : L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}
    solve(Q′, H′) : otherwise    (14.77)
The first clause says that if the queue is empty, we've tried all the possibilities and can't find a solution; the second clause finds a solution and returns the moving sequence in reversed order. These are the two edge cases. Otherwise, the algorithm expands the current layout, puts all the valid new layouts at the tail of the queue to yield Q′, and updates the set of normalized layouts to H′; then it continues the search recursively.
In order to expand a layout into valid, unique new layouts, we can define a function as below:

expand(L, H) = {L′ | k ∈ [1, 10], (∆y, ∆x) ∈ {(0, −1), (0, 1), (−1, 0), (1, 0)},
                     valid(L, k, ∆y, ∆x), unique(L′, H)}    (14.78)

Where L′ is the new layout obtained by moving the k-th block by (∆y, ∆x) from L, M′ is its corresponding matrix, and M′′ is the matrix of the mirrored layout of L′. Function unique is defined like this:

unique(L′, H) = M′ ∉ H ∧ M′′ ∉ H    (14.79)
We'll next show some example Haskell Kloski programs. As arrays aren't mutable in the purely functional setting, a tree-based map is used to represent the layout 11. Some type synonyms are defined as below:

import qualified Data.Map as M
import Data.Array   -- assuming board is an Array; provides bounds and inRange
import Data.List (sort)
The main program is almost the same as the solve(Q, H) function defined above.
11 Alternatively, the finger tree based sequence shown in the previous chapter can be used
Where function layout gives the normalized form by sorting, and move returns the updated map after sliding the i-th block by (∆y, ∆x).

layout = sort ◦ map sort ◦ M.elems
Function expand gives all the possible new options. It can be directly translated from
expand(L, H).
expand :: Layout → [[[Point]]] → [Move]
expand x visit = [(i, d) | i ← [1..10],
                           d ← [(0, -1), (0, 1), (-1, 0), (1, 0)],
                           valid i d, unique i d] where
    valid i d = all (λp → let p' = shift p d in
                      inRange (bounds board) p' &&
                      (M.keys $ M.filter (elem p') x) `elem` [[i], []])
                    (maybe [] id $ M.lookup i x)
    unique i d = let mv = move x (i, d) in
        all (`notElem` visit) (map layout [mv, mirror mv])
Note that we also filter out the mirrored layouts. The mirror function is given as the following.

mirror = M.map (map (λ (y, x) → (y, 5 - x)))
This program takes several minutes to produce the best solution, which takes 116 steps. The final 3 steps are shown below:
...
The Kloski solution can also be realized imperatively. Note that solve(Q, H) is tail-recursive, so it's easy to transform the algorithm into a loop. We can also link each layout to its parent, so that the moving sequence can be recorded globally. This saves some space, as the queue needn't store the moving information in every element. When outputting the result, we only need to back-track from the last layout to the starting one.
Suppose function Link(L′, L) links a new layout L′ to its parent layout L. The following algorithm takes a starting layout and searches for the best moving sequence.
1: function Solve(L0 )
2: H ← ||L0 ||
3: Q←ϕ
4: Push(Q, Link(L0 , NIL))
5: while Q ≠ ϕ do
6: L ← Pop(Q)
7: if L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)} then
8: return L
9: else
10: for each L′ ∈ Expand(L, H) do
11: Push(Q, Link(L′ , L))
12: Append(H, ||L′ ||)
13: return NIL ▷ No solution
The following example Python program implements this algorithm:
from collections import deque

class Node:
    def __init__(self, l, p = None):
        self.layout = l
        self.parent = p

def solve(start):
    visit = set([normalize(start)])
    queue = deque([Node(start)])
    while queue:
        cur = queue.popleft()
        layout = cur.layout
        if layout[-1] == [(4, 2), (4, 3), (5, 2), (5, 3)]:
            return cur
        else:
            for brd in expand(layout, visit):
                queue.append(Node(brd, cur))
                visit.add(normalize(brd))
    return None # no solution
Like most programming languages, Python indexes arrays from 0, not 1; this has to be handled properly. The remaining functions, including mirror, matrix, and move, are implemented as follows.
def mirror(layout):
return [[(y, 5 - x) for (y, x) in r] for r in layout]
def matrix(layout):
m = [[0] * 4 for _ in range(5)]
for (i, ps) in zip(range(1, 11), layout):
for (y, x) in ps:
m[y - 1][x - 1] = i
return m
def dup(layout):
return [r[:] for r in layout]
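The normalize and move functions referenced above are not shown; minimal sketches following the definitions in the text (normalize corresponds to equation (14.75) and returns tuples so the result can be stored in a set) could be:

def normalize(layout):
    # sort the cells of each piece, then sort the pieces (equation (14.75))
    return tuple(sorted(tuple(sorted(ps)) for ps in layout))

def move(layout, i, delta):
    # slide the i-th piece (1-based, as in the text) by delta = (dy, dx)
    (dy, dx) = delta
    new = dup(layout)
    new[i - 1] = [(y + dy, x + dx) for (y, x) in new[i - 1]]
    return new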
It's possible to modify this Kloski algorithm so that it does not stop at the first solution, but searches for all solutions. In that case, the computation time is bound by the size of the space V that holds all the layouts which can be transformed from the starting layout. If all these layouts are stored globally, each with a parent field pointing to its predecessor, the space requirement of this algorithm is also bound to O(V).
Summary of BFS
The above three puzzles, the wolf-goat-cabbage puzzle, the water jugs puzzle, and the Kloski puzzle, show a common solution structure. Similar to the DFS problems, they all have a starting state and an end state. The wolf-goat-cabbage puzzle starts with the wolf, the goat, the cabbage and the farmer all on one side and the other side empty, and ends up in the state where they have all moved to the other side. The water jugs puzzle starts with two empty jugs and ends with either jug containing a certain volume of water. The Kloski puzzle starts from one layout and ends at another layout where the biggest block has been slid to a given position.
All these problems specify a set of rules which transfer one state to another. Different from the DFS approach, we try all the possible options 'in parallel'. We won't search any deeper until all the other alternatives at the same step have been examined. This method ensures that the solution with the minimum steps is found before those with more steps. Reviewing and comparing the two figures we've drawn before shows the difference between these two approaches. Because the latter expands the search horizontally, it is called Breadth-first search (BFS for short).
As we can't really perform the search in parallel, a BFS realization typically utilizes a queue to store the search options. The candidate with fewer steps is popped from the head, while new candidates with more steps are pushed to the tail of the queue. Note that the queue should meet the constant-time enqueue and dequeue requirement, which we've explained in the previous chapter on queues. Strictly speaking, the example functional programs shown above don't meet this criterion: they use a list to mimic a queue, which only provides linear-time pushing. Readers can replace them with the functional queue we explained before.
BFS provides a simple method to search for the optimal solution in terms of the number of steps. However, it can't search for the more general optimal solution. Consider another directed graph as shown in figure 14.46, where the length of each section varies. We can't use BFS to find the shortest route from one city to another.
Figure 14.46: A weighted directed graph of cities.
Note that the shortest route from city a to city c isn't the one with the fewest steps, a → b → c, whose total length is 22; the route with more steps, a → e → f → c, is the best, with a length of 20. The coming sections introduce other algorithms to search for such optimal solutions.
be solved by brute-force. Nevertheless, we've found that for some of them there exist special, simplified ways to search for the optimal solution.
Greedy algorithm
Huffman coding
Huffman coding is a solution to encode information with the shortest length of code. Consider the popular ASCII code, which uses 7 bits to encode characters, digits, and symbols; it can represent 2⁷ = 128 different symbols. With bits of 0 and 1, we need at least log₂ n bits to distinguish n different symbols. For text with only case-insensitive English letters, we can define a code table like below.
char code char code
A 00000 N 01101
B 00001 O 01110
C 00010 P 01111
D 00011 Q 10000
E 00100 R 10001
F 00101 S 10010
G 00110 T 10011
H 00111 U 10100
I 01000 V 10101
J 01001 W 10110
K 01010 X 10111
L 01011 Y 11000
M 01100 Z 11001
With this code table, text ‘INTERNATIONAL’ is encoded to 65 bits.
00010101101100100100100011011000000110010001001110101100000011010
Observe the above code table, which actually maps the letters 'A' to 'Z' to 0 to 25. Every code has 5 bits; code zero is written as '00000', not '0', for example. Such a coding method is called fixed-length coding.
Another coding method is variable-length coding, where we can use just one bit '0' for 'A', two bits '10' for 'C', and 5 bits '11001' for 'Z'. Although this approach can dramatically shorten the total length of the code for 'INTERNATIONAL' from 65 bits, it causes a problem when decoding. When processing a sequence of bits like '1101', we don't know whether it means '1' followed by '101', which stands for 'BF'; or '110' followed by '1', which is 'GB'; or '1101', which is 'N'.
The famous Morse code is a variable-length coding system: the most-used letter 'E' is encoded as a dot, while 'Z' is encoded as two dashes and two dots. Morse code uses a special pause separator to indicate the termination of a code, so that the above problem doesn't happen. There is another solution to avoid ambiguity. Consider the following code table.
char code char code
A 110 E 1110
I 101 L 1111
N 01 O 000
R 001 T 100
Text ‘INTERNATIONAL’ is encoded to 38 bits only:
10101100111000101110100101000011101111
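As a quick check (not from the book), the 38-bit figure can be reproduced with a few lines of Python:

table = {'A': '110', 'E': '1110', 'I': '101', 'L': '1111',
         'N': '01',  'O': '000',  'R': '001', 'T': '100'}
bits = ''.join(table[c] for c in "INTERNATIONAL")
print(len(bits))   # 38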
If we decode the bits against the above code table, we won't meet any ambiguity. This is because no symbol's code is the prefix of another's. Such a code is called a prefix code. (You may wonder why it isn't called a non-prefix code.) By using prefix codes, we don't need separators at all, so the length of the code can be shortened.
This is a very interesting problem: can we find a prefix-code table which produces the shortest code for a given text? The very same problem was given to David A. Huffman in 1951, who was still a student at MIT [91]. His professor Robert M. Fano told the class that those who could solve this problem needn't take the final exam. Huffman had almost given up and started preparing for the final exam when he found the most efficient answer.
The idea is to create the coding table according to the frequency with which each symbol appears in the text: the more a symbol is used, the shorter its code. It's not hard to process some text and count the occurrences of each symbol, so that we have a symbol set in which each symbol is augmented with a weight. The weight can be any number indicating the frequency of the symbol; we can use the number of occurrences, or a probability, for example.
Huffman discovered that a binary tree can be used to generate prefix codes. All symbols are stored in the leaf nodes. The codes are generated by traversing the tree from the root: when going left, we add a zero; when going right, we add a one.
Figure 14.47 illustrates such a binary tree. Taking symbol 'N' for example: starting from the root, we first go left, then right, and arrive at 'N'; thus the code for 'N' is '01'. For symbol 'A', we go right, right, then left, so 'A' is encoded as '110'. Note that this approach ensures no code is the prefix of another.
Figure 14.47: A Huffman tree for symbols with weights N:3, A:2, I:2, T:2, E:1, L:1, O:1, R:1.
Note that this tree can also be used directly for decoding: when scanning a series of bits, if the bit is zero we go left; if the bit is one we go right. When we arrive at a leaf, we decode the symbol stored in that leaf, and restart from the root of the tree for the coming bits.
Given a list of symbols with weights, we need to build such a binary tree so that symbols with greater weights have shorter paths from the root. Huffman developed a bottom-up solution. At the start, all symbols are put into leaf nodes. Each time, we pick the two nodes with the smallest weights and merge them into a branch node, whose weight is the sum of its two children. We repeatedly pick the two smallest weighted nodes and merge them till there is only one tree left. Figure 14.48 illustrates this building process.
We can reuse the binary tree definition to formalize Huffman coding: we augment it with the weight information, and store symbols only in the leaf nodes. The following C-like definition shows an example.
Figure 14.48: Building the Huffman tree by repeatedly merging the two trees with the smallest weights (steps 4 to 7 shown).
struct Node {
int w;
char c;
struct Node ∗left, ∗right;
};
Some constraints can be added to the definition, as an empty tree isn't allowed: a Huffman tree is either a leaf, which contains a symbol and its weight, or a branch, which holds only the total weight of all its leaves. The following Haskell code, for instance, explicitly specifies these two cases.
data HTr w a = Leaf w a | Branch w (HTr w a) (HTr w a)
When merging two Huffman trees T1 and T2 into a bigger one, these two trees are set as its children. We can select either one as the left and the other as the right. The weight of the resulting tree T is the sum of its two children, so that w = w1 + w2. Define T1 < T2 if w1 < w2. One possible Huffman tree building algorithm can be realized as the following.
build(A) =
    T1 : A = {T1}
    build({merge(Ta, Tb)} ∪ A′) : otherwise    (14.80)
A is a list of trees. It is initialized with leaves for all symbols and their weights. If there is only one tree in this list, we are done; the tree is the final Huffman tree. Otherwise, the two smallest trees Ta and Tb are extracted, and the rest of the trees are held in list A′. Ta and Tb are merged into one bigger tree, which is put back into the tree list for further recursive building.
We can scan the tree list to extract the 2 nodes with the smallest weights. When the scan begins, the first 2 elements are compared and initialized as the two minimum ones so far, and an empty accumulator is passed as the last argument:

extract(A) = extract′(min(T1, T2), max(T1, T2), A′′, ϕ)

Where T1 and T2 are the first two trees, and A′′ holds the rest.
For every tree, if its weight is less than one of the smallest two found so far, we update the result to contain this tree. For any given tree list A, denote the first tree in it as T1 and the rest of the trees as A′. The scan process can be defined as the following.

extract′(Ta, Tb, A, B) =
    (Ta, Tb, B) : A = ϕ
    extract′(Ta′, Tb′, A′, {Tb} ∪ B) : T1 < Tb
    extract′(Ta, Tb, A′, {T1} ∪ B) : otherwise    (14.83)

Where Ta′ = min(T1, Ta), Tb′ = max(T1, Ta) are the updated two trees with the smallest weights.
The following Haskell example program implements this Huffman tree building algo-
rithm.
build [x] = x
build xs = build ((merge x y) : xs') where
(x, y, xs') = extract xs
This building solution can also be realized imperatively. Given an array of Huffman nodes, we can use the last two cells to hold the nodes with the smallest weights. Then we scan the rest of the array from right to left. Whenever there is a node with a smaller weight, this node is exchanged with the bigger of the last two. After all nodes have been examined, we merge the trees in the last two cells and drop the last cell; this shrinks the array by one. We repeat this process till there is only one tree left.
1: function Huffman(A)
2: while |A| > 1 do
3: n ← |A|
4: for i ← n − 2 down to 1 do
5: if A[i] < Max(A[n], A[n − 1]) then
6: Exchange A[i] ↔ Max(A[n], A[n − 1])
7: A[n − 1] ← Merge(A[n], A[n − 1])
8: Drop(A[n])
9: return A[1]
The following C++ example program implements this algorithm. Note that it doesn't require the last two elements to be ordered.
typedef vector<Node∗> Nodes;
The algorithm merges all the leaves, and it needs to scan the list in each iteration, so its performance is quadratic. This can be improved. Observe that each time only the two trees with the smallest weights are merged; this reminds us of the heap data structure, which ensures fast access to the smallest element. We can put all the leaves in a heap; for a binary heap, building it is typically a linear operation. Then we extract the minimum element twice, merge the two trees, and put the bigger tree back to the heap. These are O(lg n) operations if a binary heap is used, so the total performance is O(n lg n), which is better than the above algorithm. The next function extracts the smallest node from the heap and starts the Huffman tree building:

build′(H) = reduce(top(H), pop(H))    (14.84)
This algorithm stops when the heap is empty; otherwise, it extracts another node from the heap for merging.

reduce(T, H) =
    T : H = ϕ
    build′(insert(merge(T, top(H)), pop(H))) : otherwise    (14.85)
Functions build′ and reduce are mutually recursive. The following Haskell example program implements this algorithm using the heap defined in the previous chapter.

huffman' :: (Num a, Ord a) ⇒ [(b, a)] → HTr a b
huffman' = build' ◦ Heap.fromList ◦ map (λ(c, w) → Leaf w c) where
    -- Heap is the min-heap module from the previous chapter; the names
    -- fromList, findMin, deleteMin and insert are assumed here
    build' h = reduce (Heap.findMin h) (Heap.deleteMin h)
    reduce x Heap.E = x
    reduce x h = build' $ Heap.insert (Heap.deleteMin h) (merge x (Heap.findMin h))
The heap solution can also be realized imperatively. The leaves are first transformed into a heap, so that the one with the minimum weight is on top. As long as there is more than 1 element in the heap, we extract the two smallest, merge them into a bigger one, and put it back to the heap. The final tree left in the heap is the resulting Huffman tree.
1: function Huffman’(A)
2: Build-Heap(A)
3: while |A| > 1 do
4: Ta ← Heap-Pop(A)
5: Tb ← Heap-Pop(A)
6: Heap-Push(A, Merge(Ta , Tb ))
7: return Heap-Pop(A)
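As a cross-check of this procedure (not from the book), a compact Python sketch using the standard heapq module could look like this; the tuple-based tree representation and the name huffman_heap are our own:

import heapq

def huffman_heap(symbols):
    # symbols: list of (symbol, weight) pairs; a tree is a tuple
    # (weight, symbol-or-None, left, right)
    heap = [(w, i, (w, c, None, None)) for i, (c, w) in enumerate(symbols)]
    heapq.heapify(heap)
    count = len(heap)                    # unique tie-breaker, so trees are never compared
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # extract the two smallest trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (w1 + w2, None, t1, t2)))
        count += 1
    return heap[0][2]

# e.g. huffman_heap([('N', 3), ('A', 2), ('I', 2), ('T', 2),
#                    ('E', 1), ('L', 1), ('O', 1), ('R', 1)])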
The following example C++ code implements this heap solution. The heap used here is provided by the standard library. Because a max-heap, not a min-heap, would be built by default, a 'greater' predicate is explicitly passed as an argument.
bool greaterp(Node∗ a, Node∗ b) { return b→w < a→w; }
Node∗ pop(Nodes& h) {
Node∗ m = h.front();
pop_heap(h.begin(), h.end(), greaterp);
h.pop_back();
return m;
}
When the symbol-weight list has already been sorted, there exists a linear-time method to build the Huffman tree. Observe that the Huffman tree building produces a series of merged trees with weights in ascending order. We can use a queue to manage the merged trees: every time, we pick the two trees with the smallest weights from the queue head and the list head, merge them, and push the result to the queue. All the trees in the list will be processed, and there will be only one tree left in the queue; this tree is the resulting Huffman tree. The process starts by passing an empty queue as below.
build′ (A) = reduce′ (extract′′ (ϕ, A)) (14.86)
Suppose A is in ascending order by weight. At any time, the tree with the smallest weight is either the head of the queue or the first element of the list. Denote the head of the queue as Ta; after popping it, the queue becomes Q′. The first element in A is Tb, and the rest of the elements are held in A′. Function extract′′ can be defined like the following.
extract′′(Q, A) =
    (Tb, (Q, A′)) : Q = ϕ
    (Ta, (Q′, A)) : A = ϕ ∨ Ta < Tb
    (Tb, (Q, A′)) : otherwise    (14.87)
Actually, the pair of queue and tree list can be viewed as a special heap. The tree
with the minimum weight is continuously extracted and merged.
reduce′(T, (Q, A)) =
    T : Q = ϕ ∧ A = ϕ
    reduce′(extract′′(push(Q′′, merge(T, T′)), A′′)) : otherwise    (14.88)
Where (T′, (Q′′, A′′)) = extract′′(Q, A), i.e., another tree is extracted. The following Haskell example program shows the implementation of this method. Note that this program explicitly sorts the leaves, which isn't necessary if the leaves are already ordered. Again, a plain list rather than a real queue is used here for illustration purposes; a list isn't good at pushing new elements, please refer to the chapter about queues for details.
huffman'' :: (Num a, Ord a) ⇒ [(b, a)] → HTr a b
huffman'' = reduce ◦ wrap ◦ sort ◦ map (λ(c, w) → Leaf w c) where
wrap xs = delMin ([], xs)
reduce (x, ([], [])) = x
reduce (x, h) = let (y, (q, xs)) = delMin h in
reduce $ delMin (q ++ [merge x y], xs)
delMin ([], (x:xs)) = (x, ([], xs))
delMin ((q:qs), []) = (q, (qs, []))
delMin ((q:qs), (x:xs)) | q < x = (q, (qs, (x:xs)))
| otherwise = (x, ((q:qs), xs))
t = [Link]();
[Link]();
} else {
t = [Link]();
ts.pop_back();
}
return t;
}
Note that the sorting isn’t necessary if the trees have already been ordered. It can be
a linear time reversing in case the trees are in ascending order by weight.
There are three different Huffman man tree building methods explained. Although
they follow the same approach developed by Huffman, the result trees varies. Figure 14.49
shows the three different Huffman trees built with these methods.
Figure 14.49: Variation of Huffman trees for the same symbol list.
Although these three trees are not identical, they are all able to generate the most efficient code. The formal proof is skipped here; refer to [91] and Section 16.3 of [4] for the details.
The Huffman tree building is the core idea of Huffman coding. Many things can be easily achieved with the Huffman tree. For example, the code table can be generated by traversing the tree. We start from the root with the empty prefix p. For any branch, we append a zero to the prefix when turning left, and append a one when turning right. When a leaf node is reached, the symbol represented by that node, together with the prefix, is put into the code table.
Denote the symbol of a leaf node as c, the children for tree T as Tl and Tr respectively.
The code table association list can be built with code(T, ϕ), which is defined as below.
code(T, p) = {(c, p)} : leaf(T)
             code(Tl, p ∪ {0}) ∪ code(Tr, p ∪ {1}) : otherwise        (14.89)
Where function leaf (T ) tests if tree T is a leaf or a branch node. The following Haskell
example program generates a map as the code table according to this algorithm.
code tr = Map.fromList $ traverse [] tr where
traverse bits (Leaf _ c) = [(c, bits)]
traverse bits (Branch _ l r) = (traverse (bits ++ [0]) l) ++
(traverse (bits ++ [1]) r)
The imperative code table generating algorithm is left as an exercise. The encoding process scans the text and looks up the code table to output the bit sequence; a possible sketch follows.
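As an illustration only, assuming the code table is a dictionary that maps each symbol to its bit list (which is what the Haskell program above produces), the encoding scan might look like this in Python:

def encode(table, text):
    bits = []
    for c in text:
        bits.extend(table[c])    # emit the prefix code of each symbol
    return bits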
The decoding process is realized by looking up the Huffman tree according to the bit sequence. We start from the root; whenever a zero is received, we turn left, and when a one is received, we turn right. When a leaf node is reached, the symbol represented by that leaf is output, and we start another lookup from the root. The decoding process ends when all the bits are consumed. Denote the bit sequence as B = {b1, b2, ...}; all bits except the first one are held in B′. The definition below realizes the decoding algorithm.
decode(T, B) = {c} : B = ϕ ∧ leaf(T)
               {c} ∪ decode(root(T), B) : leaf(T)
               decode(Tl, B′) : b1 = 0
               decode(Tr, B′) : otherwise        (14.90)
Where root(T ) returns the root of the Huffman tree. The following Haskell example
code implements this algorithm.
decode tr cs = find tr cs where
find (Leaf _ c) [] = [c]
find (Leaf _ c) bs = c : find tr bs
find (Branch _ l r) (b:bs) = find (if b == 0 then l else r) bs
Note that this is an on-line decoding algorithm with linear time performance. It consumes one bit at a time. This can be clearly seen from the imperative realization below, where the index keeps increasing by one.
1: function Decode(T, B)
2: W ←ϕ
3: n ← |B|, i ← 1
4: while i ≤ n do
5: R←T
6: while ¬ Leaf(R) do
7: if B[i] = 0 then
8: R ← Left(R)
9: else
10: R ← Right(R)
11: i←i+1
12: W ← W ∪ Symbol(R)
13: return W
This imperative algorithm can be directly translated into an imperative program.
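A Python sketch of this decoding loop is given below; the accessors is_leaf, left, right, and symbol are names assumed for this sketch only, not taken from the original program.

def decode(tree, bits):
    out = []
    i, n = 0, len(bits)
    while i < n:
        node = tree
        while not is_leaf(node):                       # walk down until a leaf is reached
            node = node.left if bits[i] == 0 else node.right
            i = i + 1
        out.append(node.symbol)                        # emit the symbol, restart from the root
    return out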
Huffman coding, especially the Huffman tree building, shows an interesting strategy. Each time, there are multiple options for merging. Among the trees in the list, the Huffman method always selects the two trees with the smallest weight. This is the best choice at that merge stage. However, this series of locally best options generates a globally optimal prefix code.
It's not always the case that the locally optimal choice also leads to the globally optimal solution. In most cases, it doesn't. Huffman coding is a special one. We call the strategy of always choosing the locally best option the greedy strategy.
The greedy method works for many problems. However, it's not easy to tell whether the greedy method can be applied to get the globally optimal solution. A generic formal proof is still an active research area. Section 16.4 in [4] provides a good treatment of the matroid tool, which covers many problems to which the greedy algorithm can be applied.
Change-making problem
We often change money when visiting other countries. People tend to use credit card
more often nowadays than before, because it’s quite convenient to buy things without
considering much about changes. If we changed some money in the bank, there are often
some foreign money left by the end of the trip. Some people like to change them to coins
for collection. Can we find a solution, which can change the given amount of money with
the least number of coins?
Let’s use USA coin system for example. There are 5 different coins: 1 cent, 5 cent,
25 cent, 50 cent, and 1 dollar. A dollar is equal to 100 cents. Using the greedy method
introduced above, we can always pick the largest coin which is not greater than the
remaining amount of money to be changed. Denote list C = {1, 5, 25, 50, 100}, which
stands for the value of coins. For any given money X, the change coins can be generated
as below.
change(X, C) = ϕ : X = 0
               {cm} ∪ change(X − cm, C) : otherwise, where cm = max({c ∈ C, c ≤ X})        (14.91)
If C is in descending order, cm can be found as the first coin not greater than X. If we want to change 1.42 dollars, this function produces the coin list {100, 25, 5, 5, 5, 1, 1}. The output coin list can easily be transformed into the pairs {(100, 1), (25, 1), (5, 3), (1, 2)}. That is, we need one dollar, a quarter, three coins of 5 cents, and two coins of 1 cent to make the change. The following Haskell example program outputs the result as such.
solve x = assoc ◦ change x where
change 0 _ = []
change x cs = let c = head $ filter ( ≤ x) cs in c : change (x - c) cs
As mentioned above, this program assumes the coins are given in descending order, for instance C = {100, 50, 25, 5, 1}.
For a coin system like that of the USA, the greedy approach finds the optimal solution: the number of coins is the minimum. Fortunately, our greedy method works in most countries. But it is not always true. For example, suppose a country has coins of value 1, 3, and 4 units. The best change for value 6 is to use two coins of 3 units; however, the greedy method gives a result of three coins: one coin of 4 and two coins of 1, which isn't the optimal result.
As shown in the change-making problem, the greedy method doesn't always give the best result. In order to find the optimal solution, we need dynamic programming, which will be introduced in the next section.
However, the result is often good enough in practice. Let's take the word-wrap problem as an example. In modern software editors and browsers, text spans multiple lines if the content is too long to be held on one line. With word-wrap supported, the user needn't insert hard line breaks. Although dynamic programming can wrap with the minimum number of lines, it's overkill. On the contrary, a greedy algorithm can wrap with a number of lines close to the optimal result, with a quite effective realization as below. Here it wraps text T, not exceeding line width W, with space s between each word.
1: L ← W
2: for w ∈ T do
3: if |w| + s > L then
4: Insert line break
5: L ← W − |w|
6: else
7: L ← L − |w| − s
For each word w in the text, it uses a greedy strategy to put as many words on a line as possible, as long as the line width is not exceeded. Many word processors use a similar algorithm for word-wrapping.
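A short Python sketch of this greedy strategy might look like the following; the detail that the first word on a line pays no leading space is a small refinement over the pseudocode above.

def wrap(words, width, space=1):
    lines = [[]]
    left = width
    for w in words:
        if lines[-1] and len(w) + space > left:
            lines.append([w])                          # break the line; w starts the next one
            left = width - len(w)
        else:
            left = left - (len(w) if not lines[-1] else len(w) + space)
            lines[-1].append(w)
    return [' '.join(line) for line in lines]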
In many cases, the strictly optimal result, not an approximation, is necessary. Dynamic programming can help to solve such problems.
Dynamic programming
In the change-making problem, we mentioned that the greedy method can't always give the optimal solution. For an arbitrary coin system, is there any way to find the best change?
Suppose we have found the best solution which makes X value of money. The coins needed are contained in Cm. We can partition these coins into two collections, C1 and C2, which make money of X1 and X2 respectively. We'll prove that C1 is the optimal solution for X1, and C2 is the optimal solution for X2.
Proof. For X1, suppose there exists another solution C1′ which uses fewer coins than C1. Then the changing solution C1′ ∪ C2 uses fewer coins to make X than Cm. This conflicts with the fact that Cm is the optimal solution to X. Similarly, we can prove that C2 is the optimal solution to X2.
Note that the reverse is not true. If we arbitrarily select a value Y < X and divide the original problem into the sub-problems of finding optimal solutions for Y and X − Y, combining the two optimal solutions doesn't necessarily yield an optimal solution for X. Consider this example. There are coins with value 1, 2, and 4. The optimal solution for making value 6 is to use two coins, of values 2 and 4; however, if we divide 6 = 3 + 3, since each 3 can be made with the optimal solution 3 = 1 + 2, the combined solution contains 4 coins (1 + 1 + 2 + 2).
If the optimal solution to a problem can be constructed from optimal solutions to its sub-problems, we say the problem has optimal substructure. We see that the change-making problem has optimal substructure, but the division has to be done based on the coins, not on an arbitrary value.
The optimal substructure can be expressed recursively as the following.
change(X) = ϕ : X = 0
            least({{c} ∪ change(X − c) | c ∈ C, c ≤ X}) : otherwise        (14.92)
For any coin system C, the changing result for zero is empty; otherwise, we check every candidate coin c which is not greater than value X, and recursively find the best solution for X − c; we pick the coin collection which contains the least coins as the result. The following Haskell example program implements this top-down recursive solution.
change _ 0 = []
change cs x = minimumBy (compare `on` length)
[c:change cs (x - c) | c ← cs, c ≤ x]
Although this program outputs the correct answer [2, 4] when evaluating change [1, 2, 4] 6, it performs very badly when changing 1.42 dollars with the US coin system. It failed to find the answer within 15 minutes on a computer with a 2.7GHz CPU and 8GB memory.
The reason why it’s slow is because there are a lot of duplicated computing in the top-
down recursive solution. When it computes change(142), it needs to examine change(141), change(137),
and change(42). While change(141) next computes to smaller values by deducing with
1, 2, 25, 50 and 100 cents. it will eventually meets value 137, 117, 92, and 42 again. The
search space explodes with power of 5.
This is quite similar to computing Fibonacci numbers in a top-down recursive way.

Fn = 1 : n = 1 ∨ n = 2
     Fn−1 + Fn−2 : otherwise        (14.93)
F8 = F7 + F6
= F6 + F5 + F5 + F4
= F5 + F4 + F4 + F3 + F4 + F3 + F3 + F2
= ...
We can use a quite similar idea to solve the change-making problem. Starting from zero money, which can be changed with an empty list of coins, we next try to figure out how to change money of value 1. In the US coin system for example, a cent can be used; the next values of 2, 3, and 4 can be changed with two coins of 1 cent, three coins of 1 cent, and four coins of 1 cent. At this stage, the solution table looks like below.
0 1 2 3 4
ϕ {1} {1, 1} {1, 1, 1} {1, 1, 1, 1}
The interesting case happens when changing value 5. There are two options: use another coin of 1 cent, which needs 5 coins in total; the other way is to use one coin of 5 cents, which uses fewer coins than the former. So the solution table can be extended to this.
0 1 2 3 4 5
ϕ {1} {1, 1} {1, 1, 1} {1, 1, 1, 1} {5}
For the next change value 6, since there are two types of coins, 1 cent and 5 cents, that are not greater than this value, we need to examine both of them.
• If we choose the 1 cent coin, we next need to make changes for 5. Since we already know that the best solution to change 5 is {5}, which only needs one coin of 5 cents, by looking up the solution table we have one candidate solution to change 6 as {5, 1};
• The other option is to choose the 5 cent coin; we next need to make changes for 1. By looking up the solution table we've filled so far, the optimal sub-solution to change 1 is {1}. Thus we get another candidate solution to change 6 as {1, 5}.
It happens that both options yield a solution of two coins; we can select either of them as the best solution. Generally speaking, the candidate with the fewest coins is selected as the solution and filled into the table.
At any iteration, when we are trying to change value i ≤ X, we examine all the types of coins. For any coin c not greater than i, we look up the solution table to fetch the sub-solution T[i − c]. The number of coins in this sub-solution plus the one coin c is the total number of coins needed in this candidate solution. The candidate with the fewest coins is then selected and written to the solution table.
The following algorithm realizes this bottom-up idea.
1: function Change(X)
2: T ← {ϕ, ϕ, ...}
3: for i ← 1 to X do
4: for c ∈ C, c ≤ i do
5: if T [i] = ϕ ∨ 1 + |T [i − c]| < |T [i]| then
6: T [i] ← {c} ∪ T [i − c]
7: return T [X]
This algorithm can be directly translated to imperative programs, like Python for
example.
def changemk(x, cs):
    s = [[] for _ in range(x+1)]
    for i in range(1, x+1):
        for c in cs:
            if c ≤ i and (s[i] == [] or 1 + len(s[i-c]) < len(s[i])):
                s[i] = [c] + s[i-c]
    return s[x]
Observing the solution table, it's easy to find that there are many duplicated contents being stored.
6 7 8 9 10 ...
{1, 5} {1, 1, 5} {1, 1, 1, 5} {1, 1, 1, 1, 5} {5, 5} ...
This is because the optimal sub-solutions are completely copied and saved in the parent solution. In order to use less space, we can record only the 'delta' part relative to the optimal sub-solution. In the change-making problem, this means that we only need to record the coin being selected for value i.
1: function Change’(X)
2: T ← {0, ∞, ∞, ...}
3: S ← {NIL, NIL, ...}
4: for i ← 1 to X do
5: for c ∈ C, c ≤ i do
6: if 1 + T [i − c] < T [i] then
7: T [i] ← 1 + T [i − c]
8: S[i] ← c
9: while X > 0 do
10: Print(S[X])
11: X ← X − S[X]
Instead of recording the complete solution list of coins, this new algorithm uses two
tables T and S. T holds the minimum number of coins needed for changing value 0, 1, 2,
...; while S holds the first coin being selected for the optimal solution. For the complete
coin list to change money X, the first coin is thus S[X], the sub optimal solution is to
change money X ′ = X − S[X]. We can look up table S[X ′ ] for the next coin. The coins
for sub optimal solutions are repeatedly looked up like this till the beginning of the table.
Below Python example program implements this algorithm.
def chgmk(x, cs):
    cnt = [0] + [x+1] * x
    s = [0]
    for i in range(1, x+1):
        coin = 0
        for c in cs:
            if c ≤ i and 1 + cnt[i-c] < cnt[i]:
                cnt[i] = 1 + cnt[i-c]
                coin = c
        s.append(coin)
    r = []
    while x > 0:
        r.append(s[x])
        x = x - s[x]
    return r
This change-making solution loops n times for a given amount of money n. It examines at most the full coin system in each iteration. The time is bounded by Θ(nk), where k is the number of coins in the coin system. The last algorithm adds O(n) space to record the optimal sub-solutions with the tables T and S.
In purely functional settings, there is no way to mutate the solution table and look it up in constant time. One alternative is to use the finger tree, as mentioned in the previous chapter. We can store the minimum number of coins and the coin that leads to the optimal sub-solution in pairs.
The solution table, which is a finger tree, is initialized as T = {(0, 0)}, meaning that changing 0 money needs no coin. We can fold on the list {1, 2, ..., X}, starting from this table, with a binary function change(T, i). The folding builds the solution table, and we can construct the coin list from this table with the function make(X, T).
In the function change(T, i), all the coins not greater than i are examined to select the one leading to the best result. The fewest number of coins and the coin being selected form a pair. This pair is appended to the finger tree, so that a new solution table is returned.
Again, folding is used to select the candidate with the minimum number of coins. This folding starts with the initial value (∞, 0), over all valid coins. The function sel((n, c), c′) accepts two arguments: one is a pair of length and coin, which is the best solution so far; the other is a candidate coin, and it examines whether this candidate can make a better solution.
sel((n, c), c′) = (1 + n′, c′) : 1 + n′ < n, where (n′, c′) = T[i − c′]
                  (n, c) : otherwise        (14.96)
After the solution table is built, the coins needed can be generated from it.
make(X, T) = ϕ : X = 0
             {c} ∪ make(X − c, T) : otherwise, where (n, c) = T[X]        (14.97)
The following example Haskell program uses Data.Sequence, the finger tree based sequence library, to implement the change-making solution.
import Data.Sequence (Seq, singleton, index, (|>))
It’s necessary to memorize the optimal solution to sub problems no matter using the
top-down or the bottom-up approach. This is because a sub problem is used many times
when computing the overall optimal solution. Such properties are called overlapping sub
problems.
• Optimal sub structure. The problem can be broken down into smaller problems,
and the optimal solution can be constructed efficiently from solutions of these sub
problems;
• Overlapping sub problems. The problem can be broken down into sub problems
which are reused several times in finding the overall solution.
The change-making problem, as we’ve explained, has both optimal sub structures, and
overlapping sub problems.
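Because of the overlapping sub-problems, the top-down definition (14.92) also becomes practical once the sub-solutions are cached. The following memoized version is a sketch only; the helper name and the use of functools.lru_cache are choices made for this illustration, not part of the original programs.

from functools import lru_cache

def change_memo(x, coins):
    # coins is passed as a tuple, e.g. (1, 5, 25, 50, 100), so it can be captured safely
    @lru_cache(maxsize=None)
    def go(v):
        if v == 0:
            return ()
        # try every usable coin, keep the candidate with the fewest coins
        return min(((c,) + go(v - c) for c in coins if c <= v), key=len)
    return list(go(x))

With the cache, every value from 0 to X is solved at most once, so the call change_memo(142, (1, 5, 25, 50, 100)) should return immediately instead of exploding.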
The longest common subsequence problem
Consider the texts “Mississippi” and “Missunderstanding”: their longest common substring is “Miss”, while their longest common subsequence is “Misssi”. This is shown in figure 14.50.
If we rotate the figure vertically and consider the two texts as two pieces of source code, it turns out to be a 'diff' result between them. Most modern version control tools need to calculate the difference between versions, where the longest common subsequence problem plays a very important role.
If either of the two strings X and Y is empty, the longest common subsequence LCS(X, Y) is definitely empty. Otherwise, denote X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym}. If the first elements x1 and y1 are the same, we can recursively find the longest common subsequence of X′ = {x2, x3, ..., xn} and Y′ = {y2, y3, ..., ym}, and the final result LCS(X, Y) can be constructed by concatenating x1 with LCS(X′, Y′). Otherwise, if x1 ≠ y1, we recursively find the longest common subsequences LCS(X, Y′) and LCS(X′, Y), and pick the longer one as the final result. Summarizing these cases gives the definition below.
LCS(X, Y) = ϕ : X = ϕ ∨ Y = ϕ
            {x1} ∪ LCS(X′, Y′) : x1 = y1
            longer(LCS(X, Y′), LCS(X′, Y)) : otherwise        (14.98)
Note that this algorithm clearly shows the optimal substructure: the longest common subsequence problem can be broken into smaller problems, and each sub-problem is guaranteed to be at least one element shorter than the original one.
It's also clear that there are overlapping sub-problems: the longest common subsequences of the substrings are used multiple times in finding the overall optimal solution.
The existence of these two properties, optimal substructure and overlapping sub-problems, indicates that dynamic programming can be used to solve this problem.
A 2-dimension table can be used to record the solutions to the sub-problems. The
rows and columns represent the substrings of X and Y respectively.
a n t e n n a
1 2 3 4 5 6 7
b 1
a 2
n 3
a 4
n 5
a 6
This table shows an example of finding the longest common subsequence for the strings “antenna” and “banana”, whose lengths are 7 and 6. The bottom-right corner of this table is looked up first. Since it's empty, we compare the 7th element of “antenna” and the 6th of “banana”; they are both ‘a’, so we next recursively look up the cell at row 5, column 6. It's still empty, and we repeat this until we either reach the trivial case that one substring becomes empty, or some cell we are looking up has already been filled. Similar to the change-making problem, whenever the optimal solution for a sub-problem is found, it is recorded in the cell for further reuse. Note that this process runs in the reverse order compared to the recursive equation given above: we start from the rightmost element of each string.
Considering that the longest common subsequence for any empty string is still empty, we can extend the solution table so that the first row and column hold empty strings.
a n t e n n a
ϕ ϕ ϕ ϕ ϕ ϕ ϕ
b ϕ
a ϕ
n ϕ
a ϕ
n ϕ
a ϕ
The algorithm below realizes the top-down recursive dynamic programming solution with such a table.
1: T ← NIL
2: function LCS(X, Y )
3: m ← |X|, n ← |Y |
4: m′ ← m + 1, n′ ← n + 1
5: if T = NIL then
6: T ← {{ϕ, ϕ, ..., ϕ}, {ϕ, NIL, NIL, ...}, ...} ▷ m′ × n′
7: if X ≠ ϕ ∧ Y ≠ ϕ ∧ T[m′][n′] = NIL then
8: if X[m] = Y [n] then
9: T [m′ ][n′ ] ← Append(LCS(X[1..m − 1], Y [1..n − 1]), X[m])
10: else
11: T [m′ ][n′ ] ← Longer(LCS(X, Y [1..n − 1]), LCS(X[1..m − 1], Y ))
12: return T [m′ ][n′ ]
The table is first initialized with the first row and column filled with empty strings; the rest are all NIL values. Unless either string is empty, or the cell content isn't NIL, the last elements of the two strings are compared, and the longest common subsequence is computed recursively on the substrings. The following Python example program implements this algorithm.
def lcs(xs, ys):
    m = len(xs)
    n = len(ys)
    global tab
    if tab is None:
        tab = [[""]*(n+1)] + [[""] + [None]*n for _ in xrange(m)]
    if m ≠ 0 and n ≠ 0 and tab[m][n] is None:
        if xs[-1] == ys[-1]:
            tab[m][n] = lcs(xs[:-1], ys[:-1]) + xs[-1]
        else:
            (a, b) = (lcs(xs, ys[:-1]), lcs(xs[:-1], ys))
            tab[m][n] = a if len(b) < len(a) else b
    return tab[m][n]
The longest common subsequence can also be found in a bottom-up manner, as we did with the change-making problem. Besides that, instead of recording the whole sequences in the table, we can store just the lengths of the longest subsequences, and later construct the subsequence with this table and the two strings. This time, the table is initialized with all values set to 0.
1: function LCS(X, Y )
2: m ← |X|, n ← |Y |
3: T ← {{0, 0, ...}, {0, 0, ...}, ...} ▷ (m + 1) × (n + 1)
4: for i ← 1 to m do
5: for j ← 1 to n do
6: if X[i] = Y [j] then
7: T [i + 1][j + 1] ← T [i][j] + 1
8: else
9: T [i + 1][j + 1] ← Max(T [i][j + 1], T [i + 1][j])
10: return Get(T, X, Y, m, n)
            if xs[i-1] == ys[j-1]:
                c[i][j] = c[i-1][j-1] + 1
            else:
                c[i][j] = max(c[i-1][j], c[i][j-1])
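Read together with the pseudocode above, a self-contained bottom-up sketch in Python, including the reconstruction walk over the length table, might look like this; the function name lcs_bu is used only for this sketch.

def lcs_bu(xs, ys):
    m, n = len(xs), len(ys)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if xs[i-1] == ys[j-1]:
                c[i][j] = c[i-1][j-1] + 1
            else:
                c[i][j] = max(c[i-1][j], c[i][j-1])
    i, j, r = m, n, []                                 # walk back from the bottom-right corner
    while i > 0 and j > 0:
        if xs[i-1] == ys[j-1]:
            r.append(xs[i-1])
            i, j = i - 1, j - 1
        elif c[i-1][j] > c[i][j-1]:
            i = i - 1
        else:
            j = j - 1
    return ''.join(reversed(r))

For instance, lcs_bu("antenna", "banana") gives "anna", a longest common subsequence of the two strings used in the table above.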
The bottom-up dynamic programming solution can also be defined in a purely functional way. A finger tree can be used as the table. The first row is filled with n + 1 zero values. This table is built by folding on the sequence X; then the longest common subsequence is constructed from the table.
LCS(X, Y) = construct(fold(f, {{0, 0, ..., 0}}, zip({1, 2, ...}, X)))        (14.99)
Note that, since the table needs to be looked up by index, X is zipped with the natural numbers. Function f creates a new row of this table by folding on sequence Y, and records the lengths of the longest common subsequences for all the cases handled so far.
f(T, (i, x)) = insert(T, fold(longest, {0}, zip({1, 2, ...}, Y)))        (14.100)
Function longest takes the partially filled row, and a pair of an index and an element of Y; it checks whether this element is the same as the corresponding one in X, then fills the new cell with the length of the longest sequence.
longest(R, (j, y)) = insert(R, 1 + T[i − 1][j − 1]) : x = y
                     insert(R, max(T[i − 1][j], T[i][j − 1])) : otherwise        (14.101)
After the table is built, the longest common subsequence can be constructed recursively by looking up this table. We pass the reversed sequences ←X and ←Y (the left arrow denotes reversal), together with their lengths m and n, for efficient building.

construct(T) = get((←X, m), (←Y, n))        (14.102)

If the sequences are not empty, denote the first elements as x and y, and the rest of the elements as ←X′ and ←Y′ respectively. The function get can be defined as the following.

get((←X, i), (←Y, j)) = ϕ : ←X = ϕ ∧ ←Y = ϕ
                        get((←X′, i − 1), (←Y′, j − 1)) ∪ {x} : x = y
                        get((←X′, i − 1), (←Y, j)) : T[i − 1][j] > T[i][j − 1]
                        get((←X, i), (←Y′, j − 1)) : otherwise        (14.103)
Below Haskell example program implements this solution.
lcs' xs ys = construct $ foldl f (singleton $ fromList $ replicate (n+1) 0)
(zip [1..] xs) where
(m, n) = (length xs, length ys)
f tab (i, x) = tab |> (foldl longer (singleton 0) (zip [1..] ys)) where
longer r (j, y) = r |> if x == y
sl sl + 1 ... x1 ... su
x1 F F ... T ... F
With the next element x2, there are three kinds of possible sums. Similar to the first row, the singleton {x2} sums to x2. All the possible sums in the previous row can also be achieved without x2, so the cells that were true should also be filled as true. And by adding x2 to all the possible sums so far, we get some new values, so the cell representing x1 + x2 should be true.
sl sl + 1 ... x1 ... x2 ... x1 + x2 ... su
x1 F F ... T ... F ... F ... F
x2 F F ... T ... T ... T ... F
Generally speaking, when filling the i-th row, all the possible sums constructed with {x1, x2, ..., xi−1} so far remain achievable after xi is considered, so the cells that were previously true should also be true in this new row. The cell representing the value xi should also be true, since the singleton set {xi} sums to it. And we can also add xi to all the previously constructed sums to get new results; the cells representing these new sums should also be filled as true.
When all the elements are processed like this, a table with |X| rows is built. Looking up the cell representing s in the last row tells whether there exists a subset that sums to this value. As mentioned above, there is no solution if s < sl or su < s. We skip handling this case for the sake of brevity.
1: function Subset-Sum(X, s)
2: sl ← Σ{x ∈ X, x < 0}
3: su ← Σ{x ∈ X, x > 0}
4: n ← |X|
5: T ← {{False, False, ...}, {False, False, ...}, ...} ▷ n × (su − sl + 1)
6: for i ← 1 to n do
7: for j ← sl to su do
8: if X[i] = j then
9: T [i][j] ← T rue
10: if i > 1 then
11: T [i][j] ← T [i][j] ∨ T [i − 1][j]
12: j ′ ← j − X[i]
13: if sl ≤ j ′ ≤ su then
14: T [i][j] ← T [i][j] ∨ T [i − 1][j ′ ]
15: return T [n][s]
Note that the index to the columns of the table doesn't range from 1 to su − sl + 1, but maps directly from sl to su. Because most programming environments don't support negative indices, this can be dealt with by using T[i][j − sl]. The following example Python program utilizes Python's support for negative indexing.
def solve(xs, s):
    low = sum([x for x in xs if x < 0])
    up = sum([x for x in xs if x > 0])
    tab = [[False]*(up-low+1) for _ in xs]
    for i in xrange(0, len(xs)):
        for j in xrange(low, up+1):
            tab[i][j] = (xs[i] == j)
            j1 = j - xs[i]
            tab[i][j] = (tab[i][j] or tab[i-1][j] or
                         (low ≤ j1 and j1 ≤ up and tab[i-1][j1]))
    return tab[-1][s]
Note that this program doesn’t use different branches for i = 0 and i = 1, 2, ..., n − 1.
This is because when i = 0, the row index to i − 1 = −1 refers to the last row in the
table, which are all false. This simplifies the logic one more step.
With this table built, it’s easy to construct all subsets sum to s. The method is to
466 CHAPTER 14. SEARCHING
look up the last row for cell represents s. If the last element xn = s, then {xn } definitely
is a candidate. We next look up the previous row for s, and recursively construct all the
possible subsets sum to s with {x1 , x2 , x3 , ..., xn−1 }. Finally, we look up the second last
row for cell represents s − xn . And for every subset sums to this value, we add element
xn to construct a new subset, which sums to s.
1: function Get(X, s, T, n)
2: S←ϕ
3: if X[n] = s then
4: S ← S ∪ {X[n]}
5: if n > 1 then
6: if T [n − 1][s] then
7: S ← S∪ Get(X, s, T, n − 1)
8: if T [n − 1][s − X[n]] then
9: S ← S ∪ {{X[n]} ∪ S ′ |S ′ ∈ Get(X, s − X[n], T, n − 1) }
10: return S
The following Python example program translates this algorithm.
def get(xs, s, tab, n):
    r = []
    if xs[n] == s:
        r.append([xs[n]])
    if n > 0:
        if tab[n-1][s]:
            r = r + get(xs, s, tab, n-1)
        if tab[n-1][s - xs[n]]:
            r = r + [[xs[n]] + ys for ys in get(xs, s - xs[n], tab, n-1)]
    return r
This dynamic programming solution to the subset sum problem loops O(n(su − sl + 1)) times to build the table, and then recursively constructs the final solution from the table with a recursion depth of at most O(n). The space used is also bounded by O(n(su − sl + 1)).
Instead of using a table with n rows, a vector can be used alternatively. For every cell representing a possible sum, the list of subsets achieving it is stored. This vector is initialized to contain all empty sets. For every element in X, we update the vector so that it records all the possible sums which can be built so far. When all the elements are considered, the cell corresponding to s contains the final result.
1: function Subset-Sum(X, s)
2: sl ← Σ{x ∈ X, x < 0}
3: su ← Σ{x ∈ X, x > 0}
4: T ← {ϕ, ϕ, ...} ▷ su − sl + 1
5: for x ∈ X do
6: T ′ ← Duplicate(T )
7: for j ← sl to su do
8: j′ ← j − x
9: if x = j then
10: T ′ [j] ← T ′ [j] ∪ {x}
11: if sl ≤ j ′ ≤ su ∧ T [j ′ ] 6= ϕ then
12: T ′ [j] ← T ′ [j] ∪ {{x} ∪ S|S ∈ T [j ′ ]}
13: T ← T′
14: return T [s]
The corresponding Python example program is given as below.
def subsetsum(xs, s):
    low = sum([x for x in xs if x < 0])
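    # The lines below are a sketch that completes this translation by following the
    # pseudocode above; tab[j - low] keeps every subset found so far that sums to j.
    up = sum([x for x in xs if x > 0])
    tab = [[] for _ in range(low, up + 1)]
    for x in xs:
        tab1 = [ss[:] for ss in tab]                   # duplicate the table
        for j in range(low, up + 1):
            if x == j:
                tab1[j - low].append([x])
            j1 = j - x
            if low <= j1 <= up and tab[j1 - low]:
                tab1[j - low] += [[x] + ys for ys in tab[j1 - low]]
        tab = tab1
    return tab[s - low]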
This imperative algorithm shows a clear structure: the solution table is built by looping over every element. This can be realized in a purely functional way by folding. A finger tree can be used to represent the vector spanning from sl to su. It is initialized with all empty values.
After folding, the solution table is built; the answer is looked up at the cell for s.¹³
For every element x ∈ X, the function build folds over the list {sl, sl + 1, ..., su}. For every value j, it checks whether j equals x and, if so, appends the singleton set {x} to the j-th cell. Note that the cells are indexed from sl, not from 0. If the cell corresponding to j − x is not empty, the candidate solutions stored there are duplicated, and element x is added to each of them.
Note that the first clause in both equations (14.108) and (14.109) returns a new table with a certain cell updated to the given value.
The following Haskell example program implements this algorithm.
subsetsum xs s = foldl build (fromList [[] | _ ← [l..u]]) xs `idx` s where
l = sum $ filter (< 0) xs
u = sum $ filter (> 0) xs
idx t i = index t (i - l)
build tab x = foldl (λt j → let j' = j - x in
adjustIf (l ≤ j' && j' ≤ u && tab `idx` j' /= [])
(++ [(x:ys) | ys ← tab `idx` j']) j
(adjustIf (x == j) ([x]:) j t)) tab [l..u]
adjustIf pred f i seq = if pred then adjust f (i - l) seq else seq
Some materials like [16] provide common structures to abstract dynamic programming, so that problems can be solved with a generic framework by customizing the precondition, the comparison of candidate solutions, and the method for merging sub-solutions. However, the variety of problems makes things complex in practice. It's important to study the properties of the problem carefully.
¹³Again, here we skip the error handling for the case that s < sl or su < s. There is no solution if s is out of range.
Exercise 14.3
• Realize a maze solver by using the stack approach, which can find all the possible
paths.
• There are 92 distinct solutions for the 8 queens puzzle. For any one solution, rotating it by 90°, 180°, or 270° gives solutions too. Also, flipping it vertically or horizontally generates solutions. Some solutions are symmetric, so that rotation or flipping gives the same one. There are 12 unique solutions in this sense. Modify the program to find the 12 unique solutions. Improve the program, so that the 92 distinct solutions can be found with less searching.
• Make the 8 queens puzzle solution generic so that it can solve n queens puzzle.
• Make the functional solution to the leap frogs puzzle generic, so that it can solve n
frogs case.
• Modify the wolf, goat, and cabbage puzzle algorithm, so that it can find all possible
solutions.
• Give the complete algorithm definition to solve the 2 water jugs puzzle with extended
Euclid algorithm.
• We needn’t the exact linear combination information x and y in fact. After we know
the puzzle is solvable by testing with GCD, we can blindly execute the process that:
fill A, pour A into B, whenever B is full, empty it till there is expected volume in
one jug. Realize this solution. Can this one find faster solution than the original
version?
• Compared to the extended Euclid method, the BFS approach is a kind of brute-force search. Improve the extended Euclid approach by finding the best linear combination which minimizes |x| + |y|.
• John Horton Conway introduced the sliding tile puzzle. Figure 14.51 shows a simplified version. There are 8 cells, 7 of them occupied by pieces labeled from 1 to 7. Each piece can slide to the free cell if they are connected. A line between cells means there is a connection. The goal is to reverse the pieces from 1, 2, 3, 4, 5, 6, 7 to 7, 6, 5, 4, 3, 2, 1 by sliding. Develop a program to solve this puzzle.
• One option to realize the bottom-up solution for the longest common subsequence
problem is to record the direction in the table. Thus, instead of storing the length
information, three values like ’N’, for north, ’W’ for west, and ’NW’ for northwest
are used to indicate how to construct the final result. We start from the bottom-
right corner of the table, if the cell value is ’NW’, we go along the diagonal by
moving to the cell in the upper-left; if it’s ’N’, we move vertically to the upper
row; and move horizontally if it’s ’W’. Implement this approach in your favorite
programming language.
• Given a list of non-negative integers, find the maximum sum composed by numbers
that none of them are adjacent.
• Levenshtein edit distance is defined as the cost of converting one string s to another string t. It is widely used in spell-checking, OCR correction, etc. There are three operations allowed in Levenshtein edit distance: insert a character, delete a character, and substitute a character. Each operation mutates one character at a time. The following example shows how to convert the string “kitten” to “sitting”. The Levenshtein edit distance is 3 in this case.
1. kitten → sitten (substitution of ’s’ for ’k’);
2. sitten → sittin (substitution of ’i’ for ’e’);
3. sittin → sitting (insertion of ’g’ at the end).
Develop a program to calculate Levenshtein edit distance for two strings with Dy-
namic Programming.
[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. ISBN:0262032937. The MIT Press. 2001
[3] M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan, ”Time bounds for selec-
tion,” J. Comput. System Sci. 7 (1973) 448-461.
[4] Jon Bentley. “Programming pearls, Second Edition”. Addison-Wesley Professional;
1999. ISBN-13: 978-0201657883
[5] Richard Bird. “Pearls of functional algorithm design”. Chapter 3. Cambridge Uni-
versity Press. 2010. ISBN, 1139490605, 9781139490603
[6] Edsger W. Dijkstra. “The saddleback search”. EWD-934. 1985.
[Link]
[7] Robert Boyer, and Strother Moore. “MJRTY - A Fast Majority Vote Algorithm”.
Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning
Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991, pp. 105-117.
[8] Cormode, Graham; S. Muthukrishnan (2004). “An Improved Data Stream Summary: The Count-Min Sketch and its Applications”. J. Algorithms 55: 29-38.
[9] Knuth Donald, Morris James H., jr, Pratt Vaughan. “Fast pattern matching in strings”. SIAM Journal on Computing 6 (2): 323-350. 1977.
[10] Robert Boyer, Strother Moore. “A Fast String Searching Algorithm”. Comm. ACM (New York, NY, USA: Association for Computing Machinery) 20 (10): 762-772. 1977.
[11] R. N. Horspool. “Practical fast searching in strings”. Software - Practice & Experience 10 (6): 501-506. 1980.
[12] Wikipedia. “Boyer-Moore string search algorithm”.
[Link]
[13] Wikipedia. “Eight queens puzzle”. [Link]
[14] George Pólya. “How to solve it: A new aspect of mathematical method”. Princeton
University Press(April 25, 2004). ISBN-13: 978-0691119663
[15] Wikipedia. “David A. Huffman”. [Link]
[16] Fethi Rabhi, Guy Lapalme “Algorithms: a functional programming approach”. Sec-
ond edition. Addison-Wesley.
Appendix A
Imperative delete for red-black tree
We need to handle more cases for imperative delete than for insert. To restore balance after cutting a node off from the red-black tree, we perform rotations and re-coloring. When deleting a black node, rule 5 will be violated, because the number of black nodes along the paths through that node reduces by one. We introduce a 'doubly-black' color to keep the number of black nodes unchanged. The example program below adds 'doubly black' to the color definition:
data Color {RED, BLACK, DOUBLY_BLACK}
When deleting a node, we re-use the binary search tree delete as the first step, then further fix the balance if the removed node is black.
1: function Delete(T, x)
2: p ← Parent(x)
3: q ← NIL
4: if Left(x) = NIL then
5: q ← Right(x)
6: Replace(x, Right(x)) ▷ replace x with its right sub-tree
7: else if Right(x) = NIL then
8: q ← Left(x)
9: Replace(x, Left(x)) ▷ replace x with its left sub-tree
10: else
11: y ← Min(Right(x))
12: p ← Parent(y)
13: q ← Right(y)
14: Key(x) ← Key(y)
15: copy data from y to x
16: Replace(y, Right(y)) ▷ replace y with its right sub-tree
17: x←y
18: if Color(x) = BLACK then
19: T ← Delete-Fix(T , Make-Black(p, q), q = NIL?)
20: release x
21: return T
Delete takes the root T and the node x to be deleted as parameters. x can be located through a lookup. If x has an empty sub-tree, we cut x off, then replace it with the other sub-tree q. Otherwise, we locate the minimum node y in the right sub-tree of x, then replace x with y, and cut y off afterwards. If x is black, we call Make-Black(p, q) to maintain the blackness before further fixing.
1: function Make-Black(p, q)
2: if p = NIL and q = NIL then
3: return NIL ▷ The tree was singleton
4: else if q = NIL then
5: n ← Doubly Black NIL
6: Parent(n) ← p
7: return n
8: else
9: return Blacken(q)
If both p and q are empty, we are deleting the only leaf from a singleton tree; the result is empty. If the parent p is not empty, but q is, we are deleting a black leaf. We use NIL to replace that black leaf. As NIL is already black, we change it to a 'doubly black' NIL to maintain the blackness. Otherwise, if neither p nor q is empty, we call Blacken(q): if q is red, it changes to black; if q is already black, it changes to doubly black. As the next step, we need to eliminate the doubly blackness through tree rotations and re-coloring. There are three different cases ([4], pp. 292). The doubly black node can be NIL or not in all of these cases.
Case 1. The sibling of the doubly black node is black, and it has a red sub-tree. We can rotate the tree to fix the doubly black. There are 4 sub-cases, all of which can be transformed to a uniform structure as shown in figure A.1.
Figure A.1: The doubly black node has a black sibling, and a red nephew. It can be fixed
with a rotation.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then ▷ x is doubly black NIL
4: n←x
5: if x = NIL then ▷ Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do ▷ x is doubly black, but not the root
8: if Sibling(x) ≠ NIL then ▷ The sibling is not empty
9: s ← Sibling(x)
10: ...
11: if s is black and Left(s) is red then
12: if x = Left(Parent(x)) then ▷ x is the left
13: set x, Parent(x), and Left(s) all black
14: T ← Rotate-Right(T , s)
15: T ← Rotate-Left(T , Parent(x))
16: else ▷ x is the right
17: set x, Parent(x), s, and Left(s) all black
18: T ← Rotate-Right(T , Parent(x))
19: else if s is black and Right(s) is red then
20: if x = Left(Parent(x)) then ▷ x is the left
21: set x, Parent(x), s, and Right(s) all black
22: T ← Rotate-Left(T , Parent(x))
23: else ▷ x is the right
24: set x, Parent(x), and Right(s) all black
25: T ← Rotate-Left(T , s)
26: T ← Rotate-Right(T , Parent(x))
27: ...
Case 2. The sibling of the doubly black is red. We can rotate the tree to change the
doubly black node to black. As shown in figure A.2, change a or c to black. We can add
this fixing to the previous implementation.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then ▷ x is doubly black NIL
4: n←x
5: if x = NIL then ▷ Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) ≠ NIL then
9: s ← Sibling(x)
10: if s is red then ▷ The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then ▷ x is the left
The sibling of the doubly black isn’t empty in all above 3 cases. Otherwise, we change
the doubly black node back to black, and move the blackness up. When reach the root,
we force the root to be black to complete fixing. It also terminates if the doubly black
node is eliminated after re-color in the midway. At last, if the doubly black node passed
in is empty, we turn it back to normal NIL.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then ▷ x is a doubly black NIL
4: n←x
5: if x = NIL then ▷ Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) ≠ NIL then ▷ The sibling is not empty
9: s ← Sibling(x)
10: if s is red then ▷ The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then ▷ x is the left
14: T ← Rotate-Left(T , Parent(x))
15: else ▷ x is the right
16: T ← Rotate-Right(T , Parent(x))
17: else if s is black and Left(s) is red then
18: if x = Left(Parent(x)) then ▷ x is the left
19: set x, Parent(x), and Left(s) all black
20: T ← Rotate-Right(T , s)
    if x.left == null {
        db = x.right
        x.replaceWith(db)
    } else if x.right == null {
        db = x.left
        x.replaceWith(db)
    } else {
        var y = min(x.right)
        parent = y.parent
        db = y.right
        x.key = y.key
        y.replaceWith(db)
        x = y
    }
    if x.color == Color.BLACK {
        t = deleteFix(t, makeBlack(parent, db), db == null);
    }
    remove(x)
    return t
}
Where makeBlack checks if the node changes to doubly black, and handles the special
case of doubly black NIL.
Node makeBlack(Node parent, Node x) {
if parent == null and x == null then return null
return if x == null
then replace(parent, x, Node(0, Color.DOUBLY_BLACK))
else blacken(x)
}
The function blacken(node) changes the red node to black, and the black node to
doubly black:
Node blacken(Node x) {
    x.color = if isRed(x) then Color.BLACK else Color.DOUBLY_BLACK
    return x
}
                db.color = Color.BLACK
                p.color = Color.BLACK
                s.color = Color.BLACK
                s.right.color = Color.BLACK
                t = leftRotate(t, p)
            } else {
                db.color = Color.BLACK
                p.color = Color.BLACK
                s.right.color = Color.BLACK
                t = leftRotate(t, s)
                t = rightRotate(t, p)
            }
        } else if isBlack(s) and isBlack(s.left) and
                  isBlack(s.right) {
            // the sibling and both sub-trees are black.
            // move blackness up
            db.color = Color.BLACK
            s.color = Color.RED
            blacken(p)
            db = p
        }
    } else { // no sibling, move blackness up
        db.color = Color.BLACK
        blacken(p)
        db = p
    }
}
    t.color = Color.BLACK
    if (dbEmpty ≠ null) { // change the doubly black nil to nil
        dbEmpty.replaceWith(null)
        delete dbEmpty
    }
return t
}
Where isBlack(x) tests if a node is black, the NIL node is also black.
Bool isBlack(Node x) = (x == null or x.color == Color.BLACK)
Before returning the final result, we check the doubly black NIL, and call the replaceWith
function defined in Node.
data Node<T> {
//...
void replaceWith(Node y) = replace(parent, this, y)
}
The program terminates when it reaches the root or the doubly blackness is eliminated. As we keep the red-black tree balanced, the delete algorithm is bound to O(lg n) time for a tree of n nodes.
Exercise A.1
1. Write a program to test if a tree satisfies the 5 red-black tree rules. Use this
program to verify the red-black tree delete implementation.
Appendix B
AVL tree - proofs and the delete algorithm
∆H = |T′| − |T|
   = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
   = max(|r′|, |l′|) − max(|r|, |l|)
   = δ ≥ 0, δ′ ≥ 0 : ∆r
     δ ≤ 0, δ′ ≥ 0 : δ + ∆r
     δ ≥ 0, δ′ ≤ 0 : ∆l − δ
     otherwise : ∆l        (B.1)
Proof. When inserting, the height cannot increase on both the left and the right. We can explain the 4 cases from the balance factor definition, which is the difference between the right and left sub-tree heights:
1. If δ ≥ 0 and δ′ ≥ 0, the height of the right sub-tree is not less than that of the left sub-tree before and after the insertion. In this case, the height increment is only 'contributed' from the right, which is ∆r.
2. If δ ≤ 0 and δ′ ≥ 0, the height of the left sub-tree was not less than the right before the insertion. Since δ′ ≥ 0 after the insertion, we know the height of the right sub-tree increases, while the left side stays the same (|l′| = |l|). The height increment is ∆H = |r′| − |l| = δ + ∆r.
3. If δ ≥ 0 and δ′ ≤ 0, the right sub-tree was not lower than the left before the insertion, and the left is not lower than the right afterwards; the right side stays the same (|r′| = |r|). The height increment is ∆H = |l′| − |r| = ∆l − δ.
4. Otherwise, δ and δ ′ are not bigger than zero. It means the height of the left sub-tree
is not less than the right. The height increment is only ‘contributed’ from the left,
which is ∆l.
The four cases are left-left, right-right, right-left, and left-right. Let the balance factors before fixing be δ(x), δ(y), and δ(z); after fixing, they change to δ′(x), δ′(y), and δ′(z) respectively. We next prove that δ′(y) = 0 after fixing in all 4 cases, and give the results for δ′(x) and δ′(z).
After fixing, summarizing the above, the balance factors change to the following in the left-left case:
δ ′ (x) = δ(x)
δ ′ (y) = 0 (B.5)
δ ′ (z) = 0
Right-right
The right-right case is symmetric to left-left:
δ ′ (x) = 0
δ ′ (y) = 0 (B.6)
δ ′ (z) = δ(z)
Right-left
Consider δ′(x); after fixing, it is:
If δ(y) ≠ 1, then max(|b|, |c|) = |b|. Taking this into (B.9) gives:
Summarizing the 2 cases, we obtain the result of δ′(x) in terms of δ(y) as the following:
δ′(x) = δ(y) = 1 : −1
        otherwise : 0        (B.13)
All three cases lead to the same result δ′(y) = 0. Summarizing all of the above, we get the updated balance factors after fixing as below:

δ′(x) = δ(y) = 1 : −1
        otherwise : 0

δ′(y) = 0        (B.17)

δ′(z) = δ(y) = −1 : 1
        otherwise : 0
Left-right
The left-right case is symmetric to the right-left case. With a similar method, we can obtain the new balance factors, which are identical to (B.17).
del ∅ k = (∅, 0)

del (l, k′, r, δ) k = k < k′ : tree (del l k) k′ (r, 0) δ
                      k > k′ : tree (l, 0) k′ (del r k) δ
                      k = k′ : l = ∅ : (r, −1)
                               r = ∅ : (l, −1)
                               else : tree (l, 0) k′′ (del r k′′) δ, where k′′ = min(r)
                                                                               (B.19)
If the tree is empty, the result is (∅, 0); otherwise, let the tree be T = (l, k′, r, δ). We compare k and k′, then look up and delete recursively. When k = k′, we have located the node to be deleted. If it has an empty sub-tree, we cut the node off and replace it with the other sub-tree; otherwise, we use the minimum k′′ of the right sub-tree to replace k′, and cut k′′ off. We re-use the tree function and the ∆H result. In addition to the insert cases, there are two cases which violate the AVL rule and need fixing. As shown in figure B.2, both cases can be fixed by a tree rotation. We define them with pattern matching:
Figure B.2: Fix case A and fix case B; each is resolved with a single tree rotation.
...
balance ((a, x, b, δ(x)), y, c, −2) ∆H = ((a, x, (b, y, c, −1), δ(x) + 1), ∆H)
balance (a, x, (b, y, c, δ(y)), 2) ∆H = (((a, x, b, 1), y, c, δ(y) − 1), ∆H)        (B.20)
...
With the additional two, there are 7 cases in total in the balance implementation:
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), dH) =
    (Br (Br a x b dx) y (Br c z d 0) 0, dH-1)
balance (Br a x (Br b y (Br c z d dz) 1) 2, dH) =
    (Br (Br a x b 0) y (Br c z d dz) 0, dH-1)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), dH) =
    (Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
        dx' = if dy == 1 then -1 else 0
        dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, dH) =
    (Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
        dx' = if dy == 1 then -1 else 0
        dz' = if dy == -1 then 1 else 0
-- Delete specific
balance (Br (Br a x b dx) y c (-2), dH) =
    (Br a x (Br b y c (-1)) (dx+1), dH)
balance (Br a x (Br b y c dy) 2, dH) =
    (Br (Br a x b 1) y c (dy-1), dH)
balance (t, d) = (t, d)
1. |δ(p)| = 0 and |δ′(p)| = 1. After the delete, although one sub-tree's height decreases, the parent still satisfies the AVL rule. The algorithm terminates, as the tree is still balanced;
2. |δ(p)| = 1 and |δ′(p)| = 0. Before the delete, the height difference between the two sub-trees was 1; after the delete, the higher sub-tree shrinks by 1, so both sub-trees now have the same height. As a result, the height of the parent also decreases by 1. We need to continue the bottom-up update along the parent references towards the root;
3. |δ(p)| = 1 and |δ′(p)| = 2. After the delete, the tree violates the AVL height rule; we need to rotate the tree to fix it.
For case 3, the implementation is similar to the insert fixing. We need to add two additional sub-cases, as shown in figure B.2.
1: function AVL-Delete-Fix(T, p, x)
2: while p ≠ NIL do
3: l ← Left(p), r ← Right(p)
4: δ ← δ(p), δ ′ ← δ
5: if x = l then
6: δ′ ← δ′ + 1
7: else
8: δ′ ← δ′ − 1
9: if p is leaf then ▷ l = r = NIL
10: δ′ ← 0
11: if |δ| = 1 ∧ |δ ′ | = 0 then
12: x←p
13: p ← Parent(x)
14: else if |δ| = 0 ∧ |δ ′ | = 1 then
15: return T
16: else if |δ| = 1 ∧ |δ ′ | = 2 then
17: if δ ′ = 2 then
18: if δ(r) = 1 then ▷ Right-right
19: δ(p) ← 0
20: δ(r) ← 0
21: p←r
22: T ← Left-Rotate(T, p)
23: else if δ(r) = −1 then ▷ Right-left
24: δy ← δ( Left(r) )
25: if δy = 1 then
26: δ(p) ← −1
27: else
28: δ(p) ← 0
29: δ( Left(r) ) ← 0
30: if δy = −1 then
31: δ(r) ← 1
32: else
33: δ(r) ← 0
34: else ▷ Delete specific right-right
35: δ(p) ← 1
36: δ(r) ← δ(r) − 1
37: T ← Left-Rotate(T, p)
38: break ▷ No further height change
39: else if δ′ = −2 then
40: if δ(l) = −1 then ▷ Left-left
41: δ(p) ← 0
42: δ(l) ← 0
43: p←l
44: T ← Right-Rotate(T, p)
45: else if δ(l) = 1 then ▷ Left-right
46: δy ← δ( Right(l) )
47: if δy = −1 then
48: δ(p) ← 1
49: else
50: δ(p) ← 0
51: δ( Right(l) ) ← 0
52: if δy = 1 then
53: δ(l) ← −1
54: else
55: δ(l) ← 0
56: else ▷ Delete specific left-left
57: δ(p) ← −1
58: δ(l) ← δ(l) + 1
59: T ← Right-Rotate(T, p)
60: break ▷ No further height change
▷ Height decreases, go on bottom-up updating
61: x←p
62: p ← Parent(x)
63: if p = NIL then ▷ Delete the root
64: return x
65: return T
Exercise B.1
1. Compare the imperative tree fixing for insert and delete, there are similarities.
Develop a common fix function for both insert and delete.
                t = leftRotate(t, l)
                t = rightRotate(t, p)
            } else { // delete specific left-left
                p.delta = -1
                l.delta = l.delta + 1
                t = rightRotate(t, p)
                break // no further height change
            }
        }
        // height decreases, go on bottom-up update
        x = parent
        parent = x.parent
    }
}
if parent == null then return x // delete the root
return t
}
Bibliography
[1] Richard Bird. “Pearls of functional algorithm design”. Cambridge University Press;
1 edition (November 1, 2010). ISBN-10: 0521513383. pp1 - pp6.
[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[10] Miran Lipovaca. “Learn You a Haskell for Great Good! A Beginner’s Guide”. No
Starch Press; 1 edition April 2011, 400 pp. ISBN: 978-1-59327-283-8
[12] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[19] Guy Cousinear, Michel Mauny. “The Functional Approach to Programming”. Cam-
bridge University Press; English Ed edition (October 29, 1998). ISBN-13: 978-
0521576819
[21] Chris Okasaki and Andrew Gill. “Fast Mergeable Integer Maps”. Workshop on ML,
September 1998, pages 77-86.
[27] Esko Ukkonen. “On-line construction of suffix trees”. Algorithmica 14 (3): 249–
260. doi:10.1007/BF01206331. [Link]
[Link]
[28] Weiner, P. “Linear pattern matching algorithms”, 14th Annual IEEE Symposium on
Switching and Automata Theory, pp. 1-11, doi:10.1109/SWAT.1973.13
[29] Esko Ukkonen. “Suffix tree and suffix array techniques for pattern analysis in strings”.
[Link]
[31] Robert Giegerich and Stefan Kurtz. “From Ukkonen to McCreight and Weiner:
A Unifying View of Linear-Time Suffix Tree Construction”. Science of Com-
puter Programming 25(2-3):187-218, 1995. [Link]
[Link]
[33] Bryan O’Sullivan. “suffixtree: Efficient, lazy suffix tree implementation”. http://
[Link]/package/suffixtree
[35] Dan Gusfield. “Algorithms on Strings, Trees and Sequences Computer Science and
Computational Biology”. Cambridge University Press; 1 edition (May 28, 1997)
ISBN: 9780521585194
[37] Esko Ukkonen. “Suffix tree and suffix array techniques for pattern analysis in strings”.
[Link]
[38] Esko Ukkonen “Approximate string-matching over suffix trees”. Proc. CPM 93. Lec-
ture Notes in Computer Science 684, pp. 228-242, Springer 1993. [Link]
[Link]/u/ukkonen/[Link]
[44] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Pat-
terns in Java. [Link]
[45] Donald E. Knuth. “The Art of Computer Programming. Volume 3: Sorting and
Searching.”. Addison-Wesley Professional; 2nd Edition (October 15, 1998). ISBN-13:
978-0201485417. Section 5.2.3 and 6.2.3
[47] Sleator, Daniel Dominic; Tarjan, Robert Endre. “Self-adjusting heaps”. SIAM Journal on Computing 15(1):52-69. doi:10.1137/0215004. ISSN 0097-5397 (1986)
[49] Sleator, Daniel D.; Tarjan, Robert E. (1985), “Self-Adjusting Binary Search Trees”,
Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835
[51] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[58] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan.
“The Pairing Heap: A New Form of Self-Adjusting Heap” Algorithmica (1986) 1:
111-129.
[59] Maged M. Michael and Michael L. Scott. “Simple, Fast, and Practical Non-Blocking
and Blocking Concurrent Queue Algorithms”. [Link]
research/synchronization/pseudocode/[Link]
[60] Herb Sutter. “Writing a Generalized Concurrent Queue”. Dr. Dobb’s Oct 29, 2008.
[Link]
[63] Harold Abelson, Gerald Jay Sussman, Julie Sussman. “Structure and Interpretation
of Computer Programs, 2nd Edition”. MIT Press, 1996, ISBN 0-262-51087-1.
[65] Ralf Hinze and Ross Paterson. “Finger Trees: A Simple General-purpose Data
Structure.” in Journal of Functional Programming16:2 (2006), pages 197-217. http:
//[Link]/~ross/papers/[Link]
[66] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), ”A new repre-
sentation for linear lists”. Conference Record of the Ninth Annual ACM Symposium
on Theory of Computing, pp. 49-60.
[70] Jon Bentley, Douglas McIlroy. “Engineering a sort function”. Software Practice and
experience VOL. 23(11), 1249-1265 1993.
[72] Fethi Rabhi, Guy Lapalme. “Algorithms: a functional programming approach”. Sec-
ond edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[93] Benjamin C. Pierce. “Types and Programming Languages”. The MIT Press, 2002.
ISBN:0262162091
[94] Joe Armstrong. “Programming Erlang: Software for a Concurrent World”. Pragmatic
Bookshelf; 1 edition (July 18, 2007). ISBN-13: 978-1934356005
[95] SGI. “transform”. [Link]
[96] ACM/ICPC. “The drunk jailer.” Peking University judge online for ACM/ICPC.
[Link]
[97] Haskell wiki. “Haskell programming tips”. 4.4 Choose the appropriate fold. http:
//[Link]/haskellwiki/Haskell_programming_tips
[98] Wikipedia. “Dot product”. [Link]
GNU Free Documentation License
Everyone is permitted to copy and distribute verbatim copies of this license document,
but changing it is not allowed.
Preamble
The purpose of this License is to make a manual, textbook, or other functional and
useful document “free” in the sense of freedom: to assure everyone the effective freedom
to copy and redistribute it, with or without modifying it, either commercially or noncom-
mercially. Secondarily, this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for modifications made by
others.
This License is a kind of “copyleft”, which means that derivative works of the document
must themselves be free in the same sense. It complements the GNU General Public
License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, be-
cause free software needs free documentation: a free program should come with manuals
providing the same freedoms that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License principally for
works whose purpose is instruction or reference.
textbook of mathematics, a Secondary Section may not explain any mathematics.) The
relationship could be a matter of historical connection with the subject or with related
matters, or of legal, commercial, philosophical, ethical or political position regarding
them.
The “Invariant Sections” are certain Secondary Sections whose titles are designated,
as being those of Invariant Sections, in the notice that says that the Document is released
under this License. If a section does not fit the above definition of Secondary then it is
not allowed to be designated as Invariant. The Document may contain zero Invariant
Sections. If the Document does not identify any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover
Texts or Back-Cover Texts, in the notice that says that the Document is released under
this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, represented
in a format whose specification is available to the general public, that is suitable for re-
vising the document straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely available drawing editor,
and that is suitable for input to text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an otherwise Transparent
file format whose markup, or absence of markup, has been arranged to thwart or dis-
courage subsequent modification by readers is not Transparent. An image format is not
Transparent if used for any substantial amount of text. A copy that is not “Transparent”
is called “Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without
markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly
available DTD, and standard-conforming simple HTML, PostScript or PDF designed
for human modification. Examples of transparent image formats include PNG, XCF
and JPG. Opaque formats include proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or processing
tools are not generally available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following
pages as are needed to hold, legibly, the material this License requires to appear in the
title page. For works in formats which do not have any title page as such, “Title Page”
means the text near the most prominent appearance of the work’s title, preceding the
beginning of the body of the text.
The “publisher” means any person or entity that distributes copies of the Document
to the public.
A section “Entitled XYZ” means a named subunit of the Document whose title
either is precisely XYZ or contains XYZ in parentheses following text that translates
XYZ in another language. (Here XYZ stands for a specific section name mentioned below,
such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To
“Preserve the Title” of such a section when you modify the Document means that it
remains a section “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that
this License applies to the Document. These Warranty Disclaimers are considered to be
included by reference in this License, but only as regards disclaiming warranties: any
other implication that these Warranty Disclaimers may have is void and has no effect on
the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license notice
saying this License applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You may not use technical
measures to obstruct or control the reading or further copying of the copies you make
or distribute. However, you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may
publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers)
of the Document, numbering more than 100, and the Document’s license notice requires
Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the
back cover. Both covers must also clearly and legibly identify you as the publisher of
these copies. The front cover must present the full title with all words of the title equally
prominent and visible. You may add other material on the covers in addition. Copying
with changes limited to the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put
the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest
onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100,
you must either include a machine-readable Transparent copy along with each Opaque
copy, or state in or with each Opaque copy a computer-network location from which
the general network-using public has access to download using public-standard network
protocols a complete Transparent copy of the Document, free of added material. If you use
the latter option, you must take reasonably prudent steps, when you begin distribution of
Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible
at the stated location until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well
before redistributing any large number of copies, to give them a chance to provide you
with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions
of sections 2 and 3 above, provided that you release the Modified Version under precisely
this License, with the Modified Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to whoever possesses a copy of it.
In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be
listed in the History section of the Document). You may use the same title as a
previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five
of the principal authors of the Document (all of its principal authors, if it has fewer
than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the
publisher.
E. Add an appropriate copyright notice for your modifications adjacent to the other
copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item
stating at least the title, year, new authors, and publisher of the Modified Version as
given on the Title Page. If there is no section Entitled “History” in the Document,
create one stating the title, year, authors, and publisher of the Document as given
on its Title Page, then add an item describing the Modified Version as stated in the
previous sentence.
J. Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in
the Document for previous versions it was based on. These may be placed in the
“History” section. You may omit a network location for a work that was published
at least four years before the Document itself, or if the original publisher of the
version it refers to gives permission.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section
titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be included in
the Modified Version.
If the Modified Version includes new front-matter sections or appendices that qualify
as Secondary Sections and contain no material copied from the Document, you may at
your option designate some or all of these sections as invariant. To do this, add their
titles to the list of Invariant Sections in the Modified Version’s license notice. These titles
must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it contains nothing but
endorsements of your Modified Version by various parties—for example, statements of
peer review or that the text has been approved by an organization as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified
Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added
by (or through arrangements made by) any one entity. If the Document already includes
a cover text for the same cover, previously added by you or by arrangement made by the
same entity you are acting on behalf of, you may not add another; but you may replace
the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission
to use their names for publicity for or to assert or imply endorsement of any Modified
Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that you in-
clude in the combination all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your combined work in its license
notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical
Invariant Sections may be replaced with a single copy. If there are multiple Invariant
Sections with the same name but different contents, make the title of each such section
unique by adding at the end of it, in parentheses, the name of the original author or
publisher of that section if known, or else a unique number. Make the same adjustment
to the section titles in the list of Invariant Sections in the license notice of the combined
work.
In the combination, you must combine any sections Entitled “History” in the various
original documents, forming one section Entitled “History”; likewise combine any sections
Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete
all sections Entitled “Endorsements”.
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released
under this License, and replace the individual copies of this License in the various docu-
ments with a single copy that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individ-
ually under this License, provided you insert a copy of this License into the extracted
document, and follow this License in all other respects regarding verbatim copying of
that document.
Cover Texts may be placed on covers that bracket the Document within the aggregate, or
the electronic equivalent of covers if the Document is in electronic form. Otherwise they
must appear on printed covers that bracket the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of
the Document under the terms of section 4. Replacing Invariant Sections with translations
requires special permission from their copyright holders, but you may include translations
of some or all Invariant Sections in addition to the original versions of these Invariant
Sections. You may include a translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include the original
English version of this License and the original versions of those notices and disclaimers.
In case of a disagreement between the translation and the original version of this License
or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “His-
tory”, the requirement (section 4) to Preserve its Title (section 1) will typically require
changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly
provided under this License. Any attempt otherwise to copy, modify, sublicense, or dis-
tribute it is void, and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license from a particular
copyright holder is reinstated (a) provisionally, unless and until the copyright holder
explicitly and finally terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to 60 days after the
cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if
the copyright holder notifies you of the violation by some reasonable means, this is the
first time you have received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after your receipt of the
notice.
Termination of your rights under this section does not terminate the licenses of parties
who have received copies or rights from you under this License. If your rights have been
terminated and not permanently reinstated, receipt of a copy of some or all of the same
material does not give you any rights to use it.
11. RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide
Web server that publishes copyrightable works and also provides prominent facilities for
anybody to edit those works. A public wiki that anybody can edit is an example of such a
server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means
any set of copyrightable works thus published on the MMC site.
“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license pub-
lished by Creative Commons Corporation, a not-for-profit corporation with a principal
place of business in San Francisco, California, as well as future copyleft versions of that
license published by that same organization.
“Incorporate” means to publish or republish a Document, in whole or in part, as part
of another Document.
An MMC is “eligible for relicensing” if it is licensed under this License, and if all
works that were first published under this License somewhere other than this MMC, and
subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or
invariant sections, and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site under
CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is
eligible for relicensing.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the
“with … Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover
Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination of the
three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend re-
leasing these examples in parallel under your choice of free software license, such as the
GNU General Public License, to permit their use in free software.
Index