Elementary Algorithms

Xinyu LIU

July 22, 2021

Version: 0.6180339887498949
Email: liuxinyu95@[Link]
Contents

0.1 The smallest free number . . . . . . . . . . . . . . . . . . . . . . . . . . . 11


0.1.1 Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
0.1.2 Divide and Conquer . . . . . . . . . . . . . . . . . . . . . . . . . . 13
0.1.3 Expressiveness and performance . . . . . . . . . . . . . . . . . . . 14
0.2 Regular number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
0.2.1 The brute-force solution . . . . . . . . . . . . . . . . . . . . . . . . 14
0.2.2 Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
0.2.3 Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
0.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1 List 21
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2.1 Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3 Basic operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Last . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3.3 Reverse index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3.4 Mutate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Append . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Set value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.5 sum and product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Recursive sum and product . . . . . . . . . . . . . . . . . . . . . . 30
Tail call recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.6 maximum and minimum . . . . . . . . . . . . . . . . . . . . . . . . 33
1.4 Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.4.1 map and for-each . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
For each . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4.2 reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.5 Sub-list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.5.1 take, drop, and split-at . . . . . . . . . . . . . . . . . . . . . . . . 40
conditional take and drop . . . . . . . . . . . . . . . . . . . . . . . 41
1.5.2 break and group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
break and span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.6 Fold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44


1.6.1 fold right . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44


1.6.2 fold left . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.6.3 example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.7 Search and filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.7.1 Exist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.7.2 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.7.3 find and filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.7.4 Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.8 zip and unzip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.9 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2 Binary Search Tree 55


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.2 Data Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.3 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.4 Traverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.5 Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5.1 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5.2 Minimum and maximum . . . . . . . . . . . . . . . . . . . . . . . . 61
2.5.3 Successor and predecessor . . . . . . . . . . . . . . . . . . . . . . . 61
2.6 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.7 Random build . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.8 Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.9 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 67

3 Insertion sort 69
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3 Binary search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4 List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5 Binary search tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4 Red-black tree 75
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.1.2 Tree rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.5 Imperative red-black tree algorithm ⋆ . . . . . . . . . . . . . . . . . . . . 86
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.7 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 88

5 AVL tree 91
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Imperative AVL tree algorithm ⋆ . . . . . . . . . . . . . . . . . . . . . . 96

5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.6 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 98

6 Radix tree 101


6.1 Integer trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.1.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.1.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.2 Integer prefix tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2.3 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.3 Trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.3.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.3.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.4 Prefix tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.4.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.4.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.5 Applications of trie and prefix tree . . . . . . . . . . . . . . . . . . . . . . 117
6.5.1 Dictionary and input completion . . . . . . . . . . . . . . . . . . . 117
6.5.2 Predictive text input . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.7 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 122

7 B-Trees 127
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2.1 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Split before insertion . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Insert then fixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3.1 Merge before delete method . . . . . . . . . . . . . . . . . . . . . . 135
7.3.2 Delete and fix method . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.4 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

8 Binary Heaps 153


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.2 Implicit binary heap by array . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.2.2 Heapify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.2.3 Build a heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.2.4 Basic heap operations . . . . . . . . . . . . . . . . . . . . . . . . . 157
Access the top element . . . . . . . . . . . . . . . . . . . . . . . . . 161
Heap Pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Find the top k elements . . . . . . . . . . . . . . . . . . . . . . . . 162
Decrease key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.2.5 Heap sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.3 Leftist heap and Skew heap, the explicit binary heaps . . . . . . . . . . . 166

8.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167


Rank (S-value) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Leftist property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.3.2 Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Merge operation in implicit binary heap by array . . . . . . . . . . 169
8.3.3 Basic heap operations . . . . . . . . . . . . . . . . . . . . . . . . . 169
Top and pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.3.4 Heap sort by Leftist Heap . . . . . . . . . . . . . . . . . . . . . . . 170
8.3.5 Skew heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Definition of Skew heap . . . . . . . . . . . . . . . . . . . . . . . . 170
Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.4 Splay heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
8.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Splaying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Top and pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.4.2 Heap sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

9 From grape to the world cup, the evolution of selection sort 181
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.2 Finding the minimum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.1 Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.2 Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
9.2.3 performance of the basic selection sorting . . . . . . . . . . . . . . 186
9.3 Minor Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.3.1 Parameterize the comparator . . . . . . . . . . . . . . . . . . . . . 186
9.3.2 Trivial fine tune . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3.3 Cock-tail sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.4 Major improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.4.1 Tournament knock out . . . . . . . . . . . . . . . . . . . . . . . . . 192
Refine the tournament knock out . . . . . . . . . . . . . . . . . . . 196
9.4.2 Final improvement by using heap sort . . . . . . . . . . . . . . . . 199
9.5 Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

10 Binomial heap, Fibonacci heap, and pairing heap 203


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.2 Binomial Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Binomial tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Binomial heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Data layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
10.2.2 Basic heap operations . . . . . . . . . . . . . . . . . . . . . . . . . 207
Linking trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Insert a new element to the heap (push) . . . . . . . . . . . . . . . 209
Merge two heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
More words about binomial heap . . . . . . . . . . . . . . . . . . . 217
10.3 Fibonacci Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.3.2 Basic heap operations . . . . . . . . . . . . . . . . . . . . . . . . . 218

Insert a new element to the heap . . . . . . . . . . . . . . . . . . . 218


Merge two heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Extract the minimum element from the heap (pop) . . . . . . . . . 220
10.3.3 Running time of pop . . . . . . . . . . . . . . . . . . . . . . . . . . 227
10.3.4 Decreasing key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
10.3.5 The name of Fibonacci Heap . . . . . . . . . . . . . . . . . . . . . 230
10.4 Pairing Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
10.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
10.4.2 Basic heap operations . . . . . . . . . . . . . . . . . . . . . . . . . 233
Merge, insert, and find the minimum element (top) . . . . . . . . . 233
Decrease key of a node . . . . . . . . . . . . . . . . . . . . . . . . . 235
Delete the minimum element from the heap (pop) . . . . . . . . . 235
Delete a node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
10.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

11 Queue, not so simple as it was thought 243


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
11.2 Queue by linked-list and circular buffer . . . . . . . . . . . . . . . . . . . 243
11.2.1 Singly linked-list solution . . . . . . . . . . . . . . . . . . . . . . . 243
11.2.2 Circular buffer solution . . . . . . . . . . . . . . . . . . . . . . . . 247
11.3 Purely functional solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
11.3.1 Paired-list queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
11.3.2 Paired-array queue - a symmetric implementation . . . . . . . . . 251
11.4 A small improvement, Balanced Queue . . . . . . . . . . . . . . . . . . . . 253
11.5 One more step improvement, Real-time Queue . . . . . . . . . . . . . . . 254
Incremental reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Incremental concatenate . . . . . . . . . . . . . . . . . . . . . . . . 256
Sum up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
11.6 Lazy real-time queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.7 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

12 Sequences, The last brick 267


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
12.2 Binary random access list . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
12.2.1 Review of plain-array and list . . . . . . . . . . . . . . . . . . . . . 268
12.2.2 Represent sequence by trees . . . . . . . . . . . . . . . . . . . . . . 268
12.2.3 Insertion to the head of the sequence . . . . . . . . . . . . . . . . . 269
Remove the element from the head of the sequence . . . . . . . . . 271
Random access the element in binary random access list . . . . . . 272
12.3 Numeric representation for binary random access list . . . . . . . . . . . . 275
12.3.1 Imperative binary random access list . . . . . . . . . . . . . . . . . 277
12.4 Imperative paired-array list . . . . . . . . . . . . . . . . . . . . . . . . . . 280
12.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
12.4.2 Insertion and appending . . . . . . . . . . . . . . . . . . . . . . . . 281
12.4.3 random access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
12.4.4 removing and balancing . . . . . . . . . . . . . . . . . . . . . . . . 282
12.5 Concatenate-able list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
12.6 Finger tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.6.2 Insert element to the head of sequence . . . . . . . . . . . . . . . . 290
12.6.3 Remove element from the head of sequence . . . . . . . . . . . . . 292
12.6.4 Handling the ill-formed finger tree when removing . . . . . . . . . 293

12.6.5 append element to the tail of the sequence . . . . . . . . . . . . . . 297


12.6.6 remove element from the tail of the sequence . . . . . . . . . . . . 299
12.6.7 concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
12.6.8 Random access of finger tree . . . . . . . . . . . . . . . . . . . . . 304
size augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
Modification due to the augmented size . . . . . . . . . . . . . . . 306
Split a finger tree at a given position . . . . . . . . . . . . . . . . . 308
Random access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Imperative random access . . . . . . . . . . . . . . . . . . . . . . . 310
Imperative splitting . . . . . . . . . . . . . . . . . . . . . . . . . . 312
12.7 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

13 Divide and conquer, Quick sort vs. Merge sort 319


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
13.2 Quick sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
13.2.1 Basic version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
13.2.2 Strict weak ordering . . . . . . . . . . . . . . . . . . . . . . . . . . 321
13.2.3 Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
13.2.4 Minor improvement in functional partition . . . . . . . . . . . . . 324
Accumulated partition . . . . . . . . . . . . . . . . . . . . . . . . . 325
Accumulated quick sort . . . . . . . . . . . . . . . . . . . . . . . . 325
13.3 Performance analysis for quick sort . . . . . . . . . . . . . . . . . . . . . . 326
13.3.1 Average case analysis ⋆ . . . . . . . . . . . . . . . . . . . . . . . . 327
13.4 Engineering Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
13.4.1 Engineering solution to duplicated elements . . . . . . . . . . . . . 330
2-way partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
3-way partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
13.5 Engineering solution to the worst case . . . . . . . . . . . . . . . . . . . . 336
13.6 Other engineering practice . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
13.7 Side words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
13.8 Merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
13.8.1 Basic version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Minor improvement . . . . . . . . . . . . . . . . . . . . . . . . . . 345
13.9 In-place merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
13.9.1 Naive in-place merge . . . . . . . . . . . . . . . . . . . . . . . . . . 348
13.9.2 in-place working area . . . . . . . . . . . . . . . . . . . . . . . . . 349
13.9.3 In-place merge sort vs. linked-list merge sort . . . . . . . . . . . . 353
13.10Nature merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
13.11Bottom-up merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
13.12Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
13.13Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

14 Searching 367
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
14.2 Sequence search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
14.2.1 Divide and conquer search . . . . . . . . . . . . . . . . . . . . . . . 367
k-selection problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
binary search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
2 dimensions search . . . . . . . . . . . . . . . . . . . . . . . . . . 374
Brute-force 2D search . . . . . . . . . . . . . . . . . . . . . 375

Saddleback search . . . . . . . . . . . . . . . . . . . . . . . 375


Improved saddleback search . . . . . . . . . . . . . . . . . . 377
More improvement to saddleback search . . . . . . . . . . . 381
14.2.2 Information reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
Boyer-Moore majority number . . . . . . . . . . . . . . . . . . . . 386
Maximum sum of sub vector . . . . . . . . . . . . . . . . . . . . . 390
KMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Purely functional KMP algorithm . . . . . . . . . . . . . . . 394
Boyer-Moore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
The bad character heuristics . . . . . . . . . . . . . . . . . . 402
The good suffix heuristics . . . . . . . . . . . . . . . . . . . 405
14.3 Solution searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
14.3.1 DFS and BFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
Maze . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
Eight queens puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . 416
Peg puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Summary of DFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
The wolf, goat, and cabbage puzzle . . . . . . . . . . . . . . . . . . 424
Water jugs puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Kloski . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
Summary of BFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
14.3.2 Search the optimal solution . . . . . . . . . . . . . . . . . . . . . . 443
Grady algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
Huffman coding . . . . . . . . . . . . . . . . . . . . . . . . . 444
Change-making problem . . . . . . . . . . . . . . . . . . . . 453
Summary of greedy method . . . . . . . . . . . . . . . . . . 454
Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . 455
Properties of dynamic programming . . . . . . . . . . . . . 459
Longest common subsequence problem . . . . . . . . . . . . 459
Subset sum problem . . . . . . . . . . . . . . . . . . . . . . 464
14.4 Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469

Appendices

A Imperative delete for red-black tree 473

B AVL tree - proofs and the delete algorithm 481


B.1 Height increment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
B.2 Balance adjustment after insert . . . . . . . . . . . . . . . . . . . . . . . . 482
B.3 Delete algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
B.3.1 Functional delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
B.3.2 Imperative delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
B.4 Example program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488

GNU Free Documentation License 497


1. APPLICABILITY AND DEFINITIONS . . . . . . . . . . . . . . . . . . . . 497
2. VERBATIM COPYING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
3. COPYING IN QUANTITY . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
4. MODIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
5. COMBINING DOCUMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . 501
6. COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . . . . . . . . . 501
7. AGGREGATION WITH INDEPENDENT WORKS . . . . . . . . . . . . . 501

8. TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
9. TERMINATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
10. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . . . . . . . . 502
11. RELICENSING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
ADDENDUM: How to use this License for your documents . . . . . . . . . . . 503
Preface

Programmers learn elementary algorithms at school. Except for programming contests and code interviews, they seldom use algorithms in commercial software development. When people talk about algorithms in AI and machine learning, they usually mean scientific modeling, not data structures or elementary algorithms. Even when programmers need them, they are already provided in libraries. It seems enough to know how to use the library as a tool rather than to 're-invent the wheel'.
I would argue that elementary algorithms are critical for solving 'interesting problems', the usefulness of the problem set aside. Let's start with two problems.

0.1 The smallest free number


Richard Bird gives an interesting programming problem: find the minimum number that does not appear in a given list (Chapter 1, [?]). It's common to use numbers as identifiers (Ids) to index entities. At any time, a number is either occupied or free. When a client asks for a new number as an index, we want to always allocate the smallest available one. Suppose numbers are non-negative integers and the occupied ones are recorded in a list, for example:

[18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]

How can we find the smallest free number, which is 10, from the list? It seems quite
easy to figure out the solution.
1: function Min-Free(A)
2:     x ← 0
3:     loop
4:         if x ∉ A then
5:             return x
6:         else
7:             x ← x + 1

Where '∉' is realized as below:

1: function '∉'(x, X)
2:     for i ← 1 to |X| do
3:         if x = X[i] then
4:             return False
5:     return True
Some environments have built-in implementation to test if an element is in a list.
Below is an example program.

def minfree(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1

However, when there are millions of numbers in use, this solution performs poorly. The time spent is quadratic in the length of the list. On a computer with a 2-core 2.10 GHz CPU and 2G RAM, the C implementation takes 5.4s to search the minimum free number among 100,000 numbers, and takes more than 8 minutes to handle a million numbers.

0.1.1 Improvement
The key idea to improve the solution is based on the fact that, for n numbers x1, x2, ..., xn, if there is a free number, some xi must be outside the range [0, n); otherwise the list is exactly a permutation of 0, 1, ..., n − 1, and n should be returned as the minimum free number. In summary:

minfree(x1 , x2 , ..., xn ) ≤ n (1)

A better solution is to use an array of n + 1 flags to mark whether a number in the range [0, n] is free.
1: function Min-Free(A)
2: F ←[False, False, ..., False] where |F | = n + 1
3: for ∀x ∈ A do
4: if x < n then
5: F [x] ← True
6: for i ← [0, n] do
7: if F [i] = False then
8: return i
Line 2 initializes a flag array with all False values. Then we scan all numbers in A and mark the corresponding flag True if the value is less than n. Finally, we iterate to find the first False flag. This program takes time proportional to n. It uses n + 1 flags to cover the special case that sorted(A) = [0, 1, 2, ..., n − 1]. This solution is much faster than the brute force one. On the same computer, the Python implementation takes 0.02s when dealing with 100,000 numbers.
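For illustration, the flag-array method can be sketched in Python as below (the function name minfree_flags is ours):

def minfree_flags(xs):
    # use n + 1 flags so that the case xs = [0, 1, ..., n - 1] is covered
    n = len(xs)
    flags = [False] * (n + 1)
    for x in xs:
        if x < n:
            flags[x] = True      # only numbers inside [0, n) matter
    for i in range(n + 1):
        if not flags[i]:
            return i             # the first unmarked number is free

print(minfree_flags([18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]))   # 10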
Although this solution only takes O(n) time, it needs additional O(n) space to store the flags. We haven't tuned it yet. Each time, the program allocates memory to create the array of n + 1 flags, then releases it when finished. Such memory allocation and release is expensive and costs a lot of processing time.
To improve it, we can allocate the memory in advance for later reuse, and switch to bit-wise flags instead of an array. For example, as in the following C program:
#define N 1000000
#define WORD_LENGTH (sizeof(int) * 8)

void setbit(unsigned int* bits, unsigned int i) {
    bits[i / WORD_LENGTH] |= 1 << (i % WORD_LENGTH);
}

int testbit(unsigned int* bits, unsigned int i) {
    return bits[i / WORD_LENGTH] & (1 << (i % WORD_LENGTH));
}

unsigned int bits[N / WORD_LENGTH + 1];

int minfree(int* xs, int n) {
    int i, len = N / WORD_LENGTH + 1;
    for (i = 0; i < len; ++i) {
        bits[i] = 0;
    }
    for (i = 0; i < n; ++i) {
        if (xs[i] < n) {
            setbit(bits, xs[i]);
        }
    }
    for (i = 0; i <= n; ++i) {
        if (!testbit(bits, i)) {
            return i;
        }
    }
}

This program can handle 1 million numbers in 0.023s on the same computer.

0.1.2 Divide and Conquer


The above improvement costs O(n) additional space for the flags. Can we eliminate it? The divide and conquer strategy is to break the problem into smaller ones, then solve them separately to get the answer.
We can put the numbers xi ≤ ⌊n/2⌋ into a sub-list A′ and put the rest into another sub-list A′′. According to (1), if the length of A′ equals ⌊n/2⌋, it means A′ is 'full'. The minimum free number must then be in A′′, and we can recursively search in A′′, which is shorter than the original list. Otherwise, the minimum free number is in A′, which again leads to a smaller problem.
When searching in A′′, the conditions change a bit. We do not start from 0, but from ⌊n/2⌋ + 1 as the new lower bound. We define the algorithm as search(A, l, u), where l is the lower bound and u is the upper bound index. For the empty list, as a special case, we return l as the result.

minfree(A) = search(A, 0, |A| − 1)

search(∅, l, u) = l

search(A, l, u) = { |A′| = m − l + 1 : search(A′′, m + 1, u)
                  { otherwise :        search(A′, l, m)

where

m = ⌊(l + u)/2⌋
A′ = [x | x ∈ A, x ≤ m]
A′′ = [x | x ∈ A, x > m]

This algorithm doesn't need additional space¹. Each recursive call performs O(|A|) comparisons to build A′ and A′′. After that, the problem scale halves. Therefore, the time is bound to T(n) = T(n/2) + O(n), which reduces to O(n) according to the master theorem. Alternatively, observe that the first call takes O(n) to build A′ and A′′, the second call takes O(n/2), the third O(n/4), and so on. The total time is O(n + n/2 + n/4 + ...) = O(2n) = O(n). We use [a | a ∈ A, p(a)] to denote a list; it is different from {a | a ∈ A, p(a)}, which denotes a set.
The example Haskell program below implements this algorithm.

import Data.List (partition)

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m + 1) u
               | otherwise = bsearch as l m
  where
    m = (l + u) `div` 2
    (as, bs) = partition (<= m) xs

¹ The recursion takes O(lg n) stack space, but it can be eliminated through tail recursion optimization.

0.1.3 Expressiveness and performance


One may be concerned about the performance of this divide and conquer algorithm. There are O(lg n) recursive calls, which need additional stack space. If wanted, we can eliminate the recursion:
1: function Min-Free(A)
2:     l ← 0, u ← |A|
3:     while u − l > 0 do
4:         m ← l + ⌊(u − l)/2⌋
5:         left ← l
6:         for right ← l to u − 1 do
7:             if A[right] ≤ m then
8:                 A[left] ↔ A[right]
9:                 left ← left + 1
10:        if left < m + 1 then
11:            u ← left
12:        else
13:            l ← left
As shown in figure 1, this program re-arranges the array such that all elements before left are less than or equal to m, while those between left and right are greater than m.

Figure 1: Divide the array: all A[i] ≤ m where 0 ≤ i < left; all A[i] > m where left ≤ i < right. The rest of the elements haven't been processed yet.

This solution is fast and doesn't need extra stack space. However, compared to the previous recursive one, it loses some expressiveness. Depending on individual taste, one may prefer one over the other.
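A Python sketch of this iterative version is given below (the name minfree_inplace is ours; it returns l once the range becomes empty, which is the implicit result of the pseudo code above):

def minfree_inplace(xs):
    l, u = 0, len(xs)
    while u - l > 0:
        m = l + (u - l) // 2
        left = l
        for right in range(l, u):      # partition the slice [l, u) around m
            if xs[right] <= m:
                xs[left], xs[right] = xs[right], xs[left]
                left += 1
        if left < m + 1:               # lower half is not full: the answer is there
            u = left
        else:                          # lower half [l, m] is full: search the upper half
            l = left
    return l

print(minfree_inplace([18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]))   # 10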

0.2 Regular number


The second puzzle is to find the 1,500th number that contains only the factors 2, 3, or 5. Such numbers are called regular numbers, also known as 5-smooth numbers (indicating that the greatest prime factor is at most 5), or Hamming numbers in computer science, named after Richard Hamming. 2, 3, and 5 are of course regular numbers. 60 = 2²3¹5¹ is the 25th regular number. 21 = 2⁰3¹7¹ is not valid because it has a factor of 7. We consider 1 = 2⁰3⁰5⁰ to be the 0th regular number. The first 10 regular numbers are:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, ...

0.2.1 The brute-force solution


The straightforward way is to check numbers one by one starting from 1: extract all factors of 2, 3, and 5, and see if the remaining part is 1:
1: function Regular-Number(n)
2:     x ← 1
3:     while n > 0 do
4:         x ← x + 1
5:         if Valid?(x) then
6:             n ← n − 1
7:     return x

8: function Valid?(x)
9:     while x mod 2 = 0 do
10:        x ← ⌊x/2⌋
11:    while x mod 3 = 0 do
12:        x ← ⌊x/3⌋
13:    while x mod 5 = 0 do
14:        x ← ⌊x/5⌋
15:    return x = 1
This 'brute-force' algorithm works for small n. However, to find the 1500th regular number (which is 860934420), its C implementation takes 40.39s on the above computer. When n increases to 15,000, it doesn't terminate within 10 minutes.
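For reference, a direct Python sketch of this brute-force approach (names are ours; it is only practical for small n):

def is_regular(x):
    # strip all factors of 2, 3, 5; x is regular if only 1 remains
    for p in [2, 3, 5]:
        while x % p == 0:
            x //= p
    return x == 1

def regular_number_bf(n):
    x = 1
    while n > 0:
        x += 1
        if is_regular(x):
            n -= 1
    return x

print(regular_number_bf(10))   # 15, the 10th regular number (1 is the 0th)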

0.2.2 Improvement
Modulo and division calculations are expensive [2], and they are executed many times in the loops. Instead of checking whether a number contains only 2, 3, or 5 as factors, we can construct the regular numbers from these three factors. We can start from 1 and multiply by 2, 3, or 5 to generate the rest of the numbers. The problem becomes how to generate the regular numbers in order. One method is to use the queue data structure.
A queue allows adding an element at one end (called enqueue) and deleting from the other end (called dequeue). The element enqueued first will be dequeued first. This property is called FIFO (First In First Out). The idea is to add 1 as the first number to the queue. We repeatedly dequeue a number, multiply it by 2, 3, and 5 to generate 3 new numbers, then add them back to the queue in order. A newly generated number may already exist in the queue; in that case, we drop the duplicate. Because a new number may be smaller than others in the queue, we must insert it at the correct position. Figure 2 shows this idea.

Figure 2: First 4 steps to generate regular numbers. (a) Initialize the queue with 1; (b) add 2, 3, 5 back; (c) add 4, 6, and 10 back; (d) add 9 and 15 back; 6 is dropped.



We can design the algorithm based on this idea:


1: function Regular-Number(n)
2:     Q ← ∅
3:     x ← 1
4:     Enqueue(Q, x)
5:     while n > 0 do
6:         x ← Dequeue(Q)
7:         Unique-Enqueue(Q, 2x)
8:         Unique-Enqueue(Q, 3x)
9:         Unique-Enqueue(Q, 5x)
10:        n ← n − 1
11:    return x

12: function Unique-Enqueue(Q, x)
13:    i ← 0, m ← |Q|
14:    while i < m and Q[i] < x do
15:        i ← i + 1
16:    if i ≥ m or x ≠ Q[i] then
17:        Insert(Q, i, x)
The insert function takes O(m) time to insert a number at the proper position, where m = |Q| is the length of the queue. It skips the insertion if the number already exists. The length of the queue increases in proportion to n (each time we dequeue one element and enqueue at most 3 new ones, so the net increase per step is at most 2), hence the total time is O(1 + 2 + 3 + ... + n) = O(n²).
Figure 3 shows the number of queue accesses against n. It is a quadratic curve, which reflects the O(n²) performance.

Figure 3: Queue access count - n.

The corresponding C implementation takes 0.016s to output 860934420. It is about 2500 times faster than the naive search solution.
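As a rough Python sketch of the queue-based method (names are ours; the loop mirrors the pseudo code, so it returns the number dequeued in the n-th iteration, i.e. the (n − 1)-th regular number when 1 is counted as the 0-th):

import bisect

def regular_number_queue(n):
    q = [1]
    x = 1
    while n > 0:
        x = q.pop(0)                       # dequeue the smallest number
        for factor in [2, 3, 5]:
            y = factor * x
            i = bisect.bisect_left(q, y)
            if i == len(q) or q[i] != y:   # drop duplicates
                q.insert(i, y)             # insert at the proper position
        n -= 1
    return x

print(regular_number_queue(11))   # 15, the 10th regular number (1 is the 0th)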
We can also realize this improvement recursively. Suppose X is the infinite list of all regular numbers [x1, x2, x3, ...]. Multiplying every number by 2 gives again a list of regular numbers: [2x1, 2x2, 2x3, ...]. We can also multiply the numbers in X by 3 and by 5 to generate two more infinite lists. If we merge them together, remove the duplicated numbers, and prepend 1 as the first element, we get X again. In other words, the following equation holds:

X = 1 : [2x|∀x ∈ X] ∪ [3x|∀x ∈ X] ∪ [5x|∀x ∈ X] (2)



The symbol x : X means to link x before the list X, such that x becomes the first element. It is called 'cons' in Lisp. We link 1 before the rest, as it is the first regular number. To merge infinite lists, we define ∪ to recursively compare elements of two sorted lists. Let X = [x1, x2, x3, ...], Y = [y1, y2, y3, ...] be two such lists, and let X′ = [x2, x3, ...] and Y′ = [y2, y3, ...] contain the rest of the elements without their heads x1 and y1. We define merge as below:

X ∪ Y = { x1 < y1 : x1 : (X′ ∪ Y)
        { x1 = y1 : x1 : (X′ ∪ Y′)
        { y1 < x1 : y1 : (X ∪ Y′)

We need not worry about whether X or Y is empty, because they are both infinite lists. In functional settings that support lazy evaluation, this algorithm can be implemented as the following example program:

ns = 1 : (map (*2) ns) `merge` (map (*3) ns) `merge` (map (*5) ns)

merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
                    | x == y = x : merge xs ys
                    | otherwise = y : merge (x:xs) ys

The 1500th number, 860934420, is given by ns !! 1500. On the same computer, it takes about 0.03s to output the answer.

0.2.3 Queues
Although the improved solution is much faster than the original brute-force one, it generates duplicated numbers that are eventually dropped. Besides, to keep the numbers ordered, it needs a linear-time scan and insertion, which degrades the enqueue operation from constant time to O(|Q|). To avoid duplicates, we can separate all regular numbers into 3 disjoint buckets: Q2 = {2ⁱ | i > 0}, Q23 = {2ⁱ3ʲ | i ≥ 0, j > 0}, and Q235 = {2ⁱ3ʲ5ᵏ | i, j ≥ 0, k > 0}. The constraints that j ≠ 0 in Q23 and k ≠ 0 in Q235 ensure there is no overlap. Each bucket is realized as a queue. They are initialized as Q2 = {2}, Q23 = {3}, and Q235 = {5}. Starting from 1, each time we extract the smallest number x from the heads of the three queues as the next regular number, then do the following:

• If x comes from Q2, we enqueue 2x, 3x, and 5x back to Q2, Q23, and Q235 respectively;
• If x comes from Q23, we only enqueue 3x to Q23 and 5x to Q235. We should not add 2x to Q2, because Q2 cannot hold any number divisible by 3;
• If x comes from Q235, we only need to enqueue 5x to Q235. We should not add 2x to Q2 or 3x to Q23, because they can't hold numbers divisible by 5.

We reach the answer after repeatedly extracting the smallest number n times. The following algorithm implements this idea:

1: function Regular-Number(n)
2:     x ← 1
3:     Q2 ← {2}, Q23 ← {3}, Q235 ← {5}
4:     while n > 0 do
5:         x ← min(Head(Q2), Head(Q23), Head(Q235))
6:         if x = Head(Q2) then
7:             Dequeue(Q2)
8:             Enqueue(Q2, 2x)
9:             Enqueue(Q23, 3x)
10:            Enqueue(Q235, 5x)
11:        else if x = Head(Q23) then
12:            Dequeue(Q23)
13:            Enqueue(Q23, 3x)
14:            Enqueue(Q235, 5x)
15:        else
16:            Dequeue(Q235)
17:            Enqueue(Q235, 5x)
18:        n ← n − 1
19:    return x

Figure 4: First 4 steps with Q2, Q23, and Q235, initialized with 2, 3, 5: (a) enqueue 4, 6, 10; (b) enqueue 9, 15; (c) enqueue 8, 12, 20; (d) enqueue 25.
This algorithm loops n times. In each iteration, it extracts the minimum number from the heads of the three queues, which takes constant time. Then it adds at most 3 new numbers to the queues, which also takes constant time. Therefore the algorithm is bound to O(n).
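A Python sketch of the three-queue method follows (names are ours); after n extractions, x is the n-th regular number, with 1 as the 0-th:

from collections import deque

def regular_number_3q(n):
    x = 1
    q2, q23, q235 = deque([2]), deque([3]), deque([5])
    while n > 0:
        x = min(q2[0], q23[0], q235[0])
        if x == q2[0]:
            q2.popleft()
            q2.append(2 * x)
            q23.append(3 * x)
            q235.append(5 * x)
        elif x == q23[0]:
            q23.popleft()                  # never put 2x into Q2 here
            q23.append(3 * x)
            q235.append(5 * x)
        else:
            q235.popleft()                 # only 5x goes back
            q235.append(5 * x)
        n -= 1
    return x

print(regular_number_3q(10))   # 15, the 10th regular number (1 is the 0th)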

0.3 Summary
One might think the brute-force solution was sufficient to solve both programming puzzles. However, as the problem scales up, we have to seek better solutions. There are many interesting problems which were hard before, but which we are able to solve through computer programming nowadays. This book aims to provide both functional and imperative definitions for the commonly used elementary algorithms and data structures. We reference many results from Okasaki's work [3] and classic textbooks (for example [4]). We try to avoid relying on a specific programming language, because it may or may not be familiar to the reader, and programming languages keep changing. Instead, we use pseudo code or mathematical notation to make the algorithm definitions generic. When giving code examples, the functional ones look more like Haskell, and the imperative ones look like a mix of C, Java, and Python. They are only for illustration purposes and are not guaranteed to strictly follow any language specification.

Exercise 1
1. For the free number puzzle, since all numbers are non-negative, we can leverage the sign as a flag to indicate that a number exists. We can scan the number list, and for every number |x| < n (where n is the length), negate the number at position |x|. Then we run another round of scanning to find the first positive number. Its position is the answer. Write a program to realize this method.
2. There are n numbers 1, 2, ..., n. After some processing, they are shuffled, and a
number x is altered to y. Suppose 1 ≤ y ≤ n, design a solution to find x and y in
linear time with constant space.
3. The example program below is a solution to the regular number puzzle. Is it equivalent to the queue-based solution?

Int regularNum(Int m) {
    nums = Int[m + 1]
    n = 0, i = 0, j = 0, k = 0
    nums[0] = 1
    x2 = 2 * nums[i]
    x3 = 3 * nums[j]
    x5 = 5 * nums[k]
    while (n < m) {
        n = n + 1
        nums[n] = min(x2, x3, x5)
        if (x2 == nums[n]) {
            i = i + 1
            x2 = 2 * nums[i]
        }
        if (x3 == nums[n]) {
            j = j + 1
            x3 = 3 * nums[j]
        }
        if (x5 == nums[n]) {
            k = k + 1
            x5 = 5 * nums[k]
        }
    }
    return nums[m];
}
Chapter 1

List

1.1 Introduction
List and array are the elementary building blocks for creating complex data structures. Both can hold multiple elements as a container. An array is trivially implemented as a range of consecutive cells indexed by a number, called the address or position. An array is typically bounded: its size needs to be determined before use, while a list grows on demand to hold additional elements. One can traverse a list element by element from head to tail. Particularly in functional settings, list related algorithms play critical roles in controlling the computation and logic structure¹. Readers already familiar with the map, filter, and fold algorithms can safely skip this chapter and start directly from chapter 2.

1.2 Definition
List, also known as singly linked-list, is a data structure recursively defined as below:

• A list is either empty, denoted as ∅ or NIL;
• Or it contains an element and is linked with a list.

Figure 1.1 shows a list of nodes. Each node contains two parts: an element called the key, and a reference to the sub-list called next. The sub-list reference in the last node is empty, marked as 'NIL'.

Figure 1.1: A list of nodes

Every node links to the next one or to NIL. A linked-list is often defined with a compound structure², for example:
struct List<A> {
A key
List<A> next
}

¹ At the low level, lambda calculus plays the most critical role as one of the computation models equivalent to the Turing machine [93], [99].
² In most cases, the data stored in a list have the same type. However, there are also heterogeneous lists, like the lists in Lisp for example.


The empty list needs more clarification. Many traditional environments support the concept of null. There are two different ways to represent the empty list: one is to use null (or NIL) directly; the other is to construct a list that contains nothing, written []. From the implementation perspective, null need not allocate any memory, while [] does. In this book, we use ∅ to represent the generic empty list, set, or container.

1.2.1 Access
Given a non-empty list L, we define two functions to access its first element and the rest sub-list. They are often called first(L), rest(L) or head(L), tail(L)³. Conversely, we can construct a list from an element x and another list xs (possibly empty), denoted x : xs. This is also called the 'cons' operation. The following equations hold:

head(x : xs) = x
tail(x : xs) = xs        (1.1)

For a non-empty list X, we also write x1 for the first element and X′ for the rest sub-list. For example, when X = [x1, x2, x3, ...], then X′ = [x2, x3, ...].
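As a minimal executable sketch (our encoding, not part of the definition above), a node can be modeled in Python as a plain pair (key, next), with None standing for NIL:

def cons(x, xs):
    return (x, xs)      # build a node whose key is x and whose rest is xs

def head(xs):
    return xs[0]        # undefined (raises) for the empty list

def tail(xs):
    return xs[1]

xs = cons(1, cons(2, cons(3, None)))   # the list [1, 2, 3]
print(head(xs), head(tail(xs)))        # 1 2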

Exercise 1.2
1. For list of type A, suppose we can test if any two elements x, y ∈ A are equal,
define an algorithm to test if two lists are identical.

1.3 Basic operations


From the definition, we can count the length recursively: for the empty list, the length is zero; otherwise, it is the length of the sub-list plus one.

length(∅) = 0
length(L) = 1 + length(L′)        (1.2)

To count the length, this algorithm traverses all the elements from head to end, hence it is bound to O(n) time, where n is the number of elements. To avoid repeated counting, we can also persist the length in a variable and update it whenever we mutate (add to or delete from) the list. Below is the iterative way to count the length:
1: function Length(L)
2:     n ← 0
3:     while L ≠ NIL do
4:         n ← n + 1
5:         L ← Next(L)
6:     return n
We will also use the notation |L| for the length of list L when the context is clear.
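Using the same pair encoding as the sketch in section 1.2.1 (a node is (key, next), None is NIL), both the recursive and the iterative length can be written in Python:

def length(xs):
    # 0 for the empty list, otherwise 1 plus the length of the rest sub-list
    return 0 if xs is None else 1 + length(xs[1])

def length_iter(xs):
    n = 0
    while xs is not None:
        n, xs = n + 1, xs[1]
    return n

xs = (1, (2, (3, None)))
print(length(xs), length_iter(xs))   # 3 3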

1.3.1 index
Different from an array, which supports random access to the element at position i in constant time, we need to traverse the list i steps to reach the target element.

getAt(i, x : xs) = { i = 0 : x
                   { i ≠ 0 : getAt(i − 1, xs)        (1.3)

³ They are named car and cdr in Lisp due to the design of machine registers [63].

To get the i-th element from a non-empty list:

• if i is 0, the result is the first element;
• otherwise, the result is the (i − 1)-th element of the sub-list.

We intentionally leave the empty list unhandled; the behavior when passing ∅ is undefined. As such, the out-of-bound case also leads to undefined behavior: if i exceeds the length |L|, we end up at the edge case of accessing the (i − |L|)-th element of the empty list. On the other hand, if i < 0, subtracting one makes it even farther away from 0; we eventually end up in the same situation, where the index is negative while the list is empty.
This algorithm is bound to O(i) time as it advances the list i steps. Below is the corresponding imperative implementation:
1: function Get-At(i, L)
2: while i 6= 0 do
3: L ← Next(L) ▷ Raise error when L = NIL
4: i←i−1
5: return First(L)
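A Python sketch of getAt over the same pair encoding (the empty list and out-of-bound indices are left unhandled, matching the definition above):

def get_at(i, xs):
    # raises for the empty list or an out-of-bound index, i.e. undefined behavior
    return xs[0] if i == 0 else get_at(i - 1, xs[1])

print(get_at(2, (1, (2, (3, None)))))   # 3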

Exercise 1.3
1. In the iterative Get-At(i, L) algorithm, what is the behavior when L is empty? What is the behavior when i is out of bound or negative?

1.3.2 Last
There is a pair of operations symmetric to 'first/rest', called 'last/init'. For a non-empty list X = [x1, x2, ..., xn], the function last returns the last element xn, while init returns the sub-list [x1, x2, ..., xn−1]. Although the two pairs are left-right symmetric, 'last/init' need linear time, because we must traverse the whole list to the tail.
To access the last element of list X:

• If the X contains only one element as [x1 ], then x1 is the last one;

• Otherwise, the result is the last element of the sub-list X ′ .

last([x]) = x
last(x : xs) = last(xs)        (1.4)

Similarly, to extract the sub-list of X containing all elements except the last one:

• If X is a singleton [x1], the result is the empty list [ ];
• Otherwise, we recursively get the initial sub-list of X′, then prepend x1 to it as the result.

init([x]) = [ ]
init(x : xs) = x : init(xs)        (1.5)

We leave the empty list unhandled for both operations; the behavior is undefined if ∅ is passed in. Below are the iterative implementations:
1: function Last(L)
2:     x ← NIL
3:     while L ≠ NIL do
4:         x ← First(L)
5:         L ← Rest(L)
6:     return x

7: function Init(L)
8:     L′ ← NIL
9:     while Rest(L) ≠ NIL do        ▷ Raise error when L is NIL
10:        L′ ← Cons(First(L), L′)
11:        L ← Rest(L)
12:    return Reverse(L′)
While advancing towards the tail, this algorithm accumulates the 'init' result through 'cons'. However, the result is in reversed order, so we need to apply reverse (defined in section 1.4.2) to return the correct result. The exercise asks whether we can use 'append' instead of 'cons'.
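Both operations can be sketched in Python over the same pair encoding; note that the recursive init prepends while returning, so it builds the result in the correct order and needs no reverse:

def last(xs):
    # the last element of a non-empty list
    return xs[0] if xs[1] is None else last(xs[1])

def init(xs):
    # all elements of a non-empty list except the last one
    return None if xs[1] is None else (xs[0], init(xs[1]))

xs = (1, (2, (3, None)))
print(last(xs))   # 3
print(init(xs))   # (1, (2, None))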

1.3.3 Reverse index


last is a special case of reverse indexing. The generic case is to find the last i-th element of a given list. The naive implementation takes two rounds of traversal: determine the length n in the first round, then access the (n − i − 1)-th element in the second round:

lastAt(i, L) = getAt(|L| − i − 1, L) (1.6)

There actually exists a better solution. The idea is to keep two pointers p1, p2 with distance i between them, i.e. restⁱ(p2) = p1 holds, where restⁱ(p2) means repeatedly applying the rest() function i times. In other words, advancing p2 by i steps gives p1. We start by pointing p2 to the list head and advance both pointers in parallel till p1 arrives at the tail. At that point, p2 points exactly to the i-th element from the right. Figure 1.2 shows this idea. As p1, p2 form a window, this method is also called the 'sliding window' solution.

Figure 1.2: Sliding window formed by two pointers. (a) p2 starts from the head, behind p1 by i steps; (b) when p1 reaches the tail, p2 points to the i-th element from the right.

1: function Last-At(i, L)
2:     p ← L
3:     while i > 0 do
4:         L ← Rest(L)        ▷ Raise error if out of bound
5:         i ← i − 1
6:     while Rest(L) ≠ NIL do
7:         L ← Rest(L)
8:         p ← Rest(p)
9:     return First(p)
The functional implementation needs special consideration, as we cannot update pointers directly. Instead, we advance two lists X = [x1, x2, ..., xn] and Y = [xi, xi+1, ..., xn] simultaneously, where Y is the sub-list without the first i − 1 elements.

• If Y is a singleton list, i.e. [xn ], then the last i-th element is the head of X;

• Otherwise, we drop the first element from both X and Y , then recursively check X ′
and Y ′ .

lastAt(i, X) = slide(X, drop(i, X)) (1.7)

where function slide(X, Y ) drops the heads for both lists:

slide(x : xs, [y]) = x
slide(x : xs, y : ys) = slide(xs, ys)        (1.8)

Function drop(m, X) discards the first m elements from list X. It can be implemented
by advancing X by m steps:

drop(0, X) = X
drop(m, ∅) = ∅ (1.9)
drop(m, x : xs) = drop(m − 1, xs)
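A Python sketch of the sliding-window method over the same pair encoding (the name last_at is ours):

def last_at(i, xs):
    # keep two pointers i nodes apart, then advance both until the front one reaches the tail
    p, q = xs, xs
    while i > 0:
        q = q[1]               # raises if i is out of bound
        i -= 1
    while q[1] is not None:
        p, q = p[1], q[1]
    return p[0]

xs = (1, (2, (3, (4, (5, None)))))
print(last_at(0, xs), last_at(2, xs))   # 5 3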

Exercise 1.4
1. In the Init algorithm, can we use Append(L′ , First(L)) instead of ‘cons’?
2. How to handle empty list or out of bound index error in Last-At algorithm?

1.3.4 Mutate
Mutate operations include append, insert, update, and delete. Some functional environments actually implement mutation by creating a new list, while the original one is persisted for later reuse, or released at some point (chapter 2 in [3]).

Append
Append is the symmetric operation of cons: it adds an element at the tail instead of the head. Because of this, it is also called 'snoc'. For a linked-list, it means we need to traverse to the tail, hence it takes O(n) time, where n is the length. To avoid repeated traversal, we can record the tail reference in a variable and keep updating it upon changes.

append(∅, x) = [x]
append(y : ys, x) = y : append(ys, x)        (1.10)

• If we append x to the empty list, the result is [x];
• Otherwise, we first recursively append x to the rest sub-list, then prepend the original head to form the result.

The corresponding iterative implementation is as follows:

1: function Append(L, x)
2:     if L = NIL then
3:         return Cons(x, NIL)
4:     H ← L        ▷ save the head
5:     while Rest(L) ≠ NIL do
6:         L ← Rest(L)
7:     Rest(L) ← Cons(x, NIL)
8:     return H
Updating the Rest is typically implemented by setting the next reference field, as shown in the example program below.

List<A> append(List<A> xs, A x) {
    if (xs == null) {
        return cons(x, null)
    }
    List<A> head = xs
    while (xs.next != null) {
        xs = xs.next
    }
    xs.next = cons(x, null)
    return head
}

Exercise 1.5

1. Add a ‘tail’ field in list definition, optimize the append algorithm to constant time.

2. With the additional 'tail' field, when do we need to update the tail variable? How does it affect the performance?

Set value

Similar to getAt, we need to advance to the target position, then change the element there. To define the function setAt(i, x, L):

• If i = 0, it means we are changing the first element, the result is x : L′ ;

• Otherwise, we need recursively set the value at position i − 1 for the sub-list L′ .

setAt(0, x, y : ys) = x : ys
setAt(i, x, y : ys) = y : setAt(i − 1, x, ys)        (1.11)

This algorithm is bound to O(i) time, where i is the position to update.
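A Python sketch of setAt over the same pair encoding: the prefix up to position i is rebuilt, while the rest of the list is shared unchanged:

def set_at(i, x, xs):
    return (x, xs[1]) if i == 0 else (xs[0], set_at(i - 1, x, xs[1]))

print(set_at(1, 9, (1, (2, (3, None)))))   # (1, (9, (3, None)))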

Exercise 1.6

1. Handle the empty list and out of bound error for setAt.

insert
There are two different cases of insertion. One is to insert an element at a given position: insert(i, x, L); the algorithm is similar to setAt. The other is to insert an element into a sorted list so that the order remains sorted.
To insert x at position i (i starts from 0), we first advance i steps, then construct a new sub-list with x as the head, and concatenate it to the first i elements.

• If i = 0, it turns into a ‘cons’ operation: x : L;

• Otherwise, we recursively insert x to L′ at position i − 1, then prepend the original head.

insert(0, x, L) = x : L
insert(i, x, y : ys) = y : insert(i − 1, x, ys)   (1.12)

When i exceeds the list length, we can treat it as to append x. We leave this as an
exercise. The following is the corresponding iterative implementation:
1: function Insert(i, x, L)
2: if i = 0 then
3: return Cons(x, L)
4: H←L
5: p←L
6: while i > 0 and L ≠ NIL do
7: p←L
8: L ← Rest(L)
9: i←i−1
10: Rest(p) ← Cons(x, L)
11: return H
A list L = [x1, x2, ..., xn] is sorted if for any positions 1 ≤ i ≤ j ≤ n, xi ≤ xj holds. Here ≤ is an abstract ordering. It can actually mean ≥ for descending order, or the subset relationship, etc. We can design the insert algorithm to maintain the sorted order. To insert an element x into a sorted list L:

• If either L is empty or x is not greater than the first element in L, we prepend x to L and return x : L;

• Otherwise, we recursively insert x to the sub-list L′.

insert(x, ∅) = [x]
insert(x, y : ys) = { x ≤ y : x : y : ys
                      otherwise : y : insert(x, ys)   (1.13)

Since the algorithm needs to compare elements one by one, it is bound to O(n) time,
where n is the length. Below is the corresponding iterative implementation:
1: function Insert(x, L)
2: if L = NIL or x < First(L) then
3: return Cons(x, L)
4: H←L
5: while Rest(L) ≠ NIL and First(Rest(L)) < x do
6: L ← Rest(L)

7: Rest(L) ← Cons(x, Rest(L))
8: return H
With this linear time ordered insertion defined, we can further develop the insertion-sort algorithm. The idea is to repeatedly insert elements into an initially empty list. Since each insert takes linear time, the overall sort is bound to O(n²).

sort(∅) = ∅
(1.14)
sort(x : xs) = insert(x, sort(xs))

This is a recursive algorithm. It first sorts the rest sub-list, then inserts the first element into it. We can eliminate the recursion to develop an iterative implementation. The idea is to scan the list, and insert the elements one by one:
1: function Sort(L)
2: S ← NIL
3: while L ≠ NIL do
4: S ← Insert(First(L), S)
5: L ← Rest(L)
6: return S
At any time during the loop, the result is sorted. There is a major difference between
the recursive and the iterative implementations. The recursive one processes the list
from right, while the iterative one is from left. We’ll introduce ‘tail-recursion’ in section
1.3.5 to eliminate this difference. Chapter 3 introduces insertion sort in detail, including
performance analysis and optimization.
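
Before moving on, here is a Haskell sketch of the ordered insertion and the recursive insertion sort defined above (a minimal version; the Prelude's Ord class provides the abstract ordering):

insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) | x <= y    = x : y : ys
                | otherwise = y : insert x ys

isort :: Ord a => [a] -> [a]
isort []     = []
isort (x:xs) = insert x (isort xs)
-- example: isort [3, 1, 4, 1, 5] evaluates to [1, 1, 3, 4, 5]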

Exercise 1.7
1. Handle the out-of-bound case in insertion, and treat it as append.
2. Design the insertion algorithm for an array. When inserting at position i, all elements after i need to shift toward the end by one.
3. Implement the insertion sort only with less than (<) defined.

delete
Symmetric to insert, delete also has two cases. One is to delete the element at a position;
the other is to look up, then delete the element of a given value. The first case is defined
as delAt(i, L), the second case is defined as delete(x, L).
To delete the element at position i, we need to advance i steps to the target position, bypass the element, and link the rest sub-list.

• If L is empty, then the result is empty too;

• If i = 0, we are deleting the head, the result is L′ ;

• Otherwise, recursively delete the (i − 1)-th element from L′ , then prepend the orig-
inal head as the result.
delAt(i, ∅) = ∅
delAt(0, x : xs) = xs (1.15)
delAt(i, x : xs) = x : delAt(i − 1, xs)

This algorithm is bound to O(i) as we need to advance i steps to perform the deletion. Below is the iterative implementation:
1: function Del-At(i, L)

2: S ← Cons(⊥, L) ▷ A sentinel node


3: p←S
4: while i > 0 and L ≠ NIL do
5: i←i−1
6: p←L
7: L ← Rest(L)
8: if L ≠ NIL then
9: Rest(p) ← Rest(L)
10: return Rest(S)
To simplify the implementation, we introduce a sentinel node S. It contains a special value ⊥, and its next reference points to L. With S, we are safe to cut off any node in L, even the first one. Finally, we return the list after S as the result, and S itself can be discarded.
For the ‘find and delete’ case, there are two options. We can either find and delete the first occurrence of a value, or remove all the occurrences. The latter is more generic; we leave it as an exercise. When deleting x from list L:
• If the list is empty, the result is ∅;
• Otherwise, we compare the head and x; if they are equal, then the result is L′;
• If the head does not equal x, we keep the head, and recursively delete x in L′.

delete(x, ∅) = ∅
delete(x, y : ys) = { x = y : ys
                      x ≠ y : y : delete(x, ys)   (1.16)
This algorithm is bound to O(n) time, where n is the length, as it needs to scan the list to find the target element. For the iterative implementation, we also introduce a sentinel node to simplify the logic:
1: function Delete(x, L)
2: S ← Cons(⊥, L)
3: p←L
4: while L ≠ NIL and First(L) ≠ x do
5: p←L
6: L ← Rest(L)
7: if L ≠ NIL then
8: Rest(p) ← Rest(L)
9: return Rest(S)

Exercise 1.8
1. Design the algorithm to find and delete all occurrences of a given value.
2. Design the delete algorithm for an array: all elements after the delete position need to shift toward the front by one.

concatenate
Append is a special case of concatenation. Append only adds one element, while concatenation adds multiple ones. However, the performance would be quadratic if we repeatedly append as below:

X ++ ∅ = X
X ++ (y : ys) = append(X, y) ++ ys   (1.17)

In this implementation, when concatenating X and Y, each append operation traverses to the tail, and we do this |Y| times. The total time is bound to O(|X| + (|X| + 1) + ... + (|X| + |Y|)) = O(|X||Y| + |Y|²). Considering that the link (cons) operation is fast (constant time), we can traverse to the tail of X only once, then link Y to the tail.

• If X is empty, the result is Y;

• Otherwise, we concatenate the sub-list X′ with Y, then prepend the head as the result.

We can further improve it a bit: when Y is empty, we needn't traverse, but directly return X:

∅ ++ Y = Y
X ++ ∅ = X
(x : xs) ++ Y = x : (xs ++ Y)   (1.18)

The modified algorithm only traverses list X, then links its tail to Y, hence it is bound to O(|X|) time. In imperative settings, concatenation can be realized in constant time with the additional tail variable. We leave its implementation as an exercise. Below is the iterative
implementation without using the tail variable:
1: function Concat(X, Y )
2: if X = NIL then
3: return Y
4: if Y = NIL then
5: return X
6: H←X
7: while Rest(X) ≠ NIL do
8: X ← Rest(X)
9: Rest(X) ← Y
10: return H

1.3.5 sum and product


It is common to calculate the sum or product of a list of numbers. They have almost the same structure. We will introduce how to abstract them to higher-order computation in section 1.6.

Recursive sum and product


To calculate the sum of a list:

• If the list is empty, the result is zero;


• Otherwise, the result is the first element plus the sum of the rest.

sum(∅) = 0
(1.19)
sum(x : xs) = x + sum(xs)

We can't merely replace + with × to obtain the product algorithm, because it would always return zero. We need to define the product of the empty list as 1.

product(∅) = 1
(1.20)
product(x : xs) = x · product(xs)

Both algorithms traverse the list, hence are bound to O(n) time, where n is the length.

Tail call recursion


Both sum and product algorithms calculate from right to left. We can change them to calculate the accumulated result from left to right. For sum, it accumulates from 0, then adds elements one by one; while for product, it starts from 1, then repeatedly multiplies elements. The accumulation process can be defined as:

• If the list is empty, return the accumulated result;


• Otherwise, accumulate the first element to the result, then go on accumulating.

Below are the accumulated sum and product:

sum′(A, ∅) = A                       prod′(A, ∅) = A
sum′(A, x : xs) = sum′(x + A, xs)    prod′(A, x : xs) = prod′(x · A, xs)   (1.21)

Given a list, we can call sum′ with 0, and prod′ with 1:

sum(X) = sum′(0, X)   product(X) = prod′(1, X)   (1.22)

Or merely simplify it to Curried form:

sum = sum′ (0) product = prod′ (1)
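
A Haskell sketch of these accumulated definitions (the primed names avoid clashing with the Prelude sum and product):

sum' :: Num a => a -> [a] -> a
sum' a []     = a
sum' a (x:xs) = sum' (x + a) xs

prod' :: Num a => a -> [a] -> a
prod' a []     = a
prod' a (x:xs) = prod' (x * a) xs
-- sum' 0 and prod' 1 give the Curried forms of sum and product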

Curried form was introduced by Schönfinkel (1889 - 1942) in 1924, then widely used by Haskell Curry from 1958. It is known as Currying[73]. For a function taking 2 parameters f(x, y), when we pass one argument x, it becomes another function of y: g(y) = f(x, y), or g = f x. We can further extend this to multiple variables: f(x, y, ..., z) can be Curried to a series of functions: f, f x, f x y, .... No matter how many variables there are, we can treat the function as a series of Curried functions, each having only one parameter: f(x, y, ..., z) = f(x)(y)...(z) = f x y ... z.
The accumulated sum does not only calculate the result from left to right, it also needn't keep any context, state, or intermediate result for the recursion. All such states are either passed as arguments (i.e. A), or can be dropped (the previous element in the list). Such recursive calls are often optimized into pure loops in practice. We call this kind of function tail recursive (or a ‘tail call’), and the optimization that eliminates the recursion is called ‘tail recursion optimization’[61], because the recursion happens at the tail position of the function. The performance of tail calls can be greatly improved after optimization, and we can avoid the issue of stack overflow in deep recursions.
In section 1.3.4 about insertion sort, we mentioned the recursive algorithm sorts elements from the right. We can also optimize it to a tail call:

sort′ (A, ∅) = A
(1.23)
sort′ (A, x : xs) = sort′ (insert(x, A), xs)

And the sort is defined in Curried form with ∅ as the start value:

sort = sort′ (∅) (1.24)
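
A Haskell sketch of this tail recursive sort, re-using the ordered insertion from Data.List (which behaves like the insert defined in section 1.3.4):

import Data.List (insert)

sort' :: Ord a => [a] -> [a] -> [a]
sort' acc []     = acc
sort' acc (x:xs) = sort' (insert x acc) xs

isort' :: Ord a => [a] -> [a]
isort' = sort' []
-- example: isort' [3, 1, 2] evaluates to [1, 2, 3]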

As a typical tail call problem, let's consider how to compute bⁿ efficiently (refer to problem 1.16 in [63]). A brute-force solution is to repeatedly multiply by b for n times, starting from 1. This algorithm is bound to O(n):
1: function Pow(b, n)
2: x←1
3: loop n times

4: x←x·b
5: return x
Actually, the solution can be greatly improved. When computing b⁸, after the first 2 loops, we get x = b². At this stage, we needn't multiply x with b to get b³, but can directly compute x², which gives b⁴. If we do this again, we get (b⁴)² = b⁸. Thus we only need to loop 3 times, not 8 times.
Based on this idea, if n = 2^m for some non-negative integer m, we can design the below algorithm to compute bⁿ:

b¹ = b
bⁿ = (b^(n/2))²

We next extend this divide and conquer method to any non-negative integer n:

• If n = 0, define b⁰ = 1;

• If n is even, we halve n to compute b^(n/2), then square it;

• Otherwise n is odd. Since n − 1 is even, we recursively compute b^(n−1), then multiply b on top of it.

b⁰ = 1
bⁿ = { 2|n : (b^(n/2))²
       otherwise : b · b^(n−1)   (1.25)

However, the 2nd clause prevents us from making it tail recursive. Alternatively, we can square the base number, and halve the exponent.

b⁰ = 1
bⁿ = { 2|n : (b²)^(n/2)
       otherwise : b · b^(n−1)   (1.26)

With this change, we can develop a tail recursive algorithm to compute bⁿ = pow(b, n, 1).

pow(b, 0, A) = A
pow(b, n, A) = { 2|n : pow(b², n/2, A)
                 otherwise : pow(b, n − 1, b · A)   (1.27)

Compared to the brute-force implementation, this one improves to O(lg n) time. Actually, we can improve it further. If we represent n in binary format n = (aₘaₘ₋₁...a₁a₀)₂, we know that the computation of b^(2^i) is necessary only if aᵢ = 1. This is quite similar to the idea of the Binomial heap (section 10.2). We can multiply all such terms together for the bits that are 1. For example, when computing b¹¹, as 11 = (1011)₂ = 2³ + 2 + 1, we have b¹¹ = b^(2³) × b² × b. We get the result by these steps:

1. calculate b¹, which is b;

2. Square to b² from the previous result;

3. Square again to b^(2²) from step 2;

4. Square to b^(2³) from step 3.

Finally, we multiply the results of steps 1, 2, and 4 to get b¹¹. Summarizing this idea, we improve the algorithm as below.

pow(b, 0, A) = A
pow(b, n, A) = { 2|n : pow(b², n/2, A)
                 otherwise : pow(b², ⌊n/2⌋, b · A)   (1.28)

This algorithm essentially shifts n to the right by 1 bit each time (divides n by 2). If the LSB (Least Significant Bit, the lowest bit) is 0, n is even: it squares the base and keeps the accumulator A unchanged. If the LSB is 1, n is odd: it squares the base and accumulates b into A. When n is zero, we have exhausted all the bits and A is the final result. At any time, the updated base number b′, the shifted exponent number n′, and the accumulator A satisfy the invariant bⁿ = A · (b′)^(n′).
Compared to the previous implementation, which decreases n by one when n is odd, this algorithm halves n every time. It runs exactly m rounds, where m is the number of bits of n. We leave the imperative implementation as an exercise.
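
Below is a Haskell sketch of this final version, assuming a non-negative integer exponent; `div` halves the exponent (shifts it right by one bit) each round:

pow :: Integer -> Integer -> Integer
pow b n = go b n 1 where
  go _ 0 acc = acc
  go c m acc | even m    = go (c * c) (m `div` 2) acc       -- LSB is 0
             | otherwise = go (c * c) (m `div` 2) (c * acc) -- LSB is 1
-- example: pow 2 11 evaluates to 2048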
Back to the sum and product. The iterative implementation applies plus and multiply
while traversing:
1: function Sum(L)
2: s←0
3: while L ≠ NIL do
4: s ← s+ First(L)
5: L ← Rest(L)
6: return s

7: function Product(L)
8: p←1
9: while L ≠ NIL do
10: p ← p · First(L)
11: L ← Rest(L)
12: return p
One interesting usage of product is to calculate factorial of n as: n! = product([1..n]).

1.3.6 maximum and minimum


For a list of comparable elements (we can define an order for any two elements), there exist the maximum and the minimum. The algorithm structure of max/min is the same. For a non-empty list:

• If there is only one element (a singleton) [x1], the result is x1;

• Otherwise, we recursively find the min/max of the sub-list, then compare it with the first element to determine the result.

min([x]) = x
min(x : xs) = { x < min(xs) : x
                otherwise : min(xs)   (1.29)

and

max([x]) = x
max(x : xs) = { x > max(xs) : x
                otherwise : max(xs)   (1.30)

Both process the list from right to left. We can modify them to be tail recursive. This also brings the ‘on-line’ feature: at any time, the accumulator is the min/max processed so far. Using min for example:

min′(a, ∅) = a
min′(a, x : xs) = { x < a : min′(x, xs)
                    otherwise : min′(a, xs)   (1.31)

Different from sum′/prod′, we can't pass a fixed starting value to the tail recursive min′/max′, unless we use ±∞ in the below Curried form:

min = min′(∞)   max = max′(−∞)

Alternatively, we can pass the first element as the accumulator, given that min/max only take a non-empty list:

min(x : xs) = min′ (x, xs) max(x : xs) = max′ (x, xs) (1.32)
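
A Haskell sketch of the accumulated minimum, seeding the accumulator with the first element (the maximum is symmetric):

minimum' :: Ord a => [a] -> a
minimum' (x:xs) = go x xs where     -- only defined for a non-empty list, as in the text
  go a []     = a
  go a (y:ys) | y < a     = go y ys
              | otherwise = go a ys
-- example: minimum' [15, 9, 0, 12] evaluates to 0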

The optimized tail recursive algorithm can be further changed to purely iterative
implementation. We give the Min example, and skip Max.
1: function Min(L)
2: m ← First(L)
3: L ← Rest(L)
4: while L ≠ NIL do
5: if First(L) < m then
6: m ← First(L)
7: L ← Rest(L)
8: return m
There is a way to realize the tail recursive algorithm without using the accumulator explicitly. The idea is to re-use the first element as the accumulator. Every time, we compare the head with the next element, then drop the greater one for min, and drop the lesser one for max.

min([x]) = x
min(x1 : x2 : xs) = { x1 < x2 : min(x1 : xs)
                      otherwise : min(x2 : xs)   (1.33)

We skip the definition for max as it is symmetric.

Exercise 1.9
1. Change the length to tail call.
2. Change the insertion sort to tail call.
3. Implement the O(lg n) algorithm to calculate bn by represent n in binary.

1.4 Transform
From an algebraic perspective, there are two types of transforms: one keeps the list structure and only changes the elements; the other alters the list structure, hence the result is not isomorphic to the original list. Particularly, we call the former map.

1.4.1 map and for-each


The first example is to convert a list of numbers to their represented strings, like to change
[3, 1, 2, 4, 5] to [“three”, “one”, “two”, “four”, “five”]

toStr(∅) = ∅
(1.34)
toStr(x : xs) = str(x) : toStr(xs)

For the second example, consider a dictionary, which is a list of words grouped by
initial letter. Like:

[[a, an, another, ... ],


[bat, bath, bool, bus, ...],
...,
[zero, zoo, ...]]

Next we process a text (Hamlet for example), and augment each word with their
number of occurrence, like:

[[(a, 1041), (an, 432), (another, 802), ... ],


[(bat, 5), (bath, 34), (bool, 11), (bus, 0), ...],
...,
[(zero 12), (zoo, 0), ...]]

Now for every initial letter, we want to figure out which word occurs most often. How do we write a program to do this? The output is a list of words, each of which has the most occurrences in its group, something like [a, but, can, ...]. We need to develop a program that transforms a list of groups of word-number pairs into a list of words.
First, we need to define a function that takes a list of word-number pairs and finds the word paired with the biggest number. Sorting is overkill. What we need is a special max function maxBy(cmp, L), where cmp compares two elements abstractly.

maxBy(cmp, [x]) = x
maxBy(cmp, x1 : x2 : xs) = { cmp(x1, x2) : maxBy(cmp, x2 : xs)
                             otherwise : maxBy(cmp, x1 : xs)   (1.35)

For a pair p = (a, b), we define two access functions:

fst (a, b) = a
snd (a, b) = b   (1.36)

Instead of embedded parentheses fst((a, b)) = a, we omit one layer and use a space. Generally, we treat f x = f(x) when the context is clear. Then we can define a special compare function for word-count pairs:

less(p1 , p2 ) = snd(p1 ) < snd(p2 ) (1.37)

Then pass less to maxBy to finalize our definition (in Curried form):

max′′ = maxBy(less) (1.38)

With max′′ () defined, we can develop the solution to process the whole list.

solve(∅) = ∅
solve(x : xs) = fst(max′′(x)) : solve(xs)   (1.39)
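
For illustration, a Haskell sketch of maxBy, less, and solve as defined above (assuming each group is a list of (word, count) pairs):

maxBy :: (a -> a -> Bool) -> [a] -> a
maxBy _ [x] = x
maxBy cmp (x1:x2:xs) | cmp x1 x2 = maxBy cmp (x2:xs)
                     | otherwise = maxBy cmp (x1:xs)

less :: (a, Int) -> (a, Int) -> Bool
less p1 p2 = snd p1 < snd p2

solve :: [[(String, Int)]] -> [String]
solve []     = []
solve (g:gs) = fst (maxBy less g) : solve gs
-- example: solve [[("a",1041),("an",432)], [("bat",5),("bath",34)]]
--          evaluates to ["a","bath"]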

Map
The solve() and toStr() functions reveal the same structure, although they are developed
for different problems. We can abstract this common structure as map:

map(f, ∅) = ∅
(1.40)
map(f, x : xs) = f (x) : map(f, xs)

map takes the function f as an argument, and applies it to every element to form a new list. A function that computes with other functions is called a higher-order function. If the type of f is A → B, which means it sends an element of A to a result of B, then the type of map is:

map :: (A → B) → [A] → [B] (1.41)

We read it as: map takes a function of A → B, then converts a list [A] to another list [B]. The two examples in the previous section can be defined with map as (in Curried form):

toStr = map str
solve = map (fst ◦ max′′)

Where f ◦ g means function composition, i.e. first apply g, then apply f: (f ◦ g) x = f(g(x)), read as f after g. Map can also be defined from the domain theory point of view. Function y = f(x) defines the mapping from x in set X to y in set Y:

Y = {f(x) | x ∈ X}   (1.42)

This type of set definition is called Zermelo-Fraenkel set abstraction (known as a ZF expression) [72]. The difference is that here the mapping is from a list (not a set) to another: Y = [f(x) | x ∈ X]. There can be duplicated elements. For lists, such a ZF style expression is called list comprehension.
List comprehension is a powerful tool. As an example, let us see how to realize the permutation algorithm. Extending from generating all permutations as in [72] and [94], we define a generic perm(L, r), that permutes r out of the total n elements in the list L. There are Pₙʳ = n!/(n − r)! solutions in total.

perm(L, r) = { |L| < r or r = 0 : [[ ]]
               otherwise : [x : ys | x ∈ L, ys ∈ perm(delete(x, L), r − 1)]   (1.43)

If we pick zero elements for the permutation, or there are too few (less than r), the result is a list containing the empty list; otherwise, for every x in L, we recursively pick r − 1 elements out of the rest, then prepend x to each result. Below Haskell example program utilizes the list comprehension feature:
perm xs r | r == 0 || length xs < r = [[]]
          | otherwise = [ x:ys | x ← xs,
                                 ys ← perm (delete x xs) (r-1)]

For the iterative Map implementation, below algorithm uses a sentinel node to simplify
the logic to handle head reference.
1: function Map(f, L)
2: L′ ← Cons(⊥, NIL) ▷ Sentinel node
3: p ← L′
4: while L ≠ NIL do
1.4. TRANSFORM 37

5: x ← First(L)
6: L ← Rest(L)
7: Rest(p) ← Cons(f (x), NIL)
8: p ← Rest(p)
9: return Rest(L′ ) ▷ Drop the sentinel

For each
Sometimes we only need to traverse the list, repeatedly process the elements one by one
without building the new list. Here is an example that print every element out:
1: function Print(L)
2: while L ≠ NIL do
3: print First(L)
4: L ← Rest(L)
More generally, we can pass a procedure P , then traverse the list and apply P to each
element.
1: function For-Each(P, L)
2: while L ≠ NIL do
3: P(First(L))
4: L ← Rest(L)

Examples
As an example, let's see an “n-lights puzzle”[96]. There are n lights in a room, all of them off. We execute the following n rounds:
1. Switch all the lights in the room (all on);
2. Switch the lights numbered 2, 4, 6, ..., i.e. every other light is switched; if a light is on, it will be turned off;
3. Switch every third light, numbered 3, 6, 9, ...;
4. ...
In the last round, only the last light (the n-th light) is switched. The question is how many lights are on in the end?
Let's start with a brute-force solution, then improve it step by step. We represent the state of the n lights as a list of 0/1 numbers: 0 is off, 1 is on. The initial state is all zeros: [0, 0, ..., 0]. We label the lights from 1 to n, then map them to (i, on/off) pairs:

lights = map(i ↦ (i, 0), [1, 2, 3, ..., n])

It binds each number to zero; the result is a list of pairs: L = [(1, 0), (2, 0), ..., (n, 0)]. Next we operate on this list of pairs for n rounds. In the i-th round, we switch the second value in a pair if its label is divisible by i. Considering that 1 − 0 = 1 and 1 − 1 = 0, we can toggle a 0/1 value x by 1 − x. For light (j, x), if i|j (i.e. j mod i = 0), then switch; otherwise leave the light untouched.

switch(i, (j, x)) = { j mod i = 0 : (j, 1 − x)
                      otherwise : (j, x)   (1.44)

The i-th round for all lights can be realized as map:


map(switch(i), L) (1.45)

Here we use the Curried form of switch, which is equivalent to:

map((j, x) ↦ switch(i, (j, x)), L)

Next, we define a function op(), which performs the above mapping on L over and over for n rounds. We call this function with op([1, 2, ..., n], L).
op(∅, L) = L
(1.46)
op(i : is, L) = op(is, map(switch(i), L))
At this stage, we can sum the second value of each pair in list L to get the answer.
solve(n) = sum(map(snd, op([1, 2, ..., n], lights))) (1.47)
Below is the example Haskell implementation of this brute-force solution:
solve = sum ◦ (map snd) ◦ proc where
  proc n = operate [1..n] (map (λi → (i, 0)) [1..n])
  operate [] xs = xs
  operate (i:is) xs = operate is (map (switch i) xs)
  switch i (j, x) = if j `mod` i == 0 then (j, 1 - x) else (j, x)

Run this program from 1 light to 100 lights, let’s see what the answers are (we added
line breaks):
[1,1,1,
2,2,2,2,2,
3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
This result is interesting:
• the first 3 answers are 1;
• the 4-th to the 8-th answers are 2;
• the 9-th to the 15-th answers are 3;
• ...
It seems that the i²-th to the ((i + 1)² − 1)-th answers are i. Actually, we can prove it:
Proof. Given n lights labeled from 1 to n, consider which lights are on in the end. Since the initial state of every light is off, a light is on if and only if it is switched an odd number of times. Light i is switched in round j if j divides i (denoted j|i). Hence only the lights which have an odd number of factors are on in the end.
The key point to solving this puzzle is to find all numbers which have an odd number of factors. For any positive integer n, let S be the set of all factors of n. S is initialized to ∅. If p is a factor of n, there must exist a positive integer q such that n = pq holds. It means q is also a factor of n. We add 2 different factors to set S if and only if p ≠ q, which keeps |S| even all the time, unless p = q. In that case n is a square number, and we add only 1 factor to S, which leads to an odd number of factors.

At this stage, we can design a fast solution by counting the square numbers not greater than n.

solve(n) = ⌊√n⌋   (1.48)

Below Haskell example program outputs the answer for 1, 2, ..., 100 lights:
map (floor ◦ sqrt) [1..100]

Map is a generic concept that is not limited to lists. It can be applied to many complex algebraic structures. The next chapter about binary search trees explains how to map over trees. As long as we can traverse the structure, and the empty case is defined, we can use the same mapping idea.

1.4.2 reverse
It's a classic exercise to reverse a singly linked-list with minimum space. One must carefully manipulate the node references; however, there exists an easy method to implement reverse:

1. Write a purely recursive solution;


2. Change it to tail-call;
3. Translate the tail-call solution to imperative operations.

The purely recursive solution is straightforward. To reverse a list L.

• If L is empty, the reversed result is empty;


• Otherwise, recursively reverse sub-list L′ , then append the first element to the end.

reverse(∅) = ∅
(1.49)
reverse(x : xs) = append(reverse(xs), x)

However, the performance is poor. As it needs to traverse to the end to append, this algorithm is bound to quadratic time. We can optimize it with a tail call, using an accumulator to store the reversed part so far. We initialize the accumulator as empty: reverse = reverse′(∅).

reverse′ (A, ∅) = A
(1.50)
reverse′ (A, x : xs) = reverse′ (x : A, xs)
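
A Haskell sketch of this accumulated (tail recursive) reverse:

reverse' :: [a] -> [a]
reverse' = go [] where
  go acc []     = acc
  go acc (x:xs) = go (x:acc) xs   -- prepend each head to the accumulator
-- example: reverse' [1, 2, 3] evaluates to [3, 2, 1]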

Different from appending, cons (:) is a constant time operation. The idea is to repeatedly take the elements from the head, and prepend them to the accumulator. It is essentially like storing the elements in a stack, then popping them out. The overall performance is O(n), where n is the length. Since a tail call needs not keep the context, we can optimize it to a purely iterative loop:
1: function Reverse(L)
2: A ← NIL
3: while L ≠ NIL do
4: A ← Cons(First(L), A)
5: L ← Rest(L)
6: return A
However, this algorithm creates a new reversed list rather than mutating the original one. We need to change it to mutate L in place, as in the below example program:

List<T> reverse(List<T> xs) {
    List<T> p, ys = null
    while (xs != null) {
        p = xs
        xs = xs.next
        p.next = ys
        ys = p
    }
    return ys
}

Exercise 1.10
1. Given a number from 0 to 1 billion, write a program to give its English represen-
tation. For e.g. 123 is ‘one hundred and twenty three’. What if there is decimal
part?
2. Implement the algorithm to find the maximum value in a list of pairs [(k, v)] in
tail call.

1.5 Sub-list
Different from arrays, which can slice out a continuous segment fast, lists typically need linear time to traverse and extract a sub-list.

1.5.1 take, drop, and split-at


Taking the first n elements is essentially to slice the list from 1 to n: sublist(1, n, L). If
either n = 0 or L = ∅, the sub-list is empty; otherwise, we recursively take the first n − 1
elements from the L′ , then prepend the first element.

take(0, L) = ∅
take(n, ∅) = ∅ (1.51)
take(n, x : xs) = x : take(n − 1, xs)

This algorithm handles the out of bound case like this: if n > |L| or n is negative, it
ends up to the edge case that L becomes empty, hence returns the whole list as the result.
Drop, on the other hand, discards the first n elements and returns the rest. It is
equivalent to slice the sub-list from right: sublist(n + 1, |L|, L), where |L| is the length.
Its implementation is symmetric:

drop(0, L) = L
drop(n, ∅) = ∅ (1.52)
drop(n, x : xs) = drop(n − 1, xs)

We leave the imperative implementation of take/drop as an exercise. As the next step, we can develop an algorithm to extract a sub-list at any position with a given length:

sublist(f rom, cnt, L) = take(cnt, drop(f rom − 1, L)) (1.53)

Or slice the list with left and right boundaries:

slice(f rom, to, L) = drop(f rom − 1, take(to, L)) (1.54)



The boundary is defined as [f rom, to]. It includes both ends. We can also split a list
at a given position:

splitAt(i, L) = (take(i, L), drop(i, L)) (1.55)

Exercise 1.11
1. Define sublist and slice in Curried Form without L as parameter.

conditional take and drop


Instead of specifying the number of elements for take/drop, one may want to provide a predicate. We keep taking or dropping as long as the condition is met. We define such algorithms as takeWhile/dropWhile.
takeWhile/dropWhile examine elements one by one against the predicate. They ignore the rest even if some later elements satisfy the condition. We'll see this difference in the section about filtering.

takeWhile(p, ∅) = ∅
takeWhile(p, x : xs) = { p(x) : x : takeWhile(p, xs)
                         otherwise : ∅   (1.56)

Where p is the predicate. When applied to an element, p returns true or false to indicate whether the condition is satisfied. dropWhile is symmetric:

dropWhile(p, ∅) = ∅
dropWhile(p, x : xs) = { p(x) : dropWhile(p, xs)
                         otherwise : x : xs   (1.57)
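
A Haskell sketch of these two conditional variants (the Prelude provides takeWhile and dropWhile with the same behavior; the primed names avoid the clash):

takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' _ [] = []
takeWhile' p (x:xs) | p x       = x : takeWhile' p xs
                    | otherwise = []

dropWhile' :: (a -> Bool) -> [a] -> [a]
dropWhile' _ [] = []
dropWhile' p (x:xs) | p x       = dropWhile' p xs
                    | otherwise = x : xs
-- example: takeWhile' even [2, 4, 5, 6] evaluates to [2, 4]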

1.5.2 break and group


Break and group are operations that re-arrange a list into multiple sub-lists. They typically perform the re-arrangement while traversing the list to keep the performance linear.

break and span

break/span can be considered as a general form of splitting. Instead of splitting at a given position, break/span scans elements with a predicate. It extracts the longest prefix of the list against the condition, and returns it together with the rest as a pair.
There are two different cases. For a given predicate, one is to pick the elements that satisfy it; the other is to pick the elements that do not. The former is called span, the latter is called break.

span(p, ∅) = (∅, ∅)
span(p, x : xs) = { p(x) : (x : A, B) where (A, B) = span(p, xs)
                    otherwise : (∅, x : xs)   (1.58)

and we can define break with span by negating the predicate, in Curried form:

break(p) = span(¬p)   (1.59)
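
A Haskell sketch of span and break as defined above (again, primed names to avoid clashing with the Prelude versions):

span' :: (a -> Bool) -> [a] -> ([a], [a])
span' _ [] = ([], [])
span' p (x:xs) | p x       = let (as, bs) = span' p xs in (x:as, bs)
               | otherwise = ([], x:xs)

break' :: (a -> Bool) -> [a] -> ([a], [a])
break' p = span' (not . p)
-- example: span' (< 10) [1, 2, 15, 3] evaluates to ([1, 2], [15, 3])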

Both span and break find the longest prefix. They stop immediately when the condition is not met, and ignore the rest. Below is the iterative implementation for span:
42 CHAPTER 1. LIST

1: function Span(p, L)
2: A ← NIL
3: while L ≠ NIL and p(First(L)) do
4: A ← Cons(First(L), A)
5: L ← Rest(L)
6: return (A, L)
This algorithm creates a new list to hold the longest prefix, another option is to reuse
the original list and break it in-place:
1: function Span(p, L)
2: A←L
3: tail ← NIL
4: while L ≠ NIL and p(First(L)) do
5: tail ← L
6: L ← Rest(L)
7: if tail = NIL then
8: return (NIL, L)
9: Rest(tail) ← NIL
10: return (A, L)

group
span breaks a list into two parts; group divides a list into multiple sub-lists. For example, we can use group to break a long word into small units, each containing consecutive identical characters:
group ``Mississippi'' = [``M'', ``i'', ``ss'', ``i'',
``ss'',``i'', ``pp'', ``i'']

For another example, given a list of numbers:

L = [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2]

We can divide it into small lists, each one in descending order:

group(L) = [[15, 9, 0], [12, 11, 7], [10, 5], [6], [13, 1], [4], [8, 3], [14, 2]]

These are useful operations. The string groups can be used to build a Radix tree, a data structure supporting fast text search. The number groups can be used to implement the natural merge sort algorithm. We'll introduce them in later chapters.
We can abstract the group condition as a relation ∼. It tests whether two consecutive elements x, y are ‘equivalent’ in a generic sense: x ∼ y. We scan the list and compare two consecutive elements each time. If they match, we add both to a group; otherwise, we only add x to the current group, and use y to start another group.

group(∼, ∅) = [∅]
group(∼, [x]) = [[x]]
group(∼, x : y : xs) = { x ∼ y : (x : ys) : yss
                         otherwise : [x] : (ys : yss)   (1.60)

where (ys : yss) = group(∼, y : xs). This algorithm is bound to O(n) time, where n is the length. We can also implement the iterative group algorithm. For a non-empty list L, we initialize the result groups as [[x1]], where x1 is the first element. We scan the list from the second element, appending it to the last group if the two consecutive elements are ‘equivalent’; otherwise we start a new group.

1: function Group(∼, L)
2: if L = NIL then
3: return [NIL]
4: x ← First(L)
5: L ← Rest(L)
6: g ← [x]
7: G ← [g]
8: while L ≠ NIL do
9: y ← First(L)
10: if x ∼ y then
11: g ← Append(g, y)
12: else
13: g ← [y]
14: G ← Append(G, g)
15: x←y
16: L ← Rest(L)
17: return G
However, this program performs in quadratic time if the append isn't optimized with the tail reference. If we don't care about the order, we can alternatively change append to cons.
With the group algorithm defined, we can realize the above 2 cases as below:

group(=, [M, i, s, s, i, s, s, i, p, p, i]) = [[M], [i], [ss], [i], [ss], [i], [pp], [i]]

and
group(≥, [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2])
= [[15, 9, 0], [12, 11, 7], [10, 5], [6], [13, 1], [4], [8, 3], [14, 2]]
Another method to implement group is to use the span function. Given a predicate, span breaks the list into two parts: the longest prefix satisfying the condition, and the rest. We can repeatedly apply span to the rest part till it becomes empty. However, the predicate passed to span is a unary function: it takes an element and tests it, while in group, the predicate is a binary function that takes two elements and compares them. We can use Currying: pass and fix the first element in the binary predicate, then use the Curried function to test the other.

group(∼, ∅) = [∅]
group(∼, x : xs) = (x : A) : group(∼, B)   (1.61)

Where (A, B) = span(y ↦ x ∼ y, xs) is the span result applied to the rest sub-list.
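
For illustration, a Haskell sketch of this span-based grouping (a hypothetical groupBy'; it returns [] for the empty list, like Data.List.groupBy does, to avoid producing a trailing empty group):

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ []      = []
groupBy' eq (x:xs) = (x : as) : groupBy' eq bs
  where (as, bs) = span (eq x) xs   -- eq x is the Curried predicate
-- example: groupBy' (==) "Mississippi"
--          evaluates to ["M","i","ss","i","ss","i","pp","i"]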
Although this new group function generates the correct result for the string case:
group (==) ``Mississippi''
[``M'', ``i'', ``ss'', ``i'', ``ss'', ``i'', ``pp'', ``i'']

However, it can't group the list of numbers correctly with the ≥ relation:


group ( ≥ ) [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2]
[[15,9,0,12,11,7,10,5,6,13,1,4,8,3,14,2]]

When the first number 15 is used as the left hand side of ≥, it is the maximum value, hence span ends up putting all elements into A and leaves B empty. This is not a defect, but the correct behavior, because group is defined to put equivalent elements together. To be accurate, the equivalence relation (∼) needs to satisfy three properties: reflexivity, transitivity, and symmetry.

1. Reflexivity. x ∼ x, any element equals itself;

2. Transitivity. x ∼ y, y ∼ z ⇒ x ∼ z, if two elements are equal, and one of them equals a third, then all three are equal;

3. Symmetry. x ∼ y ⇔ y ∼ x, the order of comparing two equal elements doesn't affect the result.

When grouping “Mississippi”, we use the equal (=) operator. It conforms to the three rules, and generates the correct result. However, when we pass the Curried (≥) predicate for numbers, it violates the symmetry rule, hence generates an unexpected result. The second algorithm, using span, limits its use case to strict equivalence, while the first algorithm does not: it only tests that the predicate holds for every two consecutive elements, which is weaker than an equivalence relation.

Exercise 1.12
1. Change the take/drop algorithms, such that when n is negative, take returns ∅, and drop returns the whole list.

2. Implement the in-place imperative take/drop algorithms.

3. Implement the iterative ‘take while’ and ‘drop while’ algorithms.

4. Consider the below span implementation:

span(p, ∅) = (∅, ∅)
span(p, x : xs) = { p(x) : (x : A, B)
                    otherwise : (A, x : B)

where (A, B) = span(p, xs). What is the difference between this one and the algorithm we defined previously?

1.6 Fold
We’ve seen most list algorithms share some common structure. This is not by chance.
Such commonality is rooted from the recursive nature of list. We can abstract the list
algorithms to a higher level concept, fold5 , which is essentially the initial algebra of all
list related computation[99].

1.6.1 fold right


Compare sum, product and sort, we can find the common structure.

h(∅) = z
(1.62)
h(x : xs) = x ⊕ h(xs)

There are two things we can abstract as parameters:

• The result for empty list. It is 0 for sum, 1 for product, and ∅ for sort.

• The binary operation applies to the head and the recursive result. It is plus for
sum, multiply for product, and ordered-insertion for sort.
5 also known as reduce

We abstract the result for the empty list as the initial value, denoted as z to mimic the generic zero concept, and the binary operation as ⊕. The above definition can then be parameterized as:
h(⊕, z, ∅) = z
(1.63)
h(⊕, z, x : xs) = x ⊕ h(⊕, z, xs)
Let’s feed it a list L = [x1 , x2 , ..., xn ], and expand to see how it behaves like:
h(⊕, z, [x1 , x2 , ..., xn ])
= x1 ⊕ h(⊕, z, [x2 , x3 , ..., xn ])
= x1 ⊕ (x2 ⊕ h(⊕, z, [x3 , ..., xn ]))
...
= x1 ⊕ (x2 ⊕ (...(xn ⊕ h(⊕, z, ∅))...))
= x1 ⊕ (x2 ⊕ (...(xn ⊕ z)...))
We need to add the parentheses, because the computation starts from the right-most term (xn ⊕ z). It repeatedly folds to the left towards x1. This is quite similar to the folding fan in figure 1.3. A folding fan is made of bamboo and paper. Multiple frames stack together with an axis at one end. The arc-shaped paper is fully expanded by these frames; we can close the fan by folding the paper, and it ends up as a stick.

Figure 1.3: Fold fan

We can consider the fold-fan as a list of bamboo frames. The binary operation is to
fold a frame to the top of the stack. The initial stack is empty. To fold the fan, we start
from one end, repeatedly apply the binary operation, till all the frames are stacked. The
sum and product algorithms do the same thing like folding fan.
sum([1, 2, 3, 4, 5]) = 1 + (2 + (3 + (4 + 5)))
= 1 + (2 + (3 + 9))
= 1 + (2 + 12)
= 1 + 14
= 15
product([1, 2, 3, 4, 5]) = 1 × (2 × (3 × (4 × 5)))
= 1 × (2 × (3 × 20))
= 1 × (2 × 60)
= 1 × 120
= 120
We name this kind of process fold. Particularly, since the computation starts from
the right end, we denote it f oldr:
f oldr(f, z, ∅) = z
(1.64)
f oldr(f, z, x : xs) = f (x, f oldr(f, z, xs))

We can define sum and product with foldr as below:

x1 + x2 + ... + xn = x1 + (x2 + (x3 + ... + (xn−1 + xn))...)
                   = foldr(+, 0, [x1, x2, ..., xn])   (1.65)

x1 × x2 × ... × xn = x1 × (x2 × (x3 × ... × (xn−1 × xn))...)
                   = foldr(×, 1, [x1, x2, ..., xn])   (1.66)

Or in Curried form: sum = foldr(+, 0), product = foldr(×, 1). We can also define the insertion sort with foldr as:

sort = foldr(insert, ∅)   (1.67)
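
A Haskell sketch of these foldr definitions (Data.List.insert provides the ordered insertion; double-primed names avoid clashing with the Prelude):

import Data.List (insert)

sum'' :: Num a => [a] -> a
sum'' = foldr (+) 0

product'' :: Num a => [a] -> a
product'' = foldr (*) 1

sort'' :: Ord a => [a] -> [a]
sort'' = foldr insert []
-- example: sort'' [3, 1, 2] evaluates to [1, 2, 3]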

1.6.2 fold left


We can convert f oldr to tail call. It generates the same result, but computes from left to
right. For this reason, we define it as f oldl:

f oldl(f, z, ∅) = z
(1.68)
f oldl(f, z, x : xs) = f oldl(f, f (z, x), xs)

Using sum as an example, we can see how the computation is expanded from left to right:

foldl(+, 0, [1, 2, 3, 4, 5])
= foldl(+, 0 + 1, [2, 3, 4, 5])
= foldl(+, (0 + 1) + 2, [3, 4, 5])
= foldl(+, ((0 + 1) + 2) + 3, [4, 5])
= foldl(+, (((0 + 1) + 2) + 3) + 4, [5])
= foldl(+, ((((0 + 1) + 2) + 3) + 4) + 5, ∅)
= 0 + 1 + 2 + 3 + 4 + 5

Here we delay the evaluation of f(z, x) in every step, which is the lazy-evaluation behavior. Otherwise, the intermediate results would be evaluated in the sequence 1, 3, 6, 10, 15 across the calls. Generally, we can expand foldl as:

foldl(f, z, [x1, x2, ..., xn]) = f(f(...f(f(z, x1), x2), ...), xn)   (1.69)

Or express as infix:

f oldl(⊕, z, [x1 , x2 , ..., xn ]) = z ⊕ x1 ⊕ x2 ⊕ ... ⊕ xn (1.70)

f oldl is tail recursive. We can implement it with loops. We initialize the result as
z, then apply the binary operation on top of it with every element. It is typically called
Reduce in most imperative environment.
1: function Reduce(f, z, L)
2: while L ≠ NIL do
3: z ← f (z, First(L) )
4: L ← Rest(L)
5: return z
Both foldr and foldl have their own suitable use cases. They are not always exchangeable. For example, some containers only allow adding elements at one end (like a stack). We can define a function fromList to build such a container from a list (in Curried form):

fromList = foldr(add, empty)

Where empty is the empty container. The singly linked-list is such a container. It performs well when adding an element to the head, but poorly when appending to the tail. foldr is a natural choice when duplicating a list while keeping the order, but foldl would generate a reversed list. As a workaround, to implement the iterative reduction from the right, we can first reverse the list, then reduce it:
1: function Reduce-Right(f, z, L)
2: return Reduce(f, z, Reverse(L))
One may think foldl should be the preferred one, as it is optimized with tail call, hence fits both functional and imperative settings. It is also an online algorithm that always holds the result so far. However, foldr plays a critical role when handling infinite lists (modeled as streams) with lazy evaluation. For example, the below program wraps every natural number into a singleton list, and returns the first 10:

take(10, foldr((x, xs) ↦ [x] : xs, ∅, [1, 2, ...]))
⇒ [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]

It does not work with foldl because the outermost evaluation never ends. We use a unified notation fold when either left or right works. In this book, we also use foldl and foldr to emphasize the folding direction. Although this chapter is about lists, the fold concept is generic. It can be applied to other algebraic structures. We can fold a tree (2.6 in [99]), a queue, and many other things, as long as they satisfy the following 2 criteria:

• The empty is defined (like the empty tree);

• We can decompose the recursive structure (like decompose tree into sub-trees and
key).

People abstract them further with concepts like foldable, monoid, and traversable.

Exercise 1.13
1. To define insertion-sort with f oldr, we designe the insert function as insert(x, L),
such that it can be expressed as sort = f oldr(insert, ∅). The type for f oldr is:

f oldr :: (A → B → B) → B → [A] → B

Where its first parameter f has the type A → B → B, the initial value z has
the type B. It folds on a list of A, and builds the result of B. How to define the
insertion-sort with f oldl? What is the type signature of f oldl?

1.6.3 example
As an example, let's see how to implement the n-lights puzzle with fold and map. In the brute-force solution, we create a list of pairs. Each pair (i, s) has a number i, and an on/off state s. In every round j, we scan the lights, toggling the i-th switch when j divides i. We can define this process with fold:

foldr(step, [(1, 0), (2, 0), ..., (n, 0)], [1, 2, ..., n])

As the initial state, all lights are off. We fold on the list of round numbers from 1
to n. Function step takes two parameters: the round number i, and the list of pairs. It
performs switching through map:

foldr((i, L) ↦ map(switch(i), L), [(1, 0), (2, 0), ..., (n, 0)], [1, 2, ..., n])

The foldr result is the list of pairs with the final on/off states. We next extract the state from each pair through map, and count the number with sum:

sum(map(snd, foldr((i, L) ↦ map(switch(i), L), [(1, 0), (2, 0), ..., (n, 0)], [1, 2, ..., n])))   (1.71)

concatenate

What if we apply fold with “++” (section 1.3.4) to a list of lists? It concatenates them into one long list, just like sum for numbers.

concat = foldr(++, ∅)   (1.72)

This is in Curried form. An example usage:

concat([[1], [2, 3, 4], [5, 6, 7, 8, 9]]) ⇒ [1, 2, 3, 4, 5, 6, 7, 8, 9]

Exercise 1.14

1. What’s the performance of concat?


2. Design a linear time concat algorithm
3. Define map in f oldr

1.7 Search and filter


Search and filter are generic concepts that apply to a wide range of things. For lists, it often takes linear time to find the result, as we need to traverse the list in most cases.

1.7.1 Exist
Given some a of type A, and a list of A, how do we test whether a is in the list? The idea is to compare every element in the list with a, until either they are equal, or we reach the end:

• If the list is empty, then a does not exist;

• If the first element equals a, then it exists;

• Otherwise, recursively test whether a exists in the rest sub-list.

a ∈ ∅ = False
a ∈ (b : bs) = { b = a : True
                 b ≠ a : a ∈ bs   (1.73)

This algorithm is also called elem. It is bound to O(n), where n is the length. If the list is ordered (ascending for example), one may want to improve the algorithm to logarithmic time with the idea of divide and conquer. However, lists do not support random access, so we can't apply binary search. See chapter 3 for details.

1.7.2 Look up
Let's extend elem a bit. In the n-lights puzzle, we use a list of pairs [(k, v)]. Every pair contains a key and a value. This kind of list is called an ‘association list’ (or assoc list). If we want to look up a given value in such a list, we need to extract a part of the pair (the value) for comparison.

lookup(x, ∅) = Nothing
lookup(x, (k, v) : kvs) = { v = x : Just (k, v)
                            v ≠ x : lookup(x, kvs)   (1.74)

Different from elem, we do not return true/false. Instead, we return the key-value pair when it is found. However, the value is not guaranteed to exist. We use an algebraic type called ‘Maybe’. A value of type Maybe A has two different kinds: it is either some a in A or nothing, denoted as Just a or Nothing respectively. This is a way to deal with null reference issues (4.2.2 in [99]).
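
A Haskell sketch of this value-based lookup over an assoc list (note that the Prelude's lookup searches by key instead; this hypothetical lookupBy follows equation (1.74) and searches by value):

lookupBy :: Eq v => v -> [(k, v)] -> Maybe (k, v)
lookupBy _ []           = Nothing
lookupBy x ((k, v):kvs) | v == x    = Just (k, v)
                        | otherwise = lookupBy x kvs
-- example: lookupBy 0 [(1, 1), (2, 0), (3, 1)] evaluates to Just (2, 0)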

1.7.3 find and filter


We can make ‘look up’ more generic. Instead of only comparing whether the element equals the given value, we can abstract it to find the element that satisfies a specific predicate:

find(p, ∅) = Nothing
find(p, x : xs) = { p(x) : Just x
                    otherwise : find(p, xs)   (1.75)

Although multiple elements may match, the find algorithm picks the first. We can expand it to find all such elements. This is often called filter, as demonstrated in figure 1.4.

Figure 1.4: Input: [x1, x2, ..., xn], Output: [x′1, x′2, ..., x′m], where p(x′i) holds for every x′i.

We can define it in ZF expression:

f ilter(p, X) = [xi |xi ∈ X, p(xi )] (1.76)

Different from find, when no element satisfies the predicate, filter returns the empty list. It scans and examines every element one by one:

filter(p, ∅) = ∅
filter(p, x : xs) = { p(x) : x : filter(p, xs)
                      otherwise : filter(p, xs)   (1.77)

This definition builds the result from right to left. For an iterative implementation, if we build the result with append, it will degrade to O(n²).
1: function Filter(p, L)
2: L′ ← NIL
3: while L ≠ NIL do
4: if p(First(L)) then
5: L′ ← Append(L′ , First(L)) ▷ Linear time

6: L ← Rest(L)
7: return L′
The right way is to use cons instead; however, it builds the result in reversed order. We can then reverse it back in linear time (see the exercise). The nature of building the result from the right indicates that we can define filter with foldr. We need to define a function f that tests an element against the predicate, and prepends it to the result if it passes:
f(x, A) = { p(x) : x : A
            otherwise : A   (1.78)

We also need to pass the predicate p to f. There are actually 3 parameters, as f(p, x, A). Filter is defined with foldr and a Curried form of f:

filter(p) = foldr((x, A) ↦ f(p, x, A), ∅)   (1.79)

We can further simplify it (called η-conversion[73]) as:

filter(p) = foldr(f(p), ∅)   (1.80)

Filter is also a generic concept, not limited to lists. We can apply a predicate to any traversable structure to extract the result.
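
A Haskell sketch of filter defined with foldr (a primed name to avoid the Prelude clash):

filter' :: (a -> Bool) -> [a] -> [a]
filter' p = foldr f [] where
  f x acc | p x       = x : acc   -- keep the element
          | otherwise = acc       -- skip it
-- example: filter' even [1, 2, 3, 4] evaluates to [2, 4]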

1.7.4 Match
Matching is to find a pattern in some structure. Even if we limit ourselves to lists and strings, there are still too many things to cover. We have dedicated chapters about string matching. This section deals with the problem: given a list A, test whether it exists in another list B. There are two special cases: to test if A is a prefix or a suffix of B. The span algorithm in (1.58) actually finds a prefix under a certain condition. We can do a similar thing: compare each element of A and B from the left until we meet a different one, or reach the end of either list. Define A ⊆ B if A is a prefix of B:

∅ ⊆ B = True
(a : as) ⊆ ∅ = False
(a : as) ⊆ (b : bs) = { a ≠ b : False
                        a = b : as ⊆ bs   (1.81)

Prefix testing takes linear time as it scans the lists. However, we cannot do suffix testing in this way, because it is hard to start from the aligned right ends and scan backwards over lists. This is different from arrays. Alternatively, we can reverse both lists in linear time, hence changing the problem to prefix testing:

A ⊇ B = reverse(A) ⊆ reverse(B) (1.82)

With ⊆ defined, we can test whether a list is a sub-list of another one. We call it infix testing. The idea is to scan the target list, and repeatedly apply the prefix testing:

infix?(a : as, ∅) = False
infix?(A, B) = { A ⊆ B : True
                 otherwise : infix?(A, B′)   (1.83)

For the edge case that A is empty, we define the empty list to be an infix of any list. Because ∅ ⊆ B is always true, the definition gives the right result. It also evaluates infix?(∅, ∅) correctly. Below is the corresponding iterative implementation:

1: function Is-Infix(A, B)
2: if A = NIL then
3: return TRUE
4: n ← |A|
5: while B ≠ NIL and n ≤ |B| do
6: if A ⊆ B then
7: return TRUE
8: B ← Rest(B)
9: return FALSE
Because prefix testing runs in linear time and is called in the scan loop, this algorithm is bound to O(nm), where m and n are the lengths of the two lists respectively. It is an interesting problem to improve this ‘position by position’ scan algorithm to linear time, even when we apply it to arrays.
time, even when we apply it to arrays. Chapter 13 introduces some smart methods,
like the Knuth-Morris-Pratt (KMP) algorithm and Boyer-Moore algorithm. Appendix C
introduces another method called suffix-tree.
In a symmetric way, we can enumerate all suffixes of B, and check if A is prefix of any
of them:
inf ix?(A, B) = ∃S ∈ suffixes(B), A ⊆ S (1.84)
This can be implemented with list comprehension as below example Haskell program:
isInfixOf a b = (not ◦ null) [ s | s ← tails(b), a `isPrefixOf` s]

Where function isPrefixOf does the prefix testing, and tails generates all suffixes of a given list. We leave their implementations as an exercise.

Exercise 1.15
1. Implement the linear time existence testing algorithm.
2. Implement the iterative look up algorithm.
3. Implement the linear time filter algorithm through reverse.
4. Implement the iterative prefix testing algorithm.
5. Implement the algorithm to enumerate all suffixes of a list.

1.8 zip and unzip


The assoc list of paired values is often used as a light-weight dictionary for a small set of data. It is easier to build an assoc list than a tree or heap based dictionary, although the look up performance of an assoc list is linear instead of logarithmic. In the ‘n-lights’ puzzle, we build the assoc list as below:
map(i ↦ (i, 0), [1, 2, ..., n])
More often, we need ’zip’ two lists to one. We can define a zip function to do that:
zip(A, ∅) = ∅
zip(∅, B) = ∅ (1.85)
zip(a : as, b : bs) = (a, b) : zip(as, bs)
This algorithm works even if the two lists have different lengths. The result length equals that of the shorter one. We can even use it to zip infinite lists (under lazy evaluation, if both are infinite), for example6:
6 In Haskell: zip (repeat 0) [1..n]

zip([0, 0, ...], [1, 2, ..., n])

For a list of words, we can index them with numbers as:

zip([1, 2, ...], [a, an, another, ...])

zip builds the result from the right. We can also define it with foldr. It is bound to O(m) time, where m is the length of the shorter list. When implementing the iterative zip, the performance drops to quadratic if we use append, unless we keep a reference to the tail position.
1: function Zip(A, B)
2: C ← NIL
3: while A ≠ NIL and B ≠ NIL do
4: C ← Append(C, (First(A), First(B))) ▷ Linear time
5: A ← Rest(A)
6: B ← Rest(B)
7: return C
To avoid append, we can use ‘cons’ then reverse the result. However, this cannot deal with two infinite lists. In imperative settings, we can also re-use A to store the result (treating it as transforming a list of elements into a list of pairs).
We can extend zip to combine multiple lists into one. Some programming libraries provide zip, zip3, zip4, ..., up to zip7. Sometimes, we don't want to build a list of pairs, but to apply a combinator function. For example, given a list of unit prices [1.00, 0.80, 10.05, ...] for fruits: apple, orange, banana, ..., and a customer's list of quantities, like [3, 1, 0, ...] (meaning the customer buys 3 apples, 1 orange, and 0 bananas, ...), the below program generates a payment list:

pays(U, ∅) = ∅
pays(∅, Q) = ∅
pays(u : us, q : qs) = (u · q) : pays(us, qs)

It is the same as the zip function, except that it uses multiplication instead of ‘cons’ to combine elements. We can abstract the combinator as a function f, and pass it to zip to build a generic algorithm:

zipWith(f, A, ∅) = ∅
zipWith(f, ∅, B) = ∅
zipWith(f, a : as, b : bs) = f(a, b) : zipWith(f, as, bs)   (1.86)

Here is an example that defines the inner-product (or dot-product)[98] through zipWith:

A · B = sum(zipWith(·, A, B))   (1.87)
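
For illustration, a Haskell sketch of zipWith and the inner product (the Prelude provides zipWith with the same behavior; the primed name avoids the clash):

zipWith' :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith' _ [] _ = []
zipWith' _ _ [] = []
zipWith' f (a:as) (b:bs) = f a b : zipWith' f as bs

dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith' (*) xs ys)
-- example: dot [1, 2, 3] [4, 5, 6] evaluates to 32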

unzip is the inverse operation of zip. It converts a list of pairs into two separate lists. Below is its definition with foldr, in Curried form:

unzip = foldr(((a, b), (A, B)) ↦ (a : A, b : B), (∅, ∅))   (1.88)

We fold from a pair of empty lists, break a, b out of each pair, and prepend them to the two intermediate lists respectively. We can also use fst and snd explicitly as:

(p, P) ↦ (fst(p) : fst(P), snd(p) : snd(P))

For the fruits example, suppose the unit price is stored in an assoc list U = [(apple, 1.00), (orange, 0.80), ...] for lookup, for example lookup(melon, U). The purchase quantity is also an assoc list: Q = [(apple, 3), (orange, 1), (banana, 0), ...]. How do we calculate the total payment? The straightforward way is to extract the unit price and the quantity lists, then compute their inner-product:

pay = sum(zipWith(·, snd(unzip(U)), snd(unzip(Q))))   (1.89)

As an example, let's see how to use zipWith to define the infinite Fibonacci numbers with lazy evaluation:

F = 0 : 1 : zipWith(+, F, F′)   (1.90)

Where F is the infinite list of Fibonacci numbers, starting from 0 and 1, and F′ is the rest of the Fibonacci numbers without the first one. From the third on, every Fibonacci number is the sum of the numbers from F and F′ at the same position. Below example program lists the first 15 Fibonacci numbers:
fib = 0 : 1 : zipWith (+) fib (tail fib)

take 15 fib
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377]

zip and unzip are generic. We can extend them to zip two trees, where the nodes contain paired elements from both. When traversing a collection of elements, we can also use the generic zip and unzip to track the path; this is a method to mimic the ‘parent’ reference in imperative implementations (last chapter of [10]).

Exercise 1.16
1. Design the iota (I) algorithm for below usages:
• iota(..., n) = [1, 2, 3, ..., n];
• iota(m, n) = [m, m + 1, m + 2, ..., n], where m ≤ n;
• iota(m, m + a, ..., n) = [m, m + a, m + 2a, ..., n];
• iota(m, m, ...) = repeat(m) = [m, m, m, ...];
• iota(m, ...) = [m, m + 1, m + 2, ...].
The last two cases are about infinite list. One possible implementation is through
streaming and lazy evaluation ([63] and [10]).
2. Implement the linear time imperative zip algorithm
3. Define zip with f oldr.
4. For the fruits example, suppose the quantity assoc list only contains the items with
none-zero quantity. i.e. instead of

Q = [(apple, 3), (banana, 0), (orange, 1), ...]

but
Q = [(apple, 3), (orange, 1), ...]

because customer does not buy banana. Design a program to calculate the total
payment.
5. Implement lastAt with zip.

1.9 Further reading


Lists are fundamental for building more complex data structures and algorithms, particularly in functional settings. We introduced elementary algorithms to construct, access, update, and transform lists; and how to search, filter, and compute on top of lists. Although most programming environments provide pre-defined tools and libraries to support lists, we should not simply treat them as black boxes. Rabhi and Lapalme introduce many functional list algorithms in [72]. The Haskell library provides detailed documentation about basic list algorithms. There are materials providing good examples of folding, especially [1], which also introduces the fold fusion law.

Exercise 1.17
1. Design an algorithm to remove the duplicated elements in a list. For the imperative implementation, the elements should be removed in-place, and the original element order should be maintained. What is the complexity of this algorithm? How can it be simplified with an additional data structure?

2. A list can represent a non-negative decimal integer. For example, 1024 as a list is 4 → 2 → 0 → 1. Generally, n = dm...d2d1 can be represented as d1 → d2 → ... → dm. Given two numbers a, b in list form, realize arithmetic operations such as addition and subtraction.

3. In imperative settings, a circular linked-list is corrupted: some node points back to a previous one, as shown in figure 1.5. When traversing, it falls into an infinite loop. Design an algorithm to detect whether a list is circular. On top of that, improve it to find the node where the loop starts (the node pointed to by two precedents).

Figure 1.5: A circular linked-list


Chapter 2

Binary Search Tree

2.1 Introduction
Array and list are typically considered the basic data structures. However, we'll see in chapter 12 that they are not necessarily easy to implement. In imperative settings, array is the most elementary data structure. It is possible to implement linked-list using arrays (Equation 3.4), while in functional settings, linked-list acts as the building block to create arrays and other data structures.
We start with the binary search tree as the first data structure. Let us see an interesting programming problem given by Bentley in Programming Pearls[2]: counting the number of words in a text. Here is an example solution:
void wordcount(Input in) {
bst<string, int> map;
while string w = read(in) {
map[w] = if map[w] == null then 1 else map[w] + 1
}
for var (w, c) in map {
print(w, ":", c)
}
}

We can run it to count the words in a text file, for example:

$ cat input.txt | wordcount > wc.txt

The map is a binary search tree. Here we use the word as the key, and its occurrence count as the value. This program runs fast, which reflects the power of the binary search tree. Before diving into it, let us first look at the more generic tree, the binary tree. A binary tree can be defined recursively. It is

• either empty;
• or contains 3 parts: the element, and two sub-trees called left and right children.

Figure 2.1 shows an example of binary tree.


A binary search tree is a special binary tree whose elements are comparable¹, and which satisfies the following constraints:

• For any node, all the keys in its left sub-tree are less than the key in this node;

1 The ordering is abstract, not limited to magnitude; it can be precedence, subset relation, etc. The 'less than' (<) is abstract throughout this chapter.

Figure 2.1: Binary tree concept and an example. (a) Binary tree structure; (b) a binary tree.

• the key in this node is less than any key in its right sub-tree.

Figure 2.2 shows an example of a binary search tree. Comparing it with figure 2.1, we can see the difference in key ordering. To highlight that the elements in a binary search tree are comparable, we call them keys, and name the augmented satellite data values.

2.2 Data Layout


Based on the recursive definition of the binary search tree, we can design the data layout as shown in figure 2.3. A node stores the key as a field; it can also store augmented data (known as satellite data). The next two fields are pointers to the left and right sub-trees. To make backtracking easy, it can also store a parent field pointing to its ancestor node.
For illustration purposes, we skip the augmented data. The appendix of this chapter includes an example definition. In functional settings, pointers are seldom used for backtracking. Typically, there is no such need, because the algorithm is usually top-down recursive. Below is an example functional definition:
data Tree a = Empty
| Node (Tree a) a (Tree a)

2.3 Insertion
When inserting a key k (or a key along with a value) into a binary search tree T, we need to ensure the key ordering property always holds:

• If the tree is empty, construct a leaf node with key = k;


Figure 2.2: A Binary Search Tree

Figure 2.3: Node layout with parent field.


• If k is less than the key of the root, insert it into the left sub-tree;

• Otherwise, insert it into the right sub-tree.

There is an exceptional case: k equals the key of the root, meaning k already exists in the tree. We can overwrite it, append the data, or do nothing. We'll skip such case handling. This algorithm is simple and straightforward. We can define it as a recursive function:

insert(∅, k) = Node(∅, k, ∅)
insert(Node(Tl, k′, Tr), k) = { k < k′ : Node(insert(Tl, k), k′, Tr)
                                otherwise : Node(Tl, k′, insert(Tr, k))    (2.1)

For the none empty node, Tl denotes the left sub-tree, Tr denotes the right sub-tree,
and k ′ is the key. The function N ode(l, k, r) creates a node from two sub-trees and a key.
∅ means empty (also known as NIL. This symbol was invented by mathematician André
Weil for null set. It came from the Norwegian alphabet). Below is the corresponding
example program in Haskell for insertion.
insert Empty k = Node Empty k Empty
insert (Node l x r) k | k < x = Node (insert l k) x r
| otherwise = Node l x (insert r k)

This example program utilizes pattern matching. The appendix of this chapter provides another example that does not use this feature. Insertion can also be implemented without recursion. Here is a purely iterative algorithm:
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: parent ← NIL
5: while T ≠ NIL do
6: parent ← T
7: if k < Key(T ) then
8: T ← Left(T )
9: else
10: T ← Right(T )
11: Parent(x) ← parent
12: if parent = NIL then ▷ tree T is empty
13: return x
14: else if k < Key(parent) then
15: Left(parent) ← x
16: else
17: Right(parent) ← x
18: return root

19: function Create-Leaf(k)


20: x ← Empty-Node
21: Key(x) ← k
22: Left(x) ← NIL
23: Right(x) ← NIL
24: Parent(x) ← NIL
25: return x
While a bit more complex than the functional one, the iterative implementation runs faster, and it is capable of processing very deep trees.

2.4 Traverse
Traverse is to visit every element one by one. There are 3 different ways to walk through a binary tree: (1) pre-order tree walk, (2) in-order tree walk, and (3) post-order tree walk. They are named after the order of visiting the key before/after its sub-trees.

• pre-order: key - left - right;


• in-order: left - key - right;
• post-order: left - right - key.

Each 'visit' operation is recursive; for example, in pre-order traverse, when visiting the left sub-tree, we recursively traverse it if it is not empty. For the tree shown in figure 2.2, the corresponding visiting orders are as below:

• pre-order: 4, 3, 1, 2, 8, 7, 16, 10, 9, 14


• in-order: 1, 2, 3, 4, 7, 8, 9, 10, 14, 16
• post-order: 2, 1, 3, 7, 9, 14, 10, 16, 8, 4

It is not by accident that the in-order traverse lists the elements in increasing order. The definition of the binary search tree ensures this is always true. We leave the proof as an exercise. Specifically, the in-order traverse algorithm is defined as:

• If the tree is empty, stop and return;


• Otherwise, in-order traverse the left sub-tree; then visit the key; finally in-order
traverse the right sub-tree.

We can further define a generic map that applies a given function f to every element in the tree along the in-order traverse. The result is a new tree mapped by f.

map(f, ∅) = ∅
map(f, Node(Tl, k, Tr)) = Node(map(f, Tl), f(k), map(f, Tr))    (2.2)
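A minimal Haskell sketch of (2.2) over the Tree type defined earlier in this chapter (the name mapt is ours, to avoid clashing with the list map):

-- a sketch of the generic map in (2.2)
mapt :: (a -> b) -> Tree a -> Tree b
mapt _ Empty = Empty
mapt f (Node l k r) = Node (mapt f l) (f k) (mapt f r)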

If we only need to manipulate the keys and not to transform the tree, we can implement this algorithm imperatively.
1: function In-Order-Traverse(T, f)
2: if T ≠ NIL then
3: In-Order-Traverse(Left(T), f)
4: f(Key(T))
5: In-Order-Traverse(Right(T), f)
Leveraging the in-order traverse, we can change the map function to convert a binary search tree to a sorted list. Instead of building a tree in the recursive case, we concatenate the results into a list:

toList(∅) = [ ]
toList(Node(Tl, k, Tr)) = toList(Tl) ++ [k] ++ toList(Tr)    (2.3)

We can develop a method to sort a list of elements: first build a binary search tree from the list, then turn the tree back into a list through in-order traversing. This method is called 'tree sort'. For a given list X = [x1, x2, x3, ..., xn]:

sort(X) = toList(fromList(X))    (2.4)


And we can write it in point-free form[8]:

sort = toList ◦ fromList

Where the function fromList repeatedly inserts the elements of a list into a tree. It can be defined recursively over the list:

fromList([ ]) = ∅
fromList(X) = insert(fromList(X′), x1)

When the list is empty, the result is an empty tree; otherwise, it inserts the first element x1 into the tree, then recursively inserts the rest X′ = [x2, x3, ..., xn]. By using list folding[7] (see appendix A.6), we can also define fromList as the following:

fromList(X) = foldl(insert, ∅, X)    (2.5)

We can also rewrite it in Curried form[9] (also known as partial application) to omit the parameter X:

fromList = foldl insert ∅
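Putting the pieces together, a compact Haskell sketch of tree sort (the names toList and treeSort are ours; insert is the function defined earlier in this chapter):

-- a sketch of equations (2.3), (2.4) and (2.5)
toList :: Tree a -> [a]
toList Empty = []
toList (Node l k r) = toList l ++ [k] ++ toList r

treeSort :: (Ord a) => [a] -> [a]
treeSort = toList . foldl insert Empty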

Exercise 2.1
1. Given the in-order and pre-order traverse results, re-construct the tree, and output
the post-order traverse result. For example:
• Pre-order: 1, 2, 4, 3, 5, 6;
• In-order: 4, 2, 1, 5, 3, 6;
• Post-order: ?
2. Write a program to re-construct the binary tree from the pre-order and in-order
traverse lists.
3. For a binary search tree, prove that the in-order traverse always visits the elements in increasing order.
4. Consider the performance of the tree sort algorithm. What is its complexity for n elements?

2.5 Query
Because the elements stored in a binary search tree are well ordered and organized recursively, it supports various kinds of search efficiently. This is one of the reasons people call it a binary search tree. There are mainly three types of queries: (1) look up a key; (2) find the minimum or maximum element; (3) given any node, find its predecessor or successor.

2.5.1 Look up
Because the binary search tree is recursive and all elements satisfy the ordering property, we can look up a key k top-down from the root as follows:

• If the tree is empty, terminate. The key does not exist;

• Compare k with the key of the root; if they are equal, we are done. The key is stored in the root;
• If k is less than the key of the root, recursively look up the left sub-tree;
• Otherwise, look up the right sub-tree.

We can define the recursive lookup function for this algorithm as below.

lookup(∅, x) = ∅
lookup(Node(Tl, k, Tr), x) = { k = x : Node(Tl, k, Tr)
                               x < k : lookup(Tl, x)
                               otherwise : lookup(Tr, x)    (2.6)

This function returns the tree node located, or empty if not found. One may instead return the value bound to the key. However, we then need to consider using the Maybe type (also known as Optional<T>) to handle the not-found case, for example:
lookup Empty _ = Nothing
lookup t@(Node l k r) x | k == x = Just k
| x < k = lookup l x
| otherwise = lookup r x

A binary search tree is well balanced if, roughly speaking, almost all branch nodes have two non-empty sub-trees (this is not the formal definition of balance; we'll define it in chapter 4). For a balanced tree of n elements, the algorithm takes O(lg n) time to look up a key. If the tree is poorly balanced, the worst case is bound to O(n) time. If we denote the height of the tree as h, we can express the performance of look up as O(h).
We can also implement look up purely iteratively, without recursion:
1: function Search(T, x)
2: while T ≠ NIL and Key(T) ≠ x do
3: if x < Key(T ) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: return T

2.5.2 Minimum and maximum


From the definition, we know that smaller keys are always on the left. To locate the minimum element, we keep traversing along the left sub-trees until we reach a node whose left sub-tree is empty. Symmetrically, keeping traversing along the right sub-trees gives the maximum.

min(Node(∅, k, Tr)) = k
min(Node(Tl, k, Tr)) = min(Tl)    (2.7)

max(Node(Tl, k, ∅)) = k
max(Node(Tl, k, Tr)) = max(Tr)    (2.8)
Both functions are bound to O(h) time, where h is the height of the tree.
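A Haskell sketch of (2.7) and (2.8) (the names minT and maxT are ours; both are partial functions, undefined on the empty tree):

-- sketches of equations (2.7) and (2.8)
minT, maxT :: Tree a -> a
minT (Node Empty k _) = k
minT (Node l     _ _) = minT l
maxT (Node _ k Empty) = k
maxT (Node _ _ r    ) = maxT r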

2.5.3 Successor and predecessor


When treating a binary search tree as a generic container (a collection of elements), it is common to traverse it with a bi-directional iterator. Starting from the minimum element, one can keep moving forward with the iterator towards the maximum, or go back and forth. The example program below prints the elements in sorted order.
void printTree (Node<T> t) {
    for (var it = Iterator(t); it.hasNext(); it = it.next()) {
        print(it.get(), ", ")
    }
}

Such use cases demand that we design algorithms to find the successor or predecessor of any node. The successor of an element x is defined as the smallest element y that satisfies x < y. If the node of x has a non-empty right sub-tree, then the minimum element of the right sub-tree is the successor. As shown in figure 2.4, to find the successor of 8, we search for the minimum element in its right sub-tree, which is 9. If the right sub-tree of node x is empty, we need to back-track along the parent field until the closest ancestor whose left sub-tree also contains x. In figure 2.4, since node 2 does not have a right sub-tree, we go up to its parent, node 1. Node 2 is the right child of node 1, so we go up again and reach node 3. As the left sub-tree of node 3 contains node 2, node 3 is the successor of node 2.

Figure 2.4: The successor of 8 is the minimum element of its right sub-tree, 9. To find the successor of 2, we go up to its parent 1, then 3.

If we eventually reach the root while back-tracking along the parent but still cannot find an ancestor on the right, then the node does not have a successor. The algorithm below finds the successor of a given node x:
1: function Succ(x)
2: if Right(x) ≠ NIL then
3: return Min(Right(x))
4: else
5: p ← Parent(x)
6: while p ≠ NIL and x = Right(p) do
7: x←p
8: p ← Parent(p)
9: return p
This algorithm returns NIL when x does not have a successor. The predecessor finding algorithm is symmetric:
1: function Pred(x)
2: if Left(x) ≠ NIL then

3: return Max(Left(x))
4: else
5: p ← Parent(x)
6: while p ≠ NIL and x = Left(p) do
7: x←p
8: p ← Parent(p)
9: return p
It seems hard to find a purely functional solution, because there is no pointer-like field linking to the parent node². One solution is to leave 'breadcrumbs' when we visit the tree, and use this information to back-track or even re-construct the whole tree. Such a data structure, which contains both the tree and the 'breadcrumbs', is called a zipper[?].
Our original purpose in developing the succ and pred functions was 'to traverse all the elements' as in a generic container. However, in functional settings, we typically traverse the tree in-order through map. We'll meet similar situations in the rest of this book: a problem valid in imperative settings is not necessarily meaningful in functional settings. An example is deleting an element from a red-black tree[5].
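For instance, instead of iterating with succ, a functional program usually folds the tree in-order directly. Below is a small sketch (the name foldInOrder is ours), equivalent to folding over toList:

-- in-order right fold, equivalent to: \f z -> foldr f z . toList
foldInOrder :: (a -> b -> b) -> b -> Tree a -> b
foldInOrder _ z Empty = z
foldInOrder f z (Node l k r) = foldInOrder f (f k (foldInOrder f z r)) l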

Exercise 2.2
1. Use Pred and Succ to write an iterator to traverse the binary search tree as a
generic container. What’s the time complexity to traverse a tree of n elements?
2. One can traverse elements inside a range [a, b] for example:
for_each (m.lower_bound(12), m.upper_bound(26), f);
Write an equivalent functional program for binary search tree.

2.6 Deletion
We need special consideration when deleting an element from a binary search tree, because we must keep the ordering property: for any node, all keys in the left sub-tree are less than the key of this node, which is in turn less than all keys in the right sub-tree. Blindly deleting a node may break this constraint.
To delete a node x from a binary search tree[6]:

• If x has no sub-trees (a leaf) or only one sub-tree, splice x out;

• Otherwise (x has two sub-trees), use the minimum element y of its right sub-tree to replace x, and splice the original y out.

The simplicity comes from the fact that, for the node to be deleted, if its right sub-tree is not empty, then the minimum element of that sub-tree cannot have two non-empty children; its removal therefore falls into the trivial case, and it can be directly spliced out of the tree.
Figures 2.5, 2.6, and 2.7 illustrate the different cases of deletion.
Based on this idea, we can define the delete algorithm as below:

delete(∅, x) = ∅
delete(Node(Tl, k, Tr), x) = { x < k : Node(delete(Tl, x), k, Tr)
                               x > k : Node(Tl, k, delete(Tr, x))
                               x = k : del(Tl, Tr)    (2.9)
2 There is ref in ML and OCaml; we limit ourselves to the purely functional settings.
Figure 2.5: x can be spliced out.

Figure 2.6: Delete a node with only one non-empty sub-tree. (a) Before deleting x; (b) after deleting x, x is spliced out and replaced by its left child. (c) Before deleting x; (d) after deleting x, x is spliced out and replaced by its right sub-tree.

Figure 2.7: Delete a node with two non-empty sub-trees. (a) Before deleting x; (b) after deleting x, x is replaced by splicing the minimum element out of its right sub-tree.

Function del performs the splicing, and mutually calls delete recursively to cut the minimum off the right sub-tree.

del(∅, Tr ) = Tr
del(Tl , ∅) = Tl (2.10)
del(Tl , Tr ) = N ode(Tl , y, delete(Tr , y))

Where y = min(Tr ) is the minimum element in the right sub-tree. Here is the corre-
sponding example program:
delete Empty _ = Empty
delete (Node l k r) x | x < k = Node (delete l x) k r
| x > k = Node l k (delete r x)
| otherwise = del l r
where
del Empty r = r
del l Empty = l
del l r = let k' = min r in Node l k' (delete r k')

This algorithm first looks up the node to be deleted, then executes the deletion. It takes O(h) time, where h is the height of the tree.
The imperative deletion algorithm additionally needs to set the parent properly. The following one returns the root of the result tree.
1: function Delete(T, x)
2: r←T
3: x′ ← x ▷ save x
4: p ← Parent(x)
5: if Left(x) = NIL then
6: x ← Right(x)
7: else if Right(x) = NIL then
8: x ← Left(x)
9: else ▷ neither children is empty
10: y ← Min(Right(x))

11: Key(x) ← Key(y)


12: Copy other satellite data from y to x
13: if Parent(y) ≠ x then ▷ y does not have left sub-tree
14: Left(Parent(y)) ← Right(y)
15: else ▷ y is the root of the right sub-tree
16: Right(x) ← Right(y)
17: if Right(y) ≠ NIL then
18: Parent(Right(y)) ← Parent(y)
19: Remove y
20: return r
21: if x ≠ NIL then
22: Parent(x) ← p
23: if p = NIL then ▷ remove the root
24: r←x
25: else
26: if Left(p) = x′ then
27: Left(p) ← x
28: else
29: Right(p) ← x
30: Remove x′
31: return r
We assume the node to be deleted is not empty. This algorithm first records the root, creates a copy reference to x, and saves its parent. If either sub-tree is empty, we splice x out. Otherwise, the node has two non-empty sub-trees. We first locate the minimum node y in its right sub-tree, then replace the key of x with the one in y, copy the satellite data, and finally splice y out. We also need to handle the special case where y is the root of the right sub-tree.
At last, we need to reset the stored parent if x has only one non-empty sub-tree. If the parent pointer we copied is empty, it means we are deleting the root; in this case, we return the new root. After the parent is set properly, we can safely remove x.
The deletion algorithm is bound to O(h) time, where h is the height of the tree.

Exercise 2.3
1. There is a symmetric deletion algorithm. When neither sub-tree is empty, we
can replace the key by splicing the maximum node off the left sub-tree. Write a
program to implement this solution.

2.7 Random build


All binary search tree algorithms given so far are bound to O(h) time. The height h of the tree impacts the performance. For a poorly balanced tree, O(h) tends towards O(n), the worst case; for a well balanced tree, O(h) is close to O(lg n), and we gain good performance.
We'll see two well designed solutions to keep the tree balanced in chapters 4 and 5. There also exists a simple method: build the binary search tree randomly[4]. It decreases the possibility of producing an unbalanced binary tree. The idea is to randomly shuffle the elements before building the tree.

Exercise 2.4
1. Write a random build algorithm for the binary search tree.

2.8 Map
We can use a binary search tree to realize the map data structure (also known as an associative data structure or dictionary). A finite map is a collection of key-value pairs. The keys are unique; every key is mapped to a value. For keys of type K and values of type V, the map is Map K V or Map<K, V>. A non-empty map contains n mappings k1 7→ v1, k2 7→ v2, ..., kn 7→ vn. When using the binary search tree to implement a map, we constrain K to be an ordered set. Every node stores both a key and a value. We use the tree insert/update operation to bind a key to a value. Given a key k, we use the tree lookup to find the mapped value, or return nothing when k does not exist. The red-black tree and AVL tree introduced in later chapters can also be used to implement the map.
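A small illustrative sketch of such a tree-backed map (all names here are ours; nodes carry a key and a value, and only keys are compared):

data MapTree k v = MapEmpty | MapNode (MapTree k v) k v (MapTree k v)

bind :: Ord k => MapTree k v -> k -> v -> MapTree k v
bind MapEmpty k v = MapNode MapEmpty k v MapEmpty
bind (MapNode l k' v' r) k v
    | k < k'    = MapNode (bind l k v) k' v' r
    | k > k'    = MapNode l k' v' (bind r k v)
    | otherwise = MapNode l k v r        -- overwrite the existing binding

find :: Ord k => MapTree k v -> k -> Maybe v
find MapEmpty _ = Nothing
find (MapNode l k' v r) k
    | k == k'   = Just v
    | k < k'    = find l k
    | otherwise = find r k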

2.9 Appendix: Example programs


Definition of binary search tree node with parent field.
data Node<T> {
    T key
    Node<T> left
    Node<T> right
    Node<T> parent

    Node(T k) = Node(null, k, null)

    Node(Node<T> l, T k, Node<T> r) {
        left = l, key = k, right = r
        if (left ≠ null) then left.parent = this
        if (right ≠ null) then right.parent = this
    }
}

Example program of recursive insertion. It does not use pattern matching.


Node<T> insert (Node<T> t, T x) {
    if (t == null) {
        return Node(null, x, null)
    } else if (x < t.key) {
        return Node(insert(t.left, x), t.key, t.right)
    } else {
        return Node(t.left, t.key, insert(t.right, x))
    }
}

Example program to look up a key. Purely iterative without recursion.


Optional<Node<T>> lookup (Node<T> t, T x) {
    while (t ≠ null and t.key ≠ x) {
        if (x < t.key) {
            t = t.left
        } else {
            t = t.right
        }
    }
    return Optional(t);
}

Example iterative program to find the minimum of a tree.


Optional<Node<T>> min (Node<T> t) {
    while (t ≠ null and t.left ≠ null) {
        t = t.left
    }
    return Optional(t);
}

Example program to find the successor of a node.


Optional<Node<T>> succ (Node<T> x) {
    if (x == null) {
        return Optional(null)
    } else if (x.right ≠ null) {
        return min(x.right)
    } else {
        p = x.parent
        while (p ≠ null and x == p.right) {
            x = p
            p = p.parent
        }
        return Optional(p);
    }
}
Chapter 3

Insertion sort

3.1 Introduction
Insertion sort is a straightforward sorting algorithm¹. We gave its preliminary definition for lists in chapter 1. For a collection of comparable elements, we repeatedly pick one and insert it into a list, maintaining the ordering. As every insertion takes linear time, the performance is bound to O(n²), where n is the number of elements. This is not as good as the divide and conquer sort algorithms, like quick sort and merge sort. However, it still finds application today; for example, a well tuned quick sort implementation falls back to insertion sort for small data sets. The idea of insertion sort is similar to sorting a deck of poker cards ([4] pp.15). The cards are shuffled, and the player takes them one by one. At any time, all cards on hand are sorted. When drawing a new card, the player inserts it in the proper position according to the order of points, as shown in figure 3.1.

Figure 3.1: Insert card 8 to a deck.

Based on this idea, we can implement insertion sort as below:


1: function Sort(A)
2: S ← NIL
3: for each a ∈ A do
4: Insert(a, S)
5: return S
We store the sorted result in a new array; alternatively, we can change it to sort in-place:
1: function Sort(A)
1 We skip the ‘Bubble sort’ method


2: for i ← 2 to |A| do
3: ordered insert A[i] to A[1...(i − 1)]
Where the index i ranges from 1 to n = |A|. We start from 2, because the singleton sub-array [A[1]] is already ordered. When processing the i-th element, all elements before i are sorted. We continuously insert elements until all the unsorted ones are consumed, as shown in figure 3.2.

Figure 3.2: Continuously insert elements to the sorted part.

3.2 Insertion
In chapter 1, we gave the ordered insertion algorithm for lists. For arrays, we also scan to locate the insert position, either from the left or from the right. The algorithm below scans from the right:
1: function Sort(A)
2: for i ← 2 to |A| do ▷ Insert A[i] to A[1...(i − 1)]
3: x ← A[i] ▷ Save A[i] to x
4: j ←i−1
5: while j > 0 and x < A[j] do
6: A[j + 1] ← A[j]
7: j ←j−1
8: A[j + 1] ← x
It is expensive to insert at an arbitrary position, as an array stores its elements continuously. When inserting x at position i, we need to shift all elements after i (i.e. A[i + 1], A[i + 2], ...) one cell to the right. After freeing up the cell at i, we put x in, as shown in figure 3.3.

Figure 3.3: Insert x to A at position i.

For an array of length n, suppose that after comparing x with the first i elements, we locate the position to insert. We then shift the remaining n − i + 1 elements, and put x in the i-th cell. Overall, we traverse the whole array if we scan from the left. On the other hand, if we scan from right to left, we examine n − i + 1 elements and perform the same amount of shifts. We can also define a separate Insert() function and call it inside the loop. The insertion takes linear time no matter whether we scan from the left or the right, hence the sort algorithm is bound to O(n²), where n is the number of elements.

Exercise 3.1
1. Implement the insert to scan from left to right.
2. Define the insert function, and call it from the sort algorithm.

3.3 Binary search


When inserting a poker card, a human player does not scan, but takes a quick glance at the deck to locate the position. We can do this because the deck is sorted. Binary search is such a method; it applies to any ordered sequence.
1: function Sort(A)
2: for i ← 2 to |A| do
3: x ← A[i]
4: p ← Binary-Search(x, A[1...(i − 1)])
5: for j ← i down to p do
6: A[j] ← A[j − 1]
7: A[p] ← x
Binary search utilizes the fact that the slice A[1...(i − 1)] is ordered. Suppose it is ascending without loss of generality (as we can define ≤ abstractly). To find the position j that satisfies A[j − 1] ≤ x ≤ A[j], we compare x with the middle element A[m], where m = ⌊i/2⌋. If x < A[m], we recursively apply binary search to the first half; otherwise, we search the second half. As we halve the elements every time, binary search takes O(lg i) time to locate the insert position.
1: function Binary-Search(x, A)
2: l ← 1, u ← 1 + |A|
3: while l < u do
4: m ← ⌊(l + u)/2⌋
5: if A[m] = x then
6: return m ▷ Duplicated element
7: else if A[m] < x then
8: l ←m+1
9: else
10: u←m
11: return l
The improved sort algorithm is still bound to O(n²). The version that scans takes O(n²) comparisons and O(n²) shifts; with binary search, it takes O(n lg n) comparisons overall, but still O(n²) shifts.

Exercise 3.2
1. Implement the recursive binary search.

3.4 List
With binary search, the search time improves to O(n lg n). However, as we need to shift array cells when inserting, the overall time is still bound to O(n²). On the other hand, when using a list, the insert operation is constant time at a given node reference. In chapter 1, we defined the insertion sort algorithm for lists as below:


sort(∅) = ∅
sort(x : xs) = insert(x, sort(xs))    (3.1)

Or, with foldr in Curried form:

sort = foldr insert ∅    (3.2)

However, the list insert algorithm still takes linear time, because we need to scan to locate the insert position:

insert(x, ∅) = [x]
insert(x, y : ys) = { x ≤ y : x : y : ys
                      otherwise : y : insert(x, ys)    (3.3)
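A direct Haskell transcription of (3.1) and (3.3), as a minimal sketch (the names insert and isort are local to this example):

insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) | x <= y    = x : y : ys
                | otherwise = y : insert x ys

isort :: Ord a => [a] -> [a]
isort = foldr insert []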

Instead of using node references, we can also realize a list through an additional index array. For every element A[i], Next[i] stores the index of the element that follows A[i], i.e. A[Next[i]] is the next element after A[i]. There are two special indexes: for the tail node A[m], we define Next[m] = −1, indicating it points to NIL; we also define Next[0] to index the head element. With the index array, we can implement the insertion algorithm as below:
1: function Insert(A, Next, i)
2: j ← 0 ▷ Next[0] for head
3: while Next[j] ≠ −1 and A[Next[j]] < A[i] do
4: j ← Next[j]
5: Next[i] ← Next[j]
6: Next[j] ← i

7: function Sort(A)
8: n ← |A|
9: Next = [1, 2, ..., n, −1] ▷ n + 1 indexes
10: for i ← 1 to n do
11: Insert(A, Next, i)
12: return Next
With a list, although the insert operation becomes constant time, we still need to traverse the list to locate the position, so the sort is still bound to O(n²) comparisons. Unlike an array, a list does not support random access, hence we cannot use binary search to speed it up.

Exercise 3.3
1. For the index-array based list, we return the re-arranged index array as the result. Design an algorithm to re-order the original array A from the index array Next.

3.5 Binary search tree


We have driven ourselves into a corner: we must improve both the comparison and the insertion at the same time, or we will end up with O(n²) performance. For the comparison, we need binary search to achieve O(lg n) time; on the other hand, we need to change the data structure, because an array cannot support constant time insertion at a given position. We introduced a powerful data structure in chapter 2, the binary search tree. It supports binary search by its very definition, and at the same time, we can quickly insert a new node at the given location.

1: function Sort(A)
2: T ←∅
3: for each x ∈ A do
4: T ← Insert-Tree(T, x)
5: return To-List(T )
Where Insert-Tree() and To-List() are defined in chapter 2. In the average case, the performance of tree sort is bound to O(n lg n), where n is the number of elements. This is the lower limit of comparison based sorting ([?] pp.180-193). However, in the worst case, if the tree is poorly balanced, the performance drops to O(n²).

3.6 Summary
Insertion sort is often used as the first example of sorting. It is straightforward and easy to implement. However, its performance is quadratic. Insertion sort does not only appear in textbooks; it has a practical use case in quick sort implementations, where it is an engineering practice to fall back to insertion sort when the number of elements is small.
Chapter 4

Red-black tree

4.1 Introduction
As in the example of chapter 2, we can use the binary search tree as a dictionary to count word occurrences in a text. One may also want to feed an address book to a binary search tree, and use it to look up contacts, as in the example program below:

void addrBook(Input in) {


bst<string, string> dict
while (string name, string addr) = read(in) {
dict[name] = addr
}
loop {
string name = read(console)
var addr = dict[name]
if (addr == null) {
print("not found")
} else {
print("address: ", addr)
}
}
}

Unlike the word counter program, this one performs poorly, especially when searching for names like Zara, Zed, Zulu, etc. This is because the address entries are typically listed in lexicographic order, i.e. the names are input in ascending order. If we insert the numbers 1, 2, 3, ..., n into a binary search tree, it ends up as in figure 4.1: an extremely unbalanced binary search tree. lookup() is bound to O(h) time for a tree of height h. When the tree is well balanced, the performance is O(lg n), where n is the number of elements in the tree. But in this extreme case, the performance degrades to O(n), equivalent to a list scan.

Exercise 4.1

1. For a big address entry list in lexicographic order, one may want to speed up
building the address book with two concurrent tasks: one reads from the head;
while the other reads from the tail, till they meet at some middle point. What
does the binary search tree look like? What if we split the list into multiple sections to scale the concurrency?
2. Find more cases to exploit a binary search tree, for example in figure 4.2.

Figure 4.1: Unbalanced tree

4.1.1 Balance
To avoid the extremely unbalanced case, we can shuffle the input (see 12.4 in [4]); however, when the input is entered interactively by the user, we cannot randomize the sequence. People have developed solutions to keep the tree balanced, mostly relying on the rotation operation. Rotation changes the tree structure while maintaining the element ordering. This chapter introduces the red-black tree, a widely used self-adjusting balanced binary search tree. The next chapter is about the AVL tree, another self-balancing tree. Chapter 8 introduces the splay tree, which adjusts the tree in steps to keep it balanced.

4.1.2 Tree rotation


Tree rotation transforms the tree structure while keeping the in-order traverse result unchanged; multiple binary search trees can generate the same ordered element sequence. Figure 4.3 shows the two rotations.
Tree rotation can be defined with pattern matching:

rotatel ((a, x, b), y, c) = (a, x, (b, y, c))
rotatel T = T    (4.1)

and

rotater (a, x, (b, y, c)) = ((a, x, b), y, c)
rotater T = T    (4.2)
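A quick Haskell sketch of (4.1) and (4.2), reusing the Tree constructor from chapter 2 and ignoring colors (the names rotateL and rotateR are ours):

-- sketches of (4.1) and (4.2); unchanged when the pattern does not match
rotateL, rotateR :: Tree a -> Tree a
rotateL (Node (Node a x b) y c) = Node a x (Node b y c)
rotateL t = t
rotateR (Node a x (Node b y c)) = Node (Node a x b) y c
rotateR t = t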

The second row of each equation keeps the tree unchanged if the pattern does not match (for example, when both sub-trees are empty). We can also implement tree rotation imperatively; we need to re-assign the sub-trees and the parent reference. When rotating, we pass both the root T and the node x as parameters:
1: function Left-Rotate(T, x)
2: p ← Parent(x)
3: y ← Right(x) ▷ assume y ≠ NIL
4: a ← Left(x)
5: b ← Left(y)
6: c ← Right(y)
7: Replace(x, y) ▷ replace node x with y
Figure 4.2: Unbalanced trees

Figure 4.3: ‘left rotate’ and ‘right rotate’.



8: Set-Subtrees(x, a, b) ▷ Set a, b as the sub-trees of x


9: Set-Subtrees(y, x, c) ▷ Set x, c as the sub-trees of y
10: if p = NIL then ▷ x was the root
11: T ←y
12: return T
The Right-Rotate is symmetric; we leave it as an exercise. Replace(x, y) uses node y to replace x:
1: function Replace(x, y)
2: p ← Parent(x)
3: if p = NIL then ▷ x is the root
4: if y ≠ NIL then Parent(y) ← NIL
5: else if Left(p) = x then
6: Set-Left(p, y)
7: else
8: Set-Right(p, y)
9: Parent(x) ← NIL
Procedure Set-Subtrees(x, L, R) assigns L as the left, and R as the right sub-trees
of x:
1: function Set-Subtrees(x, L, R)
2: Set-Left(x, L)
3: Set-Right(x, R)
It further calls Set-Left and Set-Right to set the two sub-trees:
1: function Set-Left(x, y)
2: Left(x) ← y
3: if y ≠ NIL then Parent(y) ← x

4: function Set-Right(x, y)
5: Right(x) ← y
6: if y ≠ NIL then Parent(y) ← x
We can see how pattern matching simplifies tree rotation. Based on this idea, Okasaki developed the purely functional algorithm for the red-black tree in 1995[13].

Exercise 4.2
1. Implement the Right-Rotate.

4.2 Definition
A red-black tree is a self-balancing binary search tree[14]. It is essentially equivalent to a 2-3-4 tree¹. By coloring each node red or black, and performing rotations, the red-black tree provides an efficient way to keep the tree balanced. On top of the binary search tree definition, we label each node with a color. We say it is a red-black tree if the coloring satisfies the following 5 rules ([4] pp273):

1. Every node is either red or black.


2. The root is black.
3. Every leaf (NIL) is black.
1 Chapter 7, B-tree. For any 2-3-4 tree, there is at least one red-black tree with the same ordered data.

4. If a node is red, then both sub-trees are black.


5. For every node, all paths from it to descendant leaves contain the same number of
black nodes.

Why do these rules keep the red-black tree balanced? The key point is that the longest path from the root to a leaf cannot be more than twice as long as the shortest path. Consider rule 4: there cannot be two adjacent red nodes, therefore the shortest path contains only black nodes, and any longer path must contain red nodes in addition. Moreover, rule 5 ensures all paths from the root contain the same number of black nodes. This eventually ensures that no path is more than twice as long as any other[14]. Figure 4.4 gives an example of a red-black tree.

Figure 4.4: A red-black tree

As all NIL nodes are black, we can hide them as shown in figure 4.5. All operations, including lookup and min/max, are the same as in the binary search tree. However, insert and delete are special, as we need to maintain the coloring rules.

Figure 4.5: Hide the NIL nodes

The example program below adds the color field atop the binary search tree definition:
data Color = R | B
data RBTree a = Empty
              | Node Color (RBTree a) a (RBTree a)

Exercise 4.3
1. Prove the height h of a red-black tree of n nodes is at most 2 lg(n + 1)

4.3 Insert
The insert algorithm for the red-black tree has two steps. The first step is the same as for the binary search tree. The tree may become unbalanced after that, so in the second step we fix it to restore the red-black coloring. When inserting a new element, we always make it red. Unless the new node is the root, this breaks no coloring rule except the 4th, because it may introduce two adjacent red nodes. Okasaki found there are 4 cases which violate rule 4. All have two adjacent red nodes, and they share a uniform structure after fixing[13], as shown in figure 4.6.

Figure 4.6: Fix 4 cases to the same structure.

All 4 transformations move the redness one level up. When performing the bottom-up recursive fixing, we may color the root red, while rule 2 requires the root to always be black; we therefore revert the root back to black at the end. With pattern matching, we can define a balance function to fix the tree. Denote the color as C, with values black B and red R. A non-empty node is in the form T = (C, l, k, r), where l, r are the left and right sub-trees and k is the key.

balance B (R, (R, a, x, b), y, c) z d = (R, (B, a, x, b), y, (B, c, z, d))
balance B (R, a, x, (R, b, y, c)) z d = (R, (B, a, x, b), y, (B, c, z, d))
balance B a x (R, b, y, (R, c, z, d)) = (R, (B, a, x, b), y, (B, c, z, d))    (4.3)
balance B a x (R, (R, b, y, c), z, d) = (R, (B, a, x, b), y, (B, c, z, d))
balance T = T

The last row says that if the tree does not match any of the 4 patterns, we leave it unchanged. We define the insert algorithm for the red-black tree as below:

insert T k = makeBlack (ins T k) (4.4)



where

ins ∅ k = (R, ∅, k, ∅)
ins (C, l, k′, r) k = { k < k′ : balance C (ins l k) k′ r
                        k > k′ : balance C l k′ (ins r k)    (4.5)

If the tree is empty, we create a red leaf of k; otherwise, let the sub-trees and the key be l, r, and k′. We compare k with k′, then recursively insert k into the proper sub-tree. After that, we call balance to fix the coloring, and finally force the root to be black.

makeBlack (C, l, k, r) = (B, l, k, r) (4.6)

Below is the corresponding example program:


insert t x = makeBlack $ ins t where
ins Empty = Node R Empty x Empty
ins (Node color l k r)
| x < k = balance color (ins l) k r
| otherwise = balance color l k (ins r)
makeBlack(Node _ l k r) = Node B l k r

balance B (Node R (Node R a x b) y c) z d =


Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d =
Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) =
Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) =
Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r

We skip handling duplicated keys. If the key already exists, we can overwrite it, drop it, or store the values in a list ([4], pp269). Figure 4.7 shows two red-black trees built from the sequences 11, 2, 14, 1, 7, 15, 5, 8, 4 and 1, 2, ..., 8. The second example demonstrates that the tree stays well balanced even for ordered input.

Figure 4.7: Red-black tree examples

The algorithm performs top-down recursive insertion and fixing. It is bound to O(h) time, where h is the height of the tree. As the red-black coloring rules are maintained, h is logarithmic in the number of nodes n, and the overall performance is O(lg n).

Exercise 4.4
1. Implement the insert algorithm without using pattern matching, but test the 4
cases separately.

4.4 Delete
Delete is more complex than insert. We can also use pattern matching and recursion to simplify the delete algorithm for the red-black tree². There are alternatives that mimic deletion: sometimes, we build a read-only tree, then use it for frequent look up[5]; when deleting, we mark the deleted node with a flag, and later rebuild the tree if such nodes exceed 50%. Deletion may also violate the red-black coloring rules, and we use the same idea of applying a fix after the delete. The coloring violation only happens when deleting a black node, as rule 5 is broken: the number of black nodes along the path decreases by one, hence not all paths contain the same number of black nodes any more.
To restore the blackness, we introduce a special 'doubly-black' node ([4], pp290). Such a node is counted as 2 black nodes. When we delete a black node x, we can move the blackness either up to its parent or down to one sub-tree. Let y be the node that accepts the blackness. If y was red, we turn it black; if y was already black, we make it 'doubly-black', denoted as B². The example program below adds the 'doubly-black' support:
data Color = R | B | BB
data RBTree a = Empty | BBEmpty
| Node Color (RBTree a) a (RBTree a)

Because all empty leaves are black, when we push the blackness down to an empty leaf, it becomes the 'doubly-black' empty node (BBEmpty, or bold ∅). The first step is to perform the normal binary search tree delete; then, if the spliced-off node is black, we shift the blackness and fix the tree coloring.

delete = makeBlack ◦ del (4.7)

This definition is in Curried form. When we delete the only element, the tree becomes empty. To cover this case, we modify makeBlack as below:

makeBlack ∅ = ∅
(4.8)
makeBlack (C, l, k, r) = (B, l, k, r)

Where del takes the tree and the key k to be deleted:

del ∅ k = ∅
del (C, l, k′, r) k =
  k < k′ : fixB² (C, (del l k), k′, r)
  k > k′ : fixB² (C, l, k′, (del r k))
  k = k′ :
    l = ∅ : (C = B 7→ shiftB r, r)
    r = ∅ : (C = B 7→ shiftB l, l)
    else : fixB² (C, l, k″, (del r k″)), where k″ = min(r)    (4.9)

When the tree is empty, the result is ∅; otherwise, we compare the key k′ in the tree with k. If k < k′, we recursively delete k from the left sub-tree; if k > k′, we delete from the right. Because the recursive result may contain a doubly-black node, we need to apply fixB² to fix it. When k = k′, we need to splice the node out. If either sub-tree is empty, we replace the node with the other sub-tree, then shift the blackness if the spliced node was black. This is expressed with the McCarthy form (p 7→ a, b), which is equivalent to 'if p then a else b'. If neither sub-tree is empty, we cut the minimum element k″ = min(r) off the right sub-tree, and use k″ to replace k.
2 Actually, the tree is rebuilt in the purely functional setting, although the common part is reused. This feature is called 'persistence'.


To preserve the blackness, shiftB makes a black node doubly-black, and forces it to black in the other cases. Applying it twice flips a doubly-black node back to normal black. Writing ∅² for the doubly-black empty (BBEmpty):

shiftB (B, l, k, r) = (B², l, k, r)
shiftB (C, l, k, r) = (B, l, k, r)
shiftB ∅ = ∅²
shiftB ∅² = ∅    (4.10)

Below is the example program (except the doubly-black fixing part).

delete :: (Ord a) ⇒ RBTree a → a → RBTree a


delete t k = makeBlack $ del t k where
del Empty _ = Empty
del (Node color l k' r) k
| k < k' = fixDB color (del l k) k' r
| k > k' = fixDB color l k' (del r k)
| isEmpty l = if color == B then shiftBlack r else r
| isEmpty r = if color == B then shiftBlack l else l
| otherwise = fixDB color l k'' (del r k'') where k''= min r
makeBlack (Node _ l k r) = Node B l k r
makeBlack _ = Empty

shiftBlack (Node B l k r) = Node BB l k r


shiftBlack (Node _ l k r) = Node B l k r
shiftBlack Empty = BBEmpty
shiftBlack BBEmpty = Empty

The fixB² function eliminates the doubly-black node by rotation and re-coloring. The doubly-black node can be a branch node or the empty node ∅². There are three cases:
Case 1. The sibling of the doubly-black node is black, and it has a red sub-tree. We can fix this case with a rotation. There are 4 sub-cases, all of which can be transformed to a uniform pattern, as shown in figure 4.8.

Figure 4.8: The 4 sub-cases share a uniform fixing pattern


The fixing for these 4 sub-cases can be realized with pattern matching:

fixB² C aB² x (B, (R, b, y, c), z, d) = (C, (B, shiftB(a), x, b), y, (B, c, z, d))
fixB² C aB² x (B, b, y, (R, c, z, d)) = (C, (B, shiftB(a), x, b), y, (B, c, z, d))
fixB² C (B, a, x, (R, b, y, c)) z dB² = (C, (B, a, x, b), y, (B, c, z, shiftB(d)))
fixB² C (B, (R, a, x, b), y, c) z dB² = (C, (B, a, x, b), y, (B, c, z, shiftB(d)))    (4.11)

Where aB² means node a is doubly-black; it can be a branch or ∅².
Case 2. The sibling of the doubly-black node is red. We can rotate the tree to turn it into case 1 or 3, as shown in figure 4.9.

Figure 4.9: The sibling of the doubly-black is red.

We add this fixing as 2 additional rows to equation (4.11):

...
fixB² B aB² x (R, b, y, c) = fixB² B (fixB² R a x b) y c
fixB² B (R, a, x, b) y cB² = fixB² B a x (fixB² R b y c)    (4.12)

Case 3. The sibling of the doubly-black node and its two sub-trees are all black. In this case, we change the sibling to red, flip the doubly-black node to black, and propagate the doubly-blackness one level up to the parent, as shown in figure 4.10.

Figure 4.10: Move the blackness up.

There are two symmetric sub-cases. For the upper case, x was either red or black: x changes to black if it was red, otherwise it changes to doubly-black; the same color change applies to y in the lower case. We add this fixing to equation (4.12):

...
fixB² C aB² x (B, b, y, c) = shiftB (C, (shiftB a), x, (R, b, y, c))
fixB² C (B, a, x, b) y cB² = shiftB (C, (R, a, x, b), y, (shiftB c))    (4.13)
fixB² C l k r = (C, l, k, r)

If none of the patterns match, the last row keeps the node unchanged. The doubly-black fixing is recursive. It terminates in two ways: either case 1 applies and the doubly-black node is eliminated, or the blackness moves up until it reaches the root, where we finally force the root to be black. The example program below puts all three cases together:
-- the sibling is black, and has a red sub-tree
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d)
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)

fixDB color BBEmpty x (Node B (Node R b y c) z d)


= Node color (Node B Empty x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d))
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B b y (Node R c z d))
= Node color (Node B Empty x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _)
= Node color (Node B a x b) y (Node B c z (shiftBlack d))
fixDB color (Node B a x (Node R b y c)) z BBEmpty
= Node color (Node B a x b) y (Node B c z Empty)
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _)
= Node color (Node B a x b) y (Node B c z (shiftBlack d))
fixDB color (Node B (Node R a x b) y c) z BBEmpty
= Node color (Node B a x b) y (Node B c z Empty)
-- the sibling is red
fixDB B a@(Node BB _ _ _) x (Node R b y c)
= fixDB B (fixDB R a x b) y c
fixDB B a@BBEmpty x (Node R b y c)
= fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _)
= fixDB B a x (fixDB R b y c)
fixDB B (Node R a x b) y c@BBEmpty
= fixDB B a x (fixDB R b y c)
-- the sibling and its 2 children are all black, move the blackness up
fixDB color a@(Node BB _ _ _) x (Node B b y c)
= shiftBlack (Node color (shiftBlack a) x (Node R b y c))
fixDB color BBEmpty x (Node B b y c)
= shiftBlack (Node color Empty x (Node R b y c))
fixDB color (Node B a x b) y c@(Node BB _ _ _)
= shiftBlack (Node color (Node R a x b) y (shiftBlack c))
fixDB color (Node B a x b) y BBEmpty
= shiftBlack (Node color (Node R a x b) y Empty)
-- otherwise
fixDB color l k r = Node color l k r

The delete algorithm is bound to O(h) time, where h is the height of the tree. As the red-black tree maintains its balance, h = O(lg n) for n nodes.

Exercise 4.5
1. Implement the alternative delete algorithm: mark the node as deleted without
actually removing it. When the marked nodes exceed 50%, re-build the tree.

4.5 Imperative red-black tree algorithm ⋆


We simplified the red-black tree implementation with pattern matching. In this section, we give the imperative algorithm for completeness. When inserting, the first step is the same as for the binary search tree; then, as the second step, we fix the balance through tree rotations.
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: Color(x) ← RED
5: p ← NIL
6: while T ≠ NIL do
7: p←T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← p
13: if p = NIL then ▷ tree T is empty
14: return x
15: else if k < Key(p) then
16: Left(p) ← x
17: else
18: Right(p) ← x
19: return Insert-Fix(root, x)
We make the new node red, and then perform fixing before returning. There are 3 basic cases, and each has a symmetric case, hence 6 cases in total. Among them, two can be merged, because both have a red 'uncle' node: we change the parent and uncle to black, and set the grandparent to red:
1: function Insert-Fix(T, x)
2: while Parent(x) ≠ NIL and Color(Parent(x)) = RED do
3: if Color(Uncle(x)) = RED then ▷ Case 1, x’s uncle is red
4: Color(Parent(x)) ← BLACK
5: Color(Grand-Parent(x)) ← RED
6: Color(Uncle(x)) ← BLACK
7: x ← Grand-Parent(x)
8: else ▷ x’s uncle is black
9: if Parent(x) = Left(Grand-Parent(x)) then
10: if x = Right(Parent(x)) then ▷ Case 2, x is on the right
11: x ← Parent(x)
12: T ← Left-Rotate(T, x)
▷ Case 3, x is on the left
13: Color(Parent(x)) ← BLACK
14: Color(Grand-Parent(x)) ← RED
15: T ← Right-Rotate(T , Grand-Parent(x))
16: else
17: if x = Left(Parent(x)) then ▷ Case 2, Symmetric
18: x ← Parent(x)
19: T ← Right-Rotate(T, x)
▷ Case 3, Symmetric
20: Color(Parent(x)) ← BLACK
21: Color(Grand-Parent(x)) ← RED

22: T ← Left-Rotate(T , Grand-Parent(x))


23: Color(T ) ← BLACK
24: return T
This algorithm takes O(lg n) time to insert a key, where n is the number of nodes. Compared to the balance function defined previously, it follows different logic: even for the same sequence of input keys, they build different red-black trees. Figure 4.11 shows the result of feeding the same sequences to the imperative algorithm; we can see the difference from figure 4.7. There is a bit of performance overhead in the pattern matching algorithm. Okasaki discussed the difference in detail in [13].

Figure 4.11: Red-black trees created by the imperative algorithm.

We provide the imperative delete algorithm in Appendix A of this book.

4.6 Summary

Red-black tree is a popular implementation of the balanced binary search tree. We introduce another one, the AVL tree, in the next chapter. Red-black tree is a good starting point for more data structures: if we extend the number of children from 2 to k and maintain the balance, it leads to the B-tree; if we store the data along the edges instead of inside the nodes, it leads to the Radix tree. To maintain balance, we need to handle multiple cases. Okasaki developed a method that makes the red-black tree easy to implement, and there are many implementations based on this idea[16]. We also provide AVL tree and Splay tree implementations based on pattern matching in this book.

4.7 Appendix: Example programs


Definition of red-black tree node with parent field. When not explicitly defined, the color
of the new node is red by default.
data Node<T> {
    T key
    Color color
    Node<T> left
    Node<T> right
    Node<T> parent

    Node(T x) = Node(null, x, null, Color.RED)

    Node(Node<T> l, T k, Node<T> r, Color c) {
        left = l, key = k, right = r, color = c
        if left ≠ null then left.parent = this
        if right ≠ null then right.parent = this
    }

    Self setLeft(l) {
        left = l
        if l ≠ null then l.parent = this
    }

    Self setRight(r) {
        right = r
        if r ≠ null then r.parent = this
    }

    Node<T> sibling() = if parent.left == this then parent.right
                        else parent.left

    Node<T> uncle() = parent.sibling()

    Node<T> grandparent() = parent.parent
}

Insert a key to red-black tree:


Node<T> insert(Node<T> t, T key) {
    root = t
    x = Node(key)
    parent = null
    while (t ≠ null) {
        parent = t
        t = if (key < t.key) then t.left else t.right
    }
    if (parent == null) { // tree is empty
        root = x
    } else if (key < parent.key) {
        parent.setLeft(x)
    } else {
        parent.setRight(x)
    }
    return insertFix(root, x)
}

Fix the balance:


// Fix the red→red violation
Node<T> insertFix(Node<T> t, Node<T> x) {
    while (x.parent ≠ null and x.parent.color == Color.RED) {
        if (x.uncle().color == Color.RED) {
            // case 1: ((a:R x:R b) y:B c:R) ⇒ ((a:R x:B b) y:R c:B)
            x.parent.color = Color.BLACK
            x.grandparent().color = Color.RED
            x.uncle().color = Color.BLACK
            x = x.grandparent()
        } else {
            if (x.parent == x.grandparent().left) {
                if (x == x.parent.right) {
                    // case 2: ((a x:R b:R) y:B c) ⇒ case 3
                    x = x.parent
                    t = leftRotate(t, x)
                }
                // case 3: ((a:R x:R b) y:B c) ⇒ (a:R x:B (b y:R c))
                x.parent.color = Color.BLACK
                x.grandparent().color = Color.RED
                t = rightRotate(t, x.grandparent())
            } else {
                if (x == x.parent.left) {
                    // case 2': (a x:B (b:R y:R c)) ⇒ case 3'
                    x = x.parent
                    t = rightRotate(t, x)
                }
                // case 3': (a x:B (b y:R c:R)) ⇒ ((a x:R b) y:B c:R)
                x.parent.color = Color.BLACK
                x.grandparent().color = Color.RED
                t = leftRotate(t, x.grandparent())
            }
        }
    }
    t.color = Color.BLACK
    return t
}
Chapter 5

AVL tree

5.1 Introduction
The idea of the red-black tree is to limit the number of nodes along a path within a range. The AVL tree takes a more direct approach: quantify the difference between branches. For a node T, define:

δ(T) = |r| − |l|    (5.1)

Where |T| is the height of tree T, and l and r are its left and right sub-trees. Define δ(∅) = 0 for the empty tree. If δ(T) = 0 for every node T, the tree is definitely balanced. For example, a complete binary tree has n = 2^h − 1 nodes for height h; there are no empty branches except at the leaves. The smaller the absolute value of δ(T), the more balanced the tree is between its sub-trees. We call δ(T) the balance factor of a binary tree.

5.2 Definition

Figure 5.1: An AVL tree

A binary search tree is an AVL tree if every sub-tree T satisfies:

|δ(T)| ≤ 1    (5.2)

There are three valid values of δ(T): ±1 and 0. Figure 5.1 shows an AVL tree. This definition ensures the tree height h = O(lg n), where n is the number of nodes in the tree.

Let's prove it. For an AVL tree of height h, the number of nodes varies: there are at most 2^h − 1 nodes (the complete binary tree case). We are interested in how many nodes there are at least. Let this minimum number be N(h). We have the following results:

• Empty tree ∅: h = 0, N(0) = 0;
• Singleton tree: h = 1, N(1) = 1.

Figure 5.2 shows an AVL tree T of height h. It contains three parts: the key k, and two sub-trees l, r. We have the following equation:

Figure 5.2: An AVL tree of height h. The height of one sub-tree is h − 1, the other is no less than h − 2.

h = max(|l|, |r|) + 1    (5.3)

There must be a sub-tree of height h − 1. From the definition, ||l| − |r|| ≤ 1 holds, hence the height of the other sub-tree cannot be lower than h − 2. The total number of nodes in T is the sum of both sub-trees plus 1 (for the root):

N(h) = N(h − 1) + N(h − 2) + 1    (5.4)

This recursive equation is similar to the Fibonacci numbers. Actually, we can transform it into Fibonacci numbers through N′(h) = N(h) + 1; equation (5.4) then changes to:

N′(h) = N′(h − 1) + N′(h − 2)    (5.5)
Lemma 5.2.1. Let N(h) be the minimum number of nodes of an AVL tree of height h, and N′(h) = N(h) + 1, then

N′(h) ≥ ϕ^h    (5.6)

Where ϕ = (√5 + 1)/2 is the golden ratio.

Proof. When h = 0 or 1, we have:

• h = 0: N′(0) = 1 ≥ ϕ^0 = 1
• h = 1: N′(1) = 2 ≥ ϕ^1 = 1.618...

For the induction case, assume N′(h) ≥ ϕ^h:

N′(h + 1) = N′(h) + N′(h − 1)    {Fibonacci}
          ≥ ϕ^h + ϕ^(h−1)
          = ϕ^(h−1)(ϕ + 1)    {ϕ + 1 = ϕ² = (√5 + 3)/2}
          = ϕ^(h+1)
From Lemma 5.2.1, we immediately obtain:

h ≤ logϕ(n + 1) = logϕ 2 · lg(n + 1) ≈ 1.44 lg(n + 1)    (5.7)

We have proved that the height of an AVL tree is bounded by O(lg n), indicating the AVL tree is balanced.
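As a quick numeric check of the recurrence (5.4) (a throwaway Haskell sketch; the name minNodes is ours):

-- minimum number of nodes of an AVL tree of height h, per equation (5.4)
minNodes :: Int -> Integer
minNodes 0 = 0
minNodes 1 = 1
minNodes h = minNodes (h - 1) + minNodes (h - 2) + 1

-- map minNodes [0..10] gives [0,1,2,4,7,12,20,33,54,88,143]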
When we insert or delete, the balance factor may fall outside the valid range, and we need to fix the tree to restore |δ| ≤ 1. Traditionally, the fixing is done through tree rotations. We give a simplified implementation based on pattern matching; the idea is similar to the functional red-black tree (Okasaki, [13]). Because of this 'modify-fix' approach, the AVL tree is also a self-balancing binary search tree. We can re-use the binary search tree definition. Although the balance factor δ can be computed recursively, we record it inside each node as T = (l, k, r, δ), and update it when we mutate the tree¹. The example program below adds δ as an Int:
data AVLTree a = Empty
| Br (AVLTree a) a (AVLTree a) Int

For the AVL tree, lookup, max, and min are the same as for the binary search tree. We focus on the insert and delete algorithms.

5.3 Insert
When we insert a new element, |δ(T)| may exceed 1. We can use pattern matching, similar to the red-black tree, to develop a simplified solution. After inserting element x, the height of a sub-tree that is an ancestor of x may increase by at most 1. We recursively update the balance factor along the path of insertion. Define the insert result as a pair (T′, ∆H), where T′ is the updated tree and ∆H is the increment of the height. We modify the binary search tree insert function as below:
insert = fst ◦ ins (5.8)
Where fst (a, b) = a returns the first element in a pair. ins(T, k) does the actual work
to insert element k into tree T :
ins ∅ k = ((∅, k, ∅, 0), 1)
ins (l, k′, r, δ) k = { k < k′ : tree (ins l k) k′ (r, 0) δ
                        k > k′ : tree (l, 0) k′ (ins r k) δ        (5.9)
If the tree is empty ∅, the result is a leaf of k with balance factor 0, and the height increases to 1. Otherwise let T = (l, k′, r, δ). We compare the new element k with k′. If k < k′, we recursively insert k to the left sub-tree l, otherwise we insert to r. Since the recursive insert result is a pair (l′, ∆l) or (r′, ∆r), we adjust the balance factor and update the tree height through function tree. It takes 4 parameters: (l′, ∆l), k′, (r′, ∆r), and δ. The result is (T′, ∆H), where T′ is the new tree, and ∆H is defined as:
∆H = |T ′ | − |T | (5.10)
We can further break it down into 4 cases:
∆H = |T′| − |T|
    = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
    = max(|r′|, |l′|) − max(|r|, |l|)
    = { δ ≥ 0, δ′ ≥ 0 : ∆r
        δ ≤ 0, δ′ ≥ 0 : δ + ∆r
        δ ≥ 0, δ′ ≤ 0 : ∆l − δ
        otherwise     : ∆l        (5.11)
1 Alternatively, we can record the height instead of δ[20].

Where δ′ = δ(T′) = |r′| − |l′| is the updated balance factor. Appendix B provides the proof. We need to determine δ′ before the balance adjustment.

δ′ = |r′ | − |l′ |
= |r| + ∆r − (|l| + ∆l)
(5.12)
= |r| − |l| + ∆r − ∆l
= δ + ∆r − ∆l

With the changes in height and balance factor, we can define the tree function in
(5.9):

tree (l′ , ∆l) k (r′ , ∆r) δ = balance (l′ , k, r′ , δ ′ ) ∆H (5.13)

Below example program implements what we deduced so far:


insert t x = fst $ ins t where
ins Empty = (Br Empty x Empty 0, 1)
ins (Br l k r d)
| x < k = tree (ins l) k (r, 0) d
| x > k = tree (l, 0) k (ins r) d

tree (l, dl) k (r, dr) d = balance (Br l k r d') deltaH where
d' = d + dr - dl
deltaH | d ≥ 0 && d' ≥ 0 = dr
| d ≤ 0 && d' ≥ 0 = d+dr
| d ≥ 0 && d' ≤ 0 = dl - d
| otherwise = dl

5.3.1 Balance
There are 4 cases need fix as shown in figure 5.3. The balance factor is ±2, exceeds the
range of [−1, 1]. We adjust them to a uniformed structure in the center, with the δ(y) = 0.

Figure 5.3: Fix 4 cases to the same structure

We call the 4 cases: left-left, right-right, right-left, and left-right. Denote the balance
factors before fixing as δ(x), δ(y), and δ(z); after fixing, they change to δ ′ (x), δ ′ (y) = 0,

and δ ′ (z) respectively. The values of δ ′ (x) and δ ′ (z) can be given as below. Appendix B
gives the proof.
Left-left:
δ ′ (x) = δ(x)
δ ′ (y) = 0 (5.14)
δ ′ (z) = 0

Right-right:

δ ′ (x) = 0
δ ′ (y) = 0 (5.15)
δ ′ (z) = δ(z)

Right-left and Left-right:


δ′(x) = { δ(y) = 1  : −1
          otherwise : 0
δ′(y) = 0                                    (5.16)
δ′(z) = { δ(y) = −1 : 1
          otherwise : 0

Based on this, we can implement the pattern matching fix as below:


balance (((a, x, b, δ(x)), y, c, −1), z, d, −2) ∆H = (((a, x, b, δ(x)), y, (c, z, d, 0), 0), ∆H − 1)
balance (a, x, (b, y, (c, z, d, δ(z)), 1), 2) ∆H = (((a, x, b, 0), y, (c, z, d, δ(z)), 0), ∆H − 1)
balance ((a, x, (b, y, c, δ(y)), 1), z, d, −2) ∆H = (((a, x, b, δ′(x)), y, (c, z, d, δ′(z)), 0), ∆H − 1)
balance (a, x, ((b, y, c, δ(y)), z, d, −1), 2) ∆H = (((a, x, b, δ′(x)), y, (c, z, d, δ′(z)), 0), ∆H − 1)
balance T ∆H = (T, ∆H)
(5.17)
Where δ′(x) and δ′(z) are defined in (B.17). If none of the patterns matches, the last row keeps the tree unchanged. Below is the example program implementing balance:
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2)) dH =
(Br (Br a x b dx) y (Br c z d 0) 0, dH-1)
balance (Br a x (Br b y (Br c z d dz) 1) 2) dH =
(Br (Br a x b 0) y (Br c z d dz) 0, dH-1)
balance (Br (Br a x (Br b y c dy) 1) z d (-2)) dH =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2) dH =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance t d = (t, d)

The performance of insert is proportional to the height of the tree. From (5.7), it is bound to O(lg n), where n is the number of elements in the tree.

Verification
To test an AVL tree, we need to verify two things: it is a binary search tree; and for every sub-tree T, equation (5.2) holds: |δ(T)| ≤ 1. Below function examines the height difference between the two sub-trees recursively:

avl? ∅ = True
(5.18)
avl? T = avl? l ∧ avl? r ∧ ||r| − |l|| ≤ 1

Where l, r are the left and right sub-trees. The height is calculated recursively:

|∅| = 0
(5.19)
|T | = 1 + max(|r|, |l|)

Below example program implements AVL tree height verification:


isAVL Empty = True
isAVL (Br l _ r _) = isAVL l && isAVL r && abs (height r - height l) ≤ 1

height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
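As a quick sanity check (a usage sketch, not part of the original program; it assumes the insert, isAVL, and height functions above are in one module), we can build a tree by repeated insertion and verify that it stays balanced:

t :: AVLTree Int
t = foldl insert Empty [1..10]    -- insert the keys 1..10 one by one

-- isAVL t  evaluates to True: every insert re-balances the tree.
-- height t is no more than 1.44 · lg(10 + 1) ≈ 5 by (5.7).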

Exercise 5.1
1. We only give the algorithm to test AVL height. Complete the program to test if a
binary tree is AVL tree.

5.4 Imperative AVL tree algorithm ⋆


This section gives the imperative algorithm for completeness. Similar to the red-black
tree algorithm, we first re-use the binary search tree insert, then fix the balance through
tree rotations.
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: δ(x) ← 0
5: parent ← NIL
6: while T ≠ NIL do
7: parent ← T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← parent
13: if parent = NIL then ▷ tree T is empty
14: return x
15: else if k < Key(parent) then
16: Left(parent) ← x
17: else
18: Right(parent) ← x
19: return AVL-Insert-Fix(root, x)
After insert, the balance factor δ may change because the tree grows. Inserting to the right may increase δ by 1, while inserting to the left may decrease it. We perform a bottom-up fixing from x to the root. Denote the new balance factor as δ′; there are 3 cases:

• |δ| = 1, |δ ′ | = 0. The new node makes the tree well balanced. The height of the
parent keeps unchanged.
• |δ| = 0, |δ′| = 1. Either the left or the right sub-tree increases its height. We need to go on checking the upper level.
• |δ| = 1, |δ′| = 2. We need to rotate the tree to fix the balance factor.

1: function AVL-Insert-Fix(T, x)
2: while Parent(x) ≠ NIL do
3: P ← Parent(x)
4: L ← Left(P)
5: R ← Right(P)
6: δ ← δ(P )
7: if x = Left(P ) then
8: δ′ ← δ − 1
9: else
10: δ′ ← δ + 1
11: δ(P ) ← δ ′
12: if |δ| = 1 and |δ ′ | = 0 then ▷ Height unchanged
13: return T
14: else if |δ| = 0 and |δ ′ | = 1 then ▷ Go on bottom-up update
15: x←P
16: else if |δ| = 1 and |δ ′ | = 2 then
17: if δ ′ = 2 then
18: if δ(R) = 1 then ▷ Right-right
19: δ(P ) ← 0 ▷ By (B.6)
20: δ(R) ← 0
21: T ← Left-Rotate(T, P )
22: if δ(R) = −1 then ▷ Right-left
23: δy ← δ(Left(R)) ▷ By (B.17)
24: if δy = 1 then
25: δ(P ) ← −1
26: else
27: δ(P ) ← 0
28: δ(Left(R)) ← 0
29: if δy = −1 then
30: δ(R) ← 1
31: else
32: δ(R) ← 0
33: T ← Right-Rotate(T, R)
34: T ← Left-Rotate(T, P )

35: if δ = −2 then
36: if δ(L) = −1 then ▷ Left-left
37: δ(P ) ← 0
38: δ(L) ← 0
39: Right-Rotate(T, P )
40: else ▷ Left-Right
41: δy ← δ(Right(L))
42: if δy = 1 then
43: δ(L) ← −1
44: else
45: δ(L) ← 0
46: δ(Right(L)) ← 0
47: if δy = −1 then
48: δ(P ) ← 1
49: else
50: δ(P ) ← 0
51: Left-Rotate(T, L)

52: Right-Rotate(T, P )
53: break
54: return T
Besides the rotations, we also need to update δ for the impacted nodes. The right-right and left-left cases need one rotation, while the right-left and left-right cases need two rotations. We skip the AVL tree delete algorithm in this chapter. Appendix B provides the delete implementation.

5.5 Summary
The AVL tree was developed in 1962 by Adelson-Velskii and Landis [18], [19], and is named after the two authors. It was developed earlier than the red-black tree. Both are self-balancing binary search trees, and most tree operations are bound to O(lg n) time. From (5.7), the AVL tree is more rigidly balanced, and performs faster than the red-black tree in look-up intensive applications [18]. However, the red-black tree performs better when insertions and removals are frequent. Many popular self-balancing binary search tree libraries are implemented on top of the red-black tree. Nevertheless, the AVL tree provides an intuitive and effective solution to the balance problem.

5.6 Appendix: Example programs


Definition of AVL tree node.
data Node<T> {
int delta
T key
Node<T> left
Node<T> right
Node<T> parent
}

Fix the balance:


Node<T> insertFix(Node<T> t, Node<T> x) {
    while (x.parent ≠ null) {
        var (p, l, r) = (x.parent, x.parent.left, x.parent.right)
        var d1 = p.delta
        var d2 = if x == p.left then d1 - 1 else d1 + 1
        p.delta = d2

        if abs(d1) == 1 and abs(d2) == 0 {
            return t
        } else if abs(d1) == 0 and abs(d2) == 1 {
            x = p
        } else if abs(d1) == 1 and abs(d2) == 2 {
            if d2 == 2 {
                if r.delta == 1 {            //Right-right
                    p.delta = 0
                    r.delta = 0
                    t = rotateLeft(t, p)
                } else if r.delta == -1 {    //Right-Left
                    var dy = r.left.delta
                    p.delta = if dy == 1 then -1 else 0
                    r.left.delta = 0
                    r.delta = if dy == -1 then 1 else 0
                    t = rotateRight(t, r)
                    t = rotateLeft(t, p)
                }
            } else if d2 == -2 {
                if l.delta == -1 {           //Left-left
                    p.delta = 0
                    l.delta = 0
                    t = rotateRight(t, p)
                } else if l.delta == 1 {     //Left-right
                    var dy = l.right.delta
                    l.delta = if dy == 1 then -1 else 0
                    l.right.delta = 0
                    p.delta = if dy == -1 then 1 else 0
                    t = rotateLeft(t, l)
                    t = rotateRight(t, p)
                }
            }
            break
        }
    }
    return t
}
Chapter 6

Radix tree

Binary search trees store data in nodes. Can we use the edges to carry information? Radix trees, including the trie, prefix tree, and suffix tree, are data structures developed based on this idea in the 1960s. They are widely used in compiler design [21] and in bio-information processing, like DNA pattern matching [23].

Figure 6.1: Radix tree.

Figure 6.1 shows a radix tree. It contains the bit strings 1011, 10, 011, 100, and 0. To look up a key k = (b0 b1 ... bn)2, we take the first bit b0 (MSB from left) and check whether it is 0 or 1: for 0 turn left, otherwise turn right. Then take the second bit and repeat, until we either reach a leaf node or consume all n bits. We needn’t store the keys in the radix tree nodes; the information is represented by the edges. The nodes labelled with keys in figure 6.1 are for illustration purposes only. If the keys are integers, we can represent them in binary format, and implement lookup with bit-wise operations.

6.1 Integer trie


We call the data structure in figure 6.1 a binary trie. The trie was developed by Edward Fredkin in 1960. The name comes from “retrieval”, pronounced /'tri:/ by Fredkin, while others pronounce it /'trai/ as “try” [24]. Although it is also called a prefix tree in some contexts, we treat trie and prefix tree differently in this chapter. A binary trie is a special


binary tree in which the placement of each key is controlled by its bits, each 0 means ‘go
left’ and each 1 means ‘go right’[21]. Consider the binary trie in figure 6.2. The three
keys are different bit strings of “11”, “011”, and “0011” although they are all equal to 3.

Figure 6.2: A big-endian trie.

It is inefficient to treat the prefix zeros as valid bits. For 32-bit integers, we would need a tree of 32 levels to insert the number 1. Okasaki suggested using little-endian integers instead [21]: 1 is represented as bits (1)2, 2 as (01)2, 3 as (11)2, and so on.

6.1.1 Definition
We can re-use binary tree structure to define the little-endian binary trie. A node is either
empty, or a branch containing the left, right sub-trees, and an optional value. The left
sub-tree is encoded as 0 and the right sub-tree is encoded as 1.
data IntTrie a = Empty
| Branch (IntTrie a) (Maybe a) (IntTrie a)

Given a node in the binary trie, the integer key bound to it is uniquely determined
through its position. That is the reason we need not save the key, but only the value in
the node. The type of the key is always integer, we call the tree IntT rie A if the value is
of type A.

6.1.2 Insert
When we insert an integer key k with a value v, we convert k into binary form. If k is even, the lowest bit is 0 and we recursively insert to the left sub-tree; otherwise k is odd, the lowest bit is 1, and we recursively insert to the right. We then divide k by 2 to remove the lowest bit. For a non-empty trie T = (l, v′, r), where l, r are the left and right sub-trees, and v′ is the optional value, function insert can be defined as below:

insert ∅ k v = insert (∅, Nothing, ∅) k v
insert (l, v′, r) 0 v = (l, Just v, r)
insert (l, v′, r) k v = { even(k) : (insert l ⌊k/2⌋ v, v′, r)
                          odd(k)  : (l, v′, insert r ⌊k/2⌋ v)        (6.1)

If k = 0, we put v in the node. When T = ∅, it becomes (∅, Just v, ∅). As long as k ≠ 0, we go down the tree based on the parity of k, creating an empty leaf (∅, Nothing, ∅) whenever we meet a ∅ node. This algorithm overrides the value if k already exists. Alternatively, we can store a list and append v to it. Figure 6.3 shows an example trie, generated by inserting the key-value pairs {1 → a, 4 → b, 5 → c, 9 → d}. Below is the example program implementing insert:

Figure 6.3: A little-endian integer binary trie of {1 → a, 4 → b, 5 → c, 9 → d}.

insert Empty k x = insert (Branch Empty Nothing Empty) k x


insert (Branch l v r) 0 x = Branch l (Just x) r
insert (Branch l v r) k x | even k = Branch (insert l (k `div` 2) x) v r
| otherwise = Branch l v (insert r (k `div` 2) x)

We can define the even/odd test with modulo 2, checking whether the remainder is 0: even(k) = (k mod 2 = 0); or use a bit-wise operation where available, like (k & 0x1) == 0. We can eliminate the recursion with a loop to obtain an iterative implementation as below:
1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node ▷ (NIL, Nothing, NIL)
4: p←T
5: while k ≠ 0 do
6: if Even?(k) then
7: if Left(p) = NIL then
8: Left(p) ← Empty-Node
9: p ← Left(p)
10: else
11: if Right(p) = NIL then
12: Right(p) ← Empty-Node
13: p ← Right(p)
14: k ← ⌊k/2⌋
15: Value(p) ← v
16: return T

Insert takes a trie T, a key k, and a value v. For an integer k with m bits, it goes down m levels of the trie. The performance is bound to O(m).

6.1.3 Look up
When looking up a key k in a non-empty integer trie, if k = 0, then the root node is the target. Otherwise, we check the lowest bit, then recursively look up the left or right sub-tree accordingly.

lookup ∅ k = Nothing
lookup (l, v, r) 0 = v
lookup (l, v, r) k = { even(k) : lookup l ⌊k/2⌋
                       odd(k)  : lookup r ⌊k/2⌋        (6.2)
Below example program implements the lookup function:
lookup Empty _ = Nothing
lookup (Branch _ v _) 0 = v
lookup (Branch l _ r) k | even k = lookup l (k `div` 2)
| otherwise = lookup r (k `div` 2)
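For example, we can build the trie of figure 6.3 and query it (a usage sketch, not from the original text; insert and lookup are the functions defined above):

trie = foldl (\t (k, v) -> insert t k v) Empty
             [(1, 'a'), (4, 'b'), (5, 'c'), (9, 'd')]

-- lookup trie 4  ⇒  Just 'b'
-- lookup trie 2  ⇒  Nothing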

We can eliminate the recursion to implement the iterative lookup as the following:
1: function Lookup(T, k)
2: while k ≠ 0 and T ≠ NIL do
3: if Even?(k) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: k ← ⌊k/2⌋
8: if T ≠ NIL then
9: return Value(T )
10: else
11: return NIL
The lookup function is bound to O(m) time, where m is the number of bits of k.

Exercise 6.1
1. Can we change the definition from Branch (IntTrie a) (Maybe a) (IntTrie
a) to Branch (IntTrie a) a (IntTrie a), and return Nothing if the value
does not exist, and Just v otherwise?

6.2 Integer prefix tree


The trie is not space efficient. As shown in figure 6.3, there are only 4 nodes holding values, while the other 5 are empty: the space usage is below 50%. To improve the efficiency, we can consolidate the chained nodes into one. The integer prefix tree is such a data structure, developed by Donald R. Morrison in 1968. He named it ‘Patricia’, standing for Practical Algorithm To Retrieve Information Coded In Alphanumeric [22]. When the keys are integers, we call it an integer prefix tree, or simply integer tree when the context is clear. Okasaki provided the implementation in [21]. Consolidating the chained nodes in figure 6.3, we obtain the integer tree shown in figure 6.4.

Figure 6.4: Little endian integer tree for the map {1 → a, 4 → b, 5 → c, 9 → d}.

The key of a branch node is the longest common prefix of its descendant trees. In other words, the sibling sub-trees branch out at the bit right after their longest common prefix. As a result, the integer tree eliminates the redundant nodes of the trie.

6.2.1 Definition
Integer prefix tree is a special binary tree. It is either empty or a node of:

• A leaf contains an integer key k and a value v;

• Or a branch with the left and right sub-trees, that share the longest common
prefix bits for their keys. For the left sub-tree, the next bit is 0, for the right, it is
1.

Below example program defines the integer prefix tree. The branch node contains 4 components: the longest prefix, a mask integer indicating from which bit the sub-trees branch out, and the left and right sub-trees. The mask is m = 2^n for some integer n ≥ 0. All bits lower than bit n do not belong to the common prefix.
data IntTree a = Empty
| Leaf Int a
| Branch Int Int (IntTree a) (IntTree a)

6.2.2 Insert
When we insert an integer y to tree T: if T is empty, we create a leaf of y. If T is a singleton leaf of x, besides the new leaf of y, we need to create a branch node and set x and y as its two sub-trees. To determine whether y goes on the left or right, we need the longest common prefix p of x and y. For example, if x = 12 = (1100)2 and y = 15 = (1111)2, then p = (11oo)2, where o denotes the bits we don’t care about. We use another integer m to mask those bits; in this example, m = 4 = (100)2. The next bit after p represents 2^1. It is 0 in x and 1 in y. Hence, we set x as the left sub-tree and y as the right, as shown in figure 6.5.
If T is neither empty nor a leaf, we first check whether y matches the longest common prefix p in the root, then recursively insert it to the sub-tree according to the next bit after p. For example, when inserting y = 14 = (1110)2 to the tree shown in figure 6.5, since p = (11oo)2 and the next bit (the bit of 2^1) is 1, we recursively insert y to the right

Figure 6.5: Left: T is a leaf of 12; Right: After insert 15.

sub-tree. If y does not match p in the root, we need branch a new leaf as shown in figure
6.6.

(a) Insert 14 = (1110)2, which matches p = (1100)2. It is inserted to the right.
(b) Insert 5 = (101)2, which does not match p = (1100)2. Branch out a new leaf.

Figure 6.6: The tree is a branch node.

For integer key k and value v, let (k, v) be the leaf. For branch node, denote it as
(p, m, l, r), where p is the longest common prefix, m is the mask, l and r are the left and

right sub-trees. Below insert function defines the above 3 cases:

insert ∅ k v = (k, v)
insert (k, v′) k v = (k, v)
insert (k′, v′) k v = join k (k, v) k′ (k′, v′)
insert (p, m, l, r) k v = { match(k, p, m) : { zero(k, m) : (p, m, insert l k v, r)
                                               otherwise  : (p, m, l, insert r k v)
                            otherwise : join k (k, v) p (p, m, l, r)
(6.3)
The first clause creates a leaf when T = ∅; the second clause overrides the value for the same key. Function match(k, p, m) tests whether integer k and prefix p have the same bits after masking with m: mask(k, m) = p, where mask(k, m) = k & ¬(m − 1). It applies bit-wise not to m − 1, then does bit-wise and with k. zero(k, m) tests whether the next bit of k under mask m is 0. We shift m one bit to the right, then do bit-wise and with k:

zero(k, m) = (k & (m >> 1) = 0)        (6.4)

Function join(p1, T1, p2, T2) takes two different prefixes and trees. It extracts the longest common prefix of p1 and p2 as (p, m) = LCP(p1, p2), creates a new branch node, then sets T1 and T2 as the two sub-trees:

join(p1, T1, p2, T2) = { zero(p1, m) : (p, m, T1, T2)
                         otherwise   : (p, m, T2, T1)        (6.5)

To calculate the longest common prefix, we can first compute the bit-wise exclusive-or of p1 and p2, then count its highest bit highest(xor(p1, p2)), where:

highest(0) = 0
highest(n) = 1 + highest(n >> 1)

Then generate a mask m = 2^highest(xor(p1, p2)). The longest common prefix p is given by masking either p1 or p2 with m, as p = mask(p1, m). The following example program implements the insert function:
insert t k x
= case t of
Empty → Leaf k x
Leaf k' x' → if k == k' then Leaf k x
else join k (Leaf k x) k' t
Branch p m l r
| match k p m → if zero k m
then Branch p m (insert l k x) r
else Branch p m l (insert r k x)
| otherwise → join k (Leaf k x) p t

join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2


else Branch p m t2 t1
where
(p, m) = lcp p1 p2

lcp p1 p2 = (p, m) where


m = bit (highestBit (p1 `xor` p2))
p = mask p1 m

highestBit x = if x == 0 then 0 else 1 + highestBit (shiftR x 1)

mask x m = x .&. complement (m - 1)



zero x m = x .&. (shiftR m 1) == 0

match k p m = (mask k m) == p
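As a usage sketch (not part of the original program), the integer tree of figure 6.7 can be built by folding insert over the key-value pairs:

tree = foldl (\t (k, v) -> insert t k v) Empty [(1, 'x'), (4, 'y'), (5, 'z')]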

We can also implement insert imperatively:


1: function Insert(T, k, v)
2: if T = NIL then
3: return Create-Leaf(k, v)
4: y←T
5: p ← NIL
6: while y is not leaf, and Match(k, Prefix(y), Mask(y)) do
7: p←y
8: if Zero?(k, Mask(y)) then
9: y ← Left(y)
10: else
11: y ← Right(y)
12: if y is leaf, and k = Key(y) then
13: Value(y) ← v
14: else
15: z ← Branch(y, Create-Leaf(k, v))
16: if p = NIL then
17: T ←z
18: else
19: if Left(p) = y then
20: Left(p) ← z
21: else
22: Right(p) ← z
23: return T
Where Branch(T1 , T2 ) creates a new branch node, extracts the longest common pre-
fix, then sets T1 and T2 as the two sub-trees.
1: function Branch(T1 , T2 )
2: T ← Empty-Node
3: (Prefix(T ), Mask(T )) ← LCP(Prefix(T1 ), Prefix(T2 ))
4: if Zero?(Prefix(T1 ), Mask(T )) then
5: Left(T ) ← T1
6: Right(T ) ← T2
7: else
8: Left(T ) ← T2
9: Right(T ) ← T1
10: return T

11: function Zero?(x, m)
12:     return (x & ⌊m/2⌋) = 0
Function LCP finds the longest common bit prefix of two integers:
1: function LCP(a, b)
2:     d ← xor(a, b)
3:     m ← 1
4:     while d ≠ 0 do
5:         d ← ⌊d/2⌋
6:         m ← 2m
7:     return (MaskBit(a, m), m)

8: function MaskBit(x, m)
9:     return x & ¬(m − 1)
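As a quick check of LCP, take p1 = 12 = (1100)2 and p2 = 15 = (1111)2 from figure 6.5: d = xor(12, 15) = (11)2 = 3, the loop runs twice and stops with m = 4, and MaskBit(12, 4) = 12 & ¬3 = (1100)2 = 12. Hence LCP returns the prefix 12 = (1100)2 with mask 4 = (100)2, matching the figure.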

Figure 6.7 gives an example integer tree created with the insert algorithm. Although the integer prefix tree consolidates the chained nodes, extracting the longest common prefix needs a linear scan of the bits. For integers of m bits, insert is bound to O(m).

Figure 6.7: Insert {1 → x, 4 → y, 5 → z} to the big-endian integer tree.

6.2.3 Lookup
When looking up a key k: if the integer tree T = ∅, or it is a leaf T = (k′, v) with a different key, then k does not exist; if k = k′, then v is the result; if T = (p, m, l, r) is a branch node, we check whether the common prefix p matches k under the mask m, then recursively look up the sub-tree l or r depending on the next bit. If k fails to match the common prefix p, then k does not exist.
lookup ∅ k = Nothing
lookup (k′, v) k = { k = k′    : Just v
                     otherwise : Nothing
lookup (p, m, l, r) k = { match(k, p, m) : { zero(k, m) : lookup l k
                                             otherwise  : lookup r k
                          otherwise : Nothing
(6.6)

We can also eliminate the recursion to implement the iterative lookup algorithm.
1: function Look-Up(T, k)
2: if T = NIL then
3: return NIL
4: while T is not leaf, and Match(k, Prefix(T ), Mask(T )) do
5: if Zero?(k, Mask(T )) then
6: T ← Left(T )
7: else
8: T ← Right(T )

9: if T is leaf, and Key(T ) = k then


10: return Value(T )
11: else
12: return NIL
The lookup algorithm is bound to O(m), where m is the number of bits in the key.

Exercise 6.2
1. Write a program to implement the lookup function.
2. Implement the pre-order traverse for both integer trie and integer tree. Only
output the keys when the nodes store values. What pattern does the result follow?

6.3 Trie
From the integer trie and integer tree, we can extend the key to a list of generic elements. In particular, tries and prefix trees with alphabetic string keys are powerful tools for text manipulation.

6.3.1 Definition
When we extend the key type from 0/1 bits to a generic list, the tree changes from a binary tree to a multi-way tree. Taking English characters for example, there are up to 26 sub-trees when we ignore case, as shown in figure 6.8.
Not all 26 sub-trees contain data. In figure 6.8, there are only three non-empty sub-trees, bound to ‘a’, ‘b’, and ‘z’. Other sub-trees, such as the one for ‘c’, are empty, and we can hide them in the figure. When the keys are case sensitive, or when we extend them from alphabetic strings to generic lists, we can adopt a collection type, like a map, to define the trie.
A trie is either empty or a node of 2 kinds:

1. A leaf of value v without any sub-trees;

2. A branch, containing a value v and multiple sub-trees. Each sub-tree is bound to


an element k of type K.

Let the type of the value be V; we denote the trie as Trie K V. Below example program defines the trie.
data Trie k v = Trie { value :: Maybe v
, subTrees :: [(k, Trie k v)]}

The empty trie is in form of (Nothing, ∅).

6.3.2 Insert
To insert a key-value pair into the trie, where the key is a list of elements: let the trie be T = (v, ts), where v is the value stored in the node, and ts = {c1 ↦ T1, c2 ↦ T2, ..., cm ↦ Tm} maps elements to sub-trees; element ci is mapped to sub-tree Ti. We can implement the mapping either with an association list [(c1, T1), (c2, T2), ..., (cm, Tm)], or with a self-balancing tree map (Chapter 4 or 5).

insert (v, ts) ∅ v′ = (v′, ts)
insert (v, ts) (k : ks) v′ = (v, ins ts)        (6.7)

Figure 6.8: A trie of 26 branches, containing key ‘a’, ‘an’, ‘another’, ‘bool’, ‘boy’, and ‘zoo’.

When the key is empty, we override the value; otherwise, we extract the first element k, check whether there is a mapping for k among the sub-trees, and recursively insert ks and v′:

ins ∅ = [k ↦ insert (Nothing, ∅) ks v′]
ins ((c ↦ t) : ts) = { c = k     : (k ↦ insert t ks v′) : ts
                       otherwise : (c ↦ t) : (ins ts)        (6.8)

If there is no sub-tree for k in the node, we create a mapping from k to an empty trie node t = (Nothing, ∅); otherwise, we locate the sub-tree t mapped to k, then recursively insert ks and v′ to t. Below is the example program implementing insert; it uses an association list to manage the sub-tree mappings.
insert (Trie _ ts) [] x = Trie (Just x) ts
insert (Trie v ts) (k:ks) x = Trie v (ins ts) where
ins [] = [(k, insert empty ks x)]
ins ((c, t) : ts) = if c == k then (k, insert t ks x) : ts
else (c, t) : (ins ts)

empty = Trie Nothing []
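For instance, the trie of figure 6.8 can be built by folding insert over a small dictionary (a usage sketch, not from the original text; the numeric values are arbitrary place-holders):

dict = foldl (\t (k, v) -> insert t k v) empty
             [("a", 1), ("an", 2), ("another", 3), ("bool", 4), ("boy", 5), ("zoo", 6)]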

We can also eliminate the recursion to implement insert iteratively.


1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: for each c in k do
6: if Sub-Trees(p)[c] = NIL then
7: Sub-Trees(p)[c] ← Empty-Node
8: p ← Sub-Trees(p)[c]
9: Value(p) ← v
10: return T
For the key type [K] (list of K), if K is a finite set of m elements, and the length of the key is n, then the insert algorithm is bound to O(mn). When the keys are lower case English strings, m = 26, and the insert operation is proportional to the length of the key string.

6.3.3 Look up
To look up a non-empty key (k : ks) in the trie T = (v, ts), we start from the first element k. If there exists a sub-tree T′ mapped to k, we recursively look up ks in T′. When the key is empty, we return the value as the result:

lookup ∅ (v, ts) = v
lookup (k : ks) (v, ts) = { lookupl k ts = Nothing : Nothing
                            lookupl k ts = Just t  : lookup ks t        (6.9)

Where function lookupl, defined in chapter 1, looks up a key in an association list. Below is the corresponding iterative implementation:
1: function Look-Up(T, key)
2: if T = NIL then
3: return Nothing
4: for each c in key do
5: if Sub-Trees(T )[c] = NIL then
6: return Nothing

7: T ← Sub-Trees(T )[c]
8: return Value(T )
The lookup algorithm is bound to O(mn), where n is the length of the key, and m is
the size of the element set.
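A direct Haskell transliteration of (6.9) could look like below (a sketch, not from the original text; it uses the Prelude association-list lookup in place of lookupl):

find' :: Eq k => [k] -> Trie k v -> Maybe v
find' [] (Trie v _) = v
find' (k:ks) (Trie _ ts) = case Prelude.lookup k ts of
  Nothing -> Nothing
  Just t  -> find' ks t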

Exercise 6.3
1. Use the self-balance binary tree, like red-black tree or AVL tree to implement a map
data structure, and manage the sub-trees with map. We call such implementation
M apT rie and M apT ree respectively. What are the performance of insert and
lookup for map based tree and trie?

6.4 Prefix tree


Trie is not space efficient. We can consolidate the chained nodes to obtain the prefix tree.

6.4.1 Definition
A prefix tree node t contains two parts: an optional value v, and zero or more sub prefix trees, each ti bound to a list si. The sub-trees and their mappings are denoted as [si ↦ ti]. These lists share the longest common prefix s bound to the node t, i.e. s is the longest common prefix of s ++ s1, s ++ s2, ...; for any i ≠ j, lists si and sj have no non-empty common prefix. Consolidating the chained nodes in figure 6.8, we obtain the corresponding prefix tree in figure 6.9.

Figure 6.9: A prefix tree with keys: ‘a’, ‘an’, ‘another’, ‘bool’, ‘boy’, ‘zoo’.

Below example program defines the prefix tree:


data PrefixTree k v = PrefixTree { value :: Maybe v
, subTrees :: [([k], PrefixTree k v)]}

We denote prefix tree t = (v, ts). Particularly, (Nothing, ∅) is the empty node, and
(Just v, ∅) is a leaf node of value v.

6.4.2 Insert
When we insert a key s: if the prefix tree is empty, we create a leaf node of s as in figure 6.10 (a); otherwise, if there exists a common prefix between s and si, where si is bound to some sub-tree ti, we branch out a new leaf tj, extract the common prefix, and map it to a new internal branch node t′, then put ti and tj as the two sub-trees of t′. Figure 6.10 (b) shows this case. There are two special cases: s is a prefix of si, as shown in figure 6.10 (c) → (e); or si is a prefix of s, as shown in figure 6.10 (d) → (e).

Figure 6.10: (a) insert ‘boy’ to empty tree; (b) insert ‘bool’, branch a new node out; (c)
insert ‘another’ to (b); (d) insert ‘an’ to (b); (e) insert ‘an’ to (c), same result as insert
‘another’ to (d)

Below function inserts key s and value v to the prefix tree t = (v ′ , ts):

insert (v′, ts) ∅ v = (Just v, ts)
insert (v′, ts) s v = (v′, ins ts)        (6.10)

If the key s is empty, we overwrite the value to v; otherwise, we call ins to examine
the sub-trees and their prefixes.

ins ∅ = [s ↦ (Just v, ∅)]
ins ((s′ ↦ t) : ts′) = { match s s′ : (branch s v s′ t) : ts′
                         otherwise  : (s′ ↦ t) : ins ts′        (6.11)

If there is no sub-tree in the node, we create a leaf of v as the single sub-tree, and map s to it; otherwise, for each sub-tree mapping s′ ↦ t, we compare s′ with s. If they have a common prefix (tested by the match function), we branch out a new sub-tree. Two lists match if either is empty, or their first elements are equal:

match ∅ B = True
match A ∅ = True
match (a : as) (b : bs) = (a = b)        (6.12)

To extract the longest common prefix of two lists A and B, we define a function (C, A′, B′) = lcp A B, where C ++ A′ = A and C ++ B′ = B hold. If either A or B is empty, or their first elements differ, then the common prefix C = ∅; otherwise, we recursively extract the longest common prefix of the remaining lists, and prepend the head element:

lcp ∅ B = (∅, ∅, B)
lcp A ∅ = (∅, A, ∅)
lcp (a : as) (b : bs) = { a ≠ b     : (∅, a : as, b : bs)
                          otherwise : (a : cs, as′, bs′)        (6.13)

where (cs, as′, bs′) = lcp as bs in the recursive case. Function branch A v B t takes two keys A, B, a value v, and a tree t. It extracts the longest common prefix C from A and B, maps it to a new branch node, and assigns the sub-trees:

branch A v B t = { lcp A B = (C, ∅, B′)  : (C, (Just v, [B′ ↦ t]))
                   lcp A B = (C, A′, ∅)  : (C, insert t A′ v)
                   lcp A B = (C, A′, B′) : (C, (Nothing, [A′ ↦ (Just v, ∅), B′ ↦ t]))        (6.14)

If A is a prefix of B, then A is mapped to the node of v, and the remaining list B′ is re-mapped to t, the single sub-tree of this node; if B is a prefix of A, we recursively insert the remaining list and the value to t; otherwise, we create a leaf node of v and put it together with t as the two sub-trees of the branch. The following example program implements the insert algorithm:
insert (PrefixTree _ ts) [] v = PrefixTree (Just v) ts
insert (PrefixTree v' ts) k v = PrefixTree v' (ins ts) where
ins [] = [(k, leaf v)]
ins ((k', t) : ts) | match k k' = (branch k v k' t) : ts
| otherwise = (k', t) : ins ts

leaf v = PrefixTree (Just v) []

match [] _ = True
match _ [] = True
match (a:_) (b:_) = a == b

branch a v b t = case lcp a b of


(c, [], b') → (c, PrefixTree (Just v) [(b', t)])
(c, a', []) → (c, insert t a' v)
(c, a', b') → (c, PrefixTree Nothing [(a', leaf v), (b', t)])

lcp [] bs = ([], [], bs)


lcp as [] = ([], as, [])
lcp (a:as) (b:bs) | a ̸= b = ([], a:as, b:bs)
| otherwise = (a:cs, as', bs') where
(cs, as', bs') = lcp as bs

We can eliminate the recursion to implement the insert algorithm in loops.


1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: loop
6: match ← FALSE
7: for each si 7→ Ti in Sub-Trees(p) do
8: if k = si then
9: Value(Ti ) ← v ▷ Overwrite
10: return T

11: c ← LCP(k, si )
12: k1 ← k − c, k2 ← si − c
13: if c ≠ NIL then
14: match ← TRUE
15: if k2 = NIL then ▷ si is prefix of k
16: p ← Ti , k ← k1
17: break
18: else ▷ Branch out a new leaf
19: Add(Sub-Trees(p), c 7→ Branch(k1 , Leaf(v), k2 , Ti ))
20: Delete(Sub-Trees(p), si 7→ Ti )
21: return T
22: if not match then ▷ Add a new leaf
23: Add(Sub-Trees(p), k 7→ Leaf(v))
24: break
25: return T
Function LCP extracts the longest common prefix from two lists.
1: function LCP(A, B)
2: i←1
3: while i ≤ |A| and i ≤ |B| and A[i] = B[i] do
4: i←i+1
5: return A[1...i − 1]
There is a special case in Branch(s1, T1, s2, T2): if s1 is empty, the key to be inserted is a prefix of the existing key, so we add T2 as a sub-tree of T1. Otherwise, we create a new branch node and set T1 and T2 as the two sub-trees.
1: function Branch(s1 , T1 , s2 , T2 )
2: if s1 = NIL then
3: Add(Sub-Trees(T1 ), s2 7→ T2 )
4: return T1
5: T ← Empty-Node
6: Sub-Trees(T ) ← {s1 7→ T1 , s2 7→ T2 }
7: return T
Although the prefix tree improves the space efficiency of trie, it is still bound to O(mn),
where n is the length of the key, and m is the size of the element set.

6.4.3 Look up
When looking up a key k, we start from the root. If k = ∅ is empty, we return the root value as the result; otherwise, we examine the sub-tree mappings, locate the one si ↦ ti such that si is a prefix of k, then recursively look up k − si in the sub-tree ti. If no si is a prefix of k, then k is not in the prefix tree.

lookup ∅ (v, ts) = v
lookup k (v, ts) = { find ((s, t) ↦ s ⊑ k) ts = Nothing     : Nothing
                     find ((s, t) ↦ s ⊑ k) ts = Just (s, t) : lookup (k − s) t        (6.15)

Where A ⊑ B means list A is a prefix of B. Function find, defined in chapter 1, searches for an element in a list with a given predicate. Below example program implements the look up algorithm.
lookup [] (PrefixTree v _) = v
lookup ks (PrefixTree v ts) =

case find (λ(s, t) → s `isPrefixOf` ks) ts of


Nothing → Nothing
Just (s, t) → lookup (drop (length s) ks) t
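As a usage sketch with arbitrary example keys and values (not from the original text), we can build a small prefix tree and query it:

t = foldl (\tr (k, v) -> insert tr k v) (PrefixTree Nothing [])
          [("a", 1), ("an", 2), ("another", 7), ("bool", 4), ("boy", 3)]

-- lookup "another" t  ⇒  Just 7
-- lookup "ant" t      ⇒  Nothing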

The prefix testing is linear in the length of the list, so the lookup algorithm is bound to O(mn) time, where m is the size of the element set, and n is the length of the key. We skip the imperative implementation and leave it as an exercise.

Exercise 6.4
1. Eliminate the recursion to implement the prefix tree lookup purely with loops

6.5 Applications of trie and prefix tree


We can use tries and prefix trees to solve many interesting problems, like implementing a dictionary, populating candidate inputs, and realizing the textonym input method. Different from industrial implementations, we give examples to illustrate the ideas of trie and prefix tree.

6.5.1 Dictionary and input completion


As shown in figure 6.11, when the user enters some characters, the dictionary application searches the library and populates a list of candidate words or phrases that start with the input.

Figure 6.11: A dictionary application

A dictionary can contain hundreds of thousands of words, and it is expensive to perform a complete search. Commercial dictionaries adopt various engineering approaches, like caching and indexing, to speed up the search. Similarly, figure 6.12 shows a smart text input component. When the user types some characters, it populates a candidate list, with all items starting with the input string.
Both examples give the ‘auto-completion’ functionality. We can implement it with a prefix tree. For illustration purposes, we limit the keys to English characters, and set an upper bound n on the number of candidates. A dictionary stores key-value pairs, where the key is an English word or phrase, and the value is the corresponding meaning and explanation. When the user inputs a string s, we look up the prefix tree for all keys that start with s. If s is empty,

Figure 6.12: A smart text input component

we expand all sub-trees until we reach n candidates; otherwise, we locate the sub-tree from the mapped key and look up recursively. In an environment that supports lazy evaluation, we can expand all candidates and take the first n on demand: take n (startsWith s t), where t is the prefix tree.

startsWith ∅ (Nothing, ts) = enum ts
startsWith ∅ (Just x, ts) = (∅, x) : enum ts
startsWith s (v, ts) = { find ((k, t) ↦ s ⊑ k or k ⊑ s) ts = Nothing     : ∅
                         find ((k, t) ↦ s ⊑ k or k ⊑ s) ts = Just (k, t) : [(k ++ a, b) | (a, b) ∈ startsWith (s − k) t]
(6.16)

Given a prefix s, function startsWith searches all the candidates in the prefix tree that start with s. If s is empty, it enumerates all sub-trees, and prepends (∅, x) when the root holds a non-empty value x. Function enum ts is defined as:

enum = concatMap ((k, t) ↦ [(k ++ a, b) | (a, b) ∈ startsWith ∅ t])        (6.17)

Where concatMap (also known as flatMap) is an important list operation: it maps a function over every element and concatenates the results together. It is typically realized with the ‘build-foldr’ fusion law to eliminate the intermediate list overhead (see chapter 5 in my book Isomorphism – mathematics of programming). If the input prefix s is not empty, we examine the sub-tree mappings. For each list and sub-tree pair (k, t), if either s is a prefix of k or vice versa, we recursively expand t and prepend k to each result key; otherwise, s does not match any sub-tree, hence the result is empty. Below example program implements this algorithm.
startsWith [] (PrefixTree Nothing ts) = enum ts
startsWith [] (PrefixTree (Just v) ts) = ([], v) : enum ts
startsWith k (PrefixTree _ ts) =
  case find (λ(s, t) → s `isPrefixOf` k || k `isPrefixOf` s) ts of
    Nothing → []
    Just (s, t) → [(s ++ a, b) |
                   (a, b) ← startsWith (drop (length s) k) t]

enum = concatMap (λ(k, t) → [(k ++ a, b) | (a, b) ← startsWith [] t])
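Continuing the small example tree t from the previous sketch (an assumption, not from the original text), expanding the prefix "an" yields both stored keys that extend it:

candidates = take 5 (startsWith "an" t)
-- ⇒ [("an", 2), ("another", 7)]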

We can also realize the algorithm Starts-With(T, k, n) imperatively. From the root, we loop over every sub-tree mapping ki ↦ Ti. If k is a prefix of some ki, we expand everything in Ti, up to n items; if ki is a prefix of k, we drop that prefix, update the key to k − ki, then search Ti for this new key.
1: function Starts-With(T, k, n)
2: if T = NIL then
3: return NIL
4: s ← NIL

5: repeat
6: match ← FALSE
7: for ki 7→ Ti in Sub-Trees(T ) do
8: if k is prefix of ki then
9: return Expand(s ++ ki, Ti, n)
10: if ki is prefix of k then
11: match ← TRUE
12: k ← k − ki ▷ drop the prefix
13: T ← Ti
14: s ← s ++ ki
15: break
16: until not match
17: return NIL
Where function Expand(s, T, n) populates n results from T and prepand s to each
key. We implement it with ‘breadth first search’ method (see section 14.3):
1: function Expand(s, T, n)
2: R ← NIL
3: Q ← [(s, T )]
4: while |R| < n and Q ≠ NIL do
5: (k, T ) ← Pop(Q)
6: v ← Value(T )
7: if v 6= NIL then
8: Insert(R, (k, v))
9: for ki 7→ Ti in Sub-Trees(T ) do
10: Push(Q, (k ++ ki, Ti))

6.5.2 Predictive text input


Before 2010, most mobile phones had a small keypad as shown in 6.13, called ITU-T
keypad. It maps a digit to 3 - 4 characters. For example, when input word ‘home’, one
can press keys in below sequence:

Figure 6.13: The mobile phone ITU-T keypad.

1. Press key ‘4’ twice to enter ‘h’;

2. Press key ‘6’ three times to enter ‘o’;

3. Press key ‘6’ to enter ‘m’;

4. Press key ‘3’ twice to enter ‘e’;



A smarter input method allows to press less keys:

1. Press key sequence ‘4’, ‘6’, ‘6’, ‘3’, the word ‘home’ appears as a candidate;

2. Press key ‘*’ to change to next candidate, word ‘good’ appears;

3. Press key ’*’ again for another candidate, word ‘gone’ appears;

4. ...

This is called predictive input, abbreviated as ‘T9’ [25], [26]. We can realize it by storing the word dictionary in a prefix tree. Commercial implementations use multiple layers of caches/indexes in both memory and the file system; we simplify it here as an example of prefix tree application. First, we define the digit key mappings:

M_T9 = { 2 ↦ "abc", 3 ↦ "def", 4 ↦ "ghi",
         5 ↦ "jkl", 6 ↦ "mno", 7 ↦ "pqrs",
         8 ↦ "tuv", 9 ↦ "wxyz" }        (6.18)

M_T9[i] gives the corresponding characters of digit i. We can also define the reversed mapping from a character back to a digit:

M_T9⁻¹ = concatMap ((d, s) ↦ [(c, d) | c ∈ s]) M_T9        (6.19)

Given a string, we can convert it to a sequence of digits by looking up M_T9⁻¹:

digits(s) = [M_T9⁻¹[c] | c ∈ s]        (6.20)

For any character that does not belong to [a..z], we map it to a special key '#' as a fallback. Below example program defines the above two mappings.
mapT9 = Map.fromList [('2', "abc"), ('3', "def"), ('4', "ghi"),
                      ('5', "jkl"), ('6', "mno"), ('7', "pqrs"),
                      ('8', "tuv"), ('9', "wxyz")]

rmapT9 = Map.fromList $ concatMap (λ(d, s) → [(c, d) | c ← s]) $ Map.toList mapT9

digits = map (λc → Map.findWithDefault '#' c rmapT9)

Suppose we have built the prefix tree (v, ts) from all the words in a dictionary. We need to change the above auto-completion algorithm to process a digit string ds. For every sub-tree mapping (s ↦ t) ∈ ts, we convert the prefix s to digits(s), and check whether it matches ds (either one is a prefix of the other). Multiple sub-trees may match ds:

pfx = [(s, t) | (s ↦ t) ∈ ts, digits(s) ⊑ ds or ds ⊑ digits(s)]

findT9 t ∅ = [∅]
findT9 (v, ts) ds = concatMap find pfx        (6.21)

For each mapping (s, t) in pfx, function find recursively looks up the remaining digits ds′ in t, where ds′ = drop |s| ds, and prepends s to every candidate. However, the result may be longer than the number of digits, so we cut it and only take n = |ds| characters:

find (s, t) = [take n (s ++ si) | si ∈ findT9 t ds′]        (6.22)

The following example program implements the predictive input look up algorithm:

findT9 _ [] = [[]]
findT9 (PrefixTree _ ts) k = concatMap find pfx where
  find (s, t) = map (take (length k) ◦ (s++)) $ findT9 t (drop (length s) k)
  pfx = [(s, t) | (s, t) ← ts, let ds = digits s in
                  ds `isPrefixOf` k || k `isPrefixOf` ds]
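For example (a usage sketch, not from the original text; the dictionary has four words, and the stored values are irrelevant to the result, so we use ()):

t9dict = foldl (\t w -> insert t w ()) (PrefixTree Nothing [])
               ["home", "good", "gone", "hood"]

-- findT9 t9dict "4663" yields "home", "hood", "good" and "gone"
-- (the order depends on the sub-tree layout).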

To realize the predictive text input imperatively, we can perform a breadth first search with a queue Q of tuples (prefix, D, t). Every tuple records the prefix matched so far, the remaining digits D to be searched, and the sub-tree t we are going to search. Q is initialized with the empty prefix, the whole digit sequence, and the root. We repeatedly pop a tuple from the queue and examine the sub-tree mappings. For every mapping (s ↦ T′), we convert s to digits(s). If D is a prefix of it, we have found a candidate: we append s to the prefix and record it in the result. If digits(s) is a prefix of D, we need to further search the sub-tree T′: we create a new tuple (prefix ++ s, D′, T′), where D′ is the remaining digits to be searched, and push this new tuple back to the queue.
1: function Look-Up-T9(T, D)
2: R ← NIL
3: if T = NIL or D = NIL then
4: return R
5: n ← |D|
6: Q ← {(NIL, D, T )}
7: while Q ≠ NIL do
8: (prefix, D, T ) ← Pop(Q)
9: for (s 7→ T ′ ) ∈ Sub-Trees(T ) do
10: D′ ← Digits(s)
11: if D′ ⊏ D then ▷ D′ is prefix of D
12: Append(R, (prefix ++ s)[1..n]) ▷ limit the length to n
13: else if D ⊏ D′ then
14: Push(Q, (prefix ++ s, D − D′, T′))
15: return R

Exercise 6.5

1. Implement the auto-completion and predictive text input with trie.


2. How to ensure the candidates in lexicographic order in the auto-completion and
predictive text input program? What’s the performance change accordingly?
3. In the environment without lazy evaluation support, how to return the first n
candidates on-demand?

6.6 Summary
We started from the integer trie and the integer prefix tree. By turning the integer key into binary format, we re-used the binary tree to realize an integer based map data structure. We then extended the key from integers to generic lists, limiting the list elements to a finite set. Particularly for alphabetic strings, the generic trie and prefix tree are tools to manipulate text. We gave example applications of auto-completion and predictive text input. As another instance of the radix tree, the suffix tree is closely related to the trie and prefix tree, and is used in text and DNA processing.

6.7 Appendix: Example programs


Definition of integer binary trie:
data IntTrie<T> {
    IntTrie<T> left = null
    IntTrie<T> right = null
    Optional<T> value = Optional.Nothing
}

The following example insert program uses bit-wise operations to test even/odd and to shift the key to the right:
IntTrie<T> insert(IntTrie<T> t, Int key,
                  Optional<T> value = Optional.Nothing) {
    if t == null then t = IntTrie<T>()
    p = t
    while key ≠ 0 {
        if key & 1 == 0 {
            if p.left == null then p.left = IntTrie<T>()
            p = p.left
        } else {
            if p.right == null then p.right = IntTrie<T>()
            p = p.right
        }
        key = key >> 1
    }
    p.value = value
    return t
}

Definition of integer prefix tree:


data IntTree<T> {
Int key
T value
Int prefix
Int mask = 1
IntTree<T> left = null
IntTree<T> right = null

IntTree(Int k, T v) {
key = k, value = v, prefix = k
}

bool isLeaf = (left == null and right == null)

Self replace(IntTree<T> x, IntTree<T> y) {


if left == x then left = y else right = y
}

bool match(Int k) = maskbit(k, mask) == prefix


}

Int maskbit(Int x, Int mask) = x & (~(mask - 1))

Insert key-value to integer prefix tree.


IntTree<T> insert(IntTree<T> t, Int key, T value) {
    if t == null then return IntTree(key, value)
    node = t
    IntTree<T> parent = null
    while (not node.isLeaf()) and node.match(key) {
        parent = node
        node = if zero(key, node.mask) then node.left else node.right
    }
    if node.isLeaf() and key == node.key {
        node.value = value
    } else {
        p = branch(node, IntTree(key, value))
        if parent == null then return p
        parent.replace(node, p)
    }
    return t
}

IntTree<T> branch(IntTree<T> t1, IntTree<T> t2) {
    var t = IntTree<T>()
    (t.prefix, t.mask) = lcp(t1.prefix, t2.prefix)
    (t.left, t.right) = if zero(t1.prefix, t.mask) then (t1, t2)
                        else (t2, t1)
    return t
}

bool zero(int x, int mask) = (x & (mask >> 1) == 0)

(Int, Int) lcp(Int p1, Int p2) {
    Int diff = p1 ^ p2
    Int mask = 1
    while diff ≠ 0 {
        diff = diff >> 1
        mask = mask << 1
    }
    return (maskbit(p1, mask), mask)
}

Definition of trie and the insert program:


data Trie<K, V> {
    Optional<V> value = Optional.Nothing
    Map<K, Trie<K, V>> subTrees = Map.empty()
}

Trie<K, V> insert(Trie<K, V> t, [K] key, V value) {
    if t == null then t = Trie<K, V>()
    var p = t
    for c in key {
        if p.subTrees[c] == null then p.subTrees[c] = Trie<K, V>()
        p = p.subTrees[c]
    }
    p.value = Optional.of(value)
    return t
}

Definition of Prefix Tree and insert program:


data PrefixTree<K, V> {
    Optional<V> value = Optional.Nothing
    Map<[K], PrefixTree<K, V>> subTrees = Map.empty()

    Self PrefixTree(V v) {
        value = Optional.of(v)
    }
}

PrefixTree<K, V> insert(PrefixTree<K, V> t, [K] key, V value) {
    if t == null then t = PrefixTree()
    var node = t
    loop {
        bool match = false
        for var (k, tr) in node.subTrees {
            if key == k {
                tr.value = value
                return t
            }
            prefix, k1, k2 = lcp(key, k)
            if prefix ≠ [] {
                match = true
                if k2 == [] {
                    node = tr
                    key = k1
                    break
                } else {
                    node.subTrees[prefix] = branch(k1, PrefixTree(value),
                                                   k2, tr)
                    node.subTrees.delete(k)
                    return t
                }
            }
        }
        if !match {
            node.subTrees[key] = PrefixTree(value)
            break
        }
    }
    return t
}

The longest common prefix lcp and branch example programs.


([K], [K], [K]) lcp([K] s1, [K] s2) {
j = 0
while j < length(s1) and j < length(s2) and s1[j] == s2[j] {
j = j + 1
}
return (s1[0..j-1], s1[j..], s2[j..])
}

PrefixTree<K, V> branch([K] key1, PrefixTree<K, V> tree1,
                        [K] key2, PrefixTree<K, V> tree2) {
    if key1 == []:
        tree1.subTrees[key2] = tree2
        return tree1
    t = PrefixTree()
    t.subTrees[key1] = tree1
    t.subTrees[key2] = tree2
    return t
}

Populate multiple candidates that share the common prefix:
[([K], V)] startsWith(PrefixTree<K, V> t, [K] key, Int n) {
    if t == null then return []
    [K] s = []
    repeat {
        bool match = false
        for var (k, tr) in t.subTrees {
            if key.isPrefixOf(k) {
                return expand(s ++ k, tr, n)
            } else if k.isPrefixOf(key) {
                match = true
                key = key[length(k)..]
                t = tr
                s = s ++ k
                break
            }
        }
    } until not match
    return []
}

[([K], V)] expand([K] s, PrefixTree<K, V> t, Int n) {
    [([K], V)] r = []
    var q = Queue([(s, t)])
    while length(r) < n and not q.isEmpty() {
        var (s, t) = q.pop()
        v = t.value
        if v.isPresent() then r.append((s, v.get()))
        for k, tr in t.subTrees {
            q.push((s ++ k, tr))
        }
    }
    return r
}

Predictive text input lookup


var T9MAP = {'2':"abc", '3':"def", '4':"ghi", '5':"jkl",
             '6':"mno", '7':"pqrs", '8':"tuv", '9':"wxyz"}

var T9RMAP = { c : d for var (d, cs) in T9MAP for var c in cs }

string digits(string w) = ''.join([T9RMAP[c] for c in w])

[string] lookupT9(PrefixTree<char, V> t, string key) {
    if t == null or key == "" then return []
    res = []
    n = length(key)
    q = Queue(("", key, t))
    while not q.isEmpty() {
        (prefix, key, t) = q.pop()
        for var (k, tr) in t.subTrees {
            ds = digits(k)
            if key.isPrefixOf(ds) {
                res.append((prefix ++ k)[:n])
            } else if ds.isPrefixOf(key) {
                q.push((prefix ++ k, key[length(k)..], tr))
            }
        }
    }
    return res
}
Chapter 7

B-Trees

7.1 Introduction
The B-Tree is an important data structure, widely used in modern file systems. Some of them are implemented based on B+ trees, an extension of the B-tree. B-trees are also widely used in database systems. Some textbooks introduce B-trees with the problem of accessing a large block of data on a magnetic disk or a secondary storage device [4]. It is also helpful to understand B-trees as a generalization of balanced binary search trees [39].
When examining Figure 7.1, it is easy to find the differences and similarities between
B-trees and binary search trees.

Figure 7.1: Example B-Tree

Let’s remind ourselves of the definition of the binary search tree. A binary search tree is
• either an empty node;
• or a node which contains 3 parts: a value, a left child and a right child. Both children are also binary search trees.
The binary search tree satisfies the constraint that:
• all the values in the left child are not greater than the value of this node;
• the value of this node is not greater than any value in the right child.
For a non-empty binary tree (L, k, R), where L, R and k are the left child, right child, and the key, function Key(T) accesses the key of tree T. The constraint can be represented as the following:

∀x ∈ L, ∀y ∈ R ⇒ Key(x) ≤ k ≤ Key(y)        (7.1)


If we extend this definition to allow multiple keys and children, we get the B-tree
definition.
A B-tree
• is either empty;
• or contains n keys, and n + 1 children, each child is also a B-Tree, we denote these
keys and children as k1 , k2 , ..., kn and c1 , c2 , ..., cn , cn+1 .
Figure 7.2 illustrates a B-Tree node.

C[1] K[1] C[2] K[2] ... C[n] K[n] C[n+1]

Figure 7.2: A B-Tree node

The keys and children in a node satisfy the following order constraints.
• Keys are stored in non-decreasing order: k1 ≤ k2 ≤ ... ≤ kn;
• for each ki, all elements stored in child ci are not greater than ki, while ki is not greater than any value stored in child ci+1.
The constraints can be represented as in equation (7.2) as well:

∀xi ∈ ci, i = 1, 2, ..., n + 1 ⇒ x1 ≤ k1 ≤ x2 ≤ k2 ≤ ... ≤ xn ≤ kn ≤ xn+1        (7.2)
Finally, after adding some constraints to make the tree balanced, we get the complete
B-tree definition.
• All leaves have the same depth;
• We define the integral number t as the minimum degree of the B-tree;
– each node can have at most 2t − 1 keys;
– each node can have at least t − 1 keys, except the root;
Consider a B-tree holding n keys, with minimum degree t ≥ 2 and height h. All the nodes have at least t − 1 keys except the root, which contains at least 1 key. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at least 2t² nodes at depth 3, ..., and finally at least 2t^(h−1) nodes at depth h. Multiplying the number of nodes at every depth (except the root) by t − 1, the total number of keys satisfies the following inequality:

n ≥ 1 + (t − 1)(2 + 2t + 2t² + ... + 2t^(h−1))
  = 1 + 2(t − 1) Σ_{k=0}^{h−1} t^k
  = 1 + 2(t − 1) (t^h − 1)/(t − 1)
  = 2t^h − 1        (7.3)

Thus we have the inequality between the height and the number of keys:

h ≤ log_t((n + 1)/2)        (7.4)
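As a quick back-of-envelope check (not from the original text): with minimum degree t = 10 and n = 10⁹ keys, (7.4) gives h ≤ log₁₀((10⁹ + 1)/2) ≈ 8.7, so such a B-tree is at most 9 levels deep.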
This is the reason why a B-tree is balanced. The simplest B-tree is the so-called 2-3-4 tree, where t = 2: every node except the root has 2, 3, or 4 children (hence 1 to 3 keys). Essentially, a red-black tree can be mapped to a 2-3-4 tree.
The following Python code shows an example B-tree definition. It explicitly accepts t as a parameter when creating a node.

class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.subtrees = []

B-tree nodes commonly have satellite data as well. We ignore satellite data for illustration purposes.
In this chapter, we will first introduce how to generate a B-tree by insertion. Two
different methods will be explained. One is the classic method as in [4], that we split
the node before insertion if it’s full; the other is the modify-fix approach which is quite
similar to the red-black tree solution [3] [39]. We will next explain how to delete keys
from B-trees and how to look up a key.

7.2 Insertion
A B-tree can be created by inserting keys repeatedly. The basic idea is similar to the binary search tree. When inserting key x, starting from the root, we examine the keys in the node to find a position where all the keys on the left are less than x, and all the keys on the right are greater than x.¹ If the current node is a leaf node and it is not full (there are fewer than 2t − 1 keys in this node), x is inserted at this position. Otherwise, the position points to a child node, and we recursively insert x into it.

(a) Insert key 22 to the 2-3-4 tree. 22 > 20, go to the right child; 22 < 26, go to the first child.
(b) 21 < 22 < 25, and the leaf isn’t full.

Figure 7.3: Insertion is similar to binary search tree.

Figure 7.3 shows one example. The B-tree illustrated is a 2-3-4 tree. When inserting key x = 22: because it is greater than the root key 20, the right child containing keys 26, 38, 45 is examined next; since 22 < 26, its first child containing keys 21 and 25 is examined. This is a leaf node and it is not full, so key 22 is inserted into this node.
However, if there are already 2t − 1 keys in the leaf, the new key x can’t be inserted, because this node is ‘full’. Trying to insert key 18 into the above example B-tree meets this problem. There are 2 methods to solve it.
¹ This is a strong constraint. In fact, only less-than and equality testing is necessary. A later exercise addresses this point.



7.2.1 Splitting
Split before insertion
If the node is full, one method to solve the problem is to split the node before insertion. A full node with 2t − 1 keys can be divided into 3 parts, as shown in Figure 7.4: the left part contains the first t − 1 keys and t children; the right part contains the last t − 1 keys and t children. Both the left and the right part are valid B-tree nodes. The middle part is the t-th key. We push it up to the parent node (if the current node is the root, then this key, with the two halves as its children, becomes the new root).

(a) Before split
(b) After split

Figure 7.4: Split node

For node x, denote K(x) as its keys and C(x) as its children; the i-th key is ki(x) and the
j-th child is cj(x). The algorithm below describes how to split the i-th child of a given node.
1: procedure Split-Child(node, i)
2: x ← ci (node)
3: y ← CREATE-NODE
4: Insert(K(node), i, kt (x))
5: Insert(C(node), i + 1, y)
6: K(y) ← {kt+1 (x), kt+2 (x), ..., k2t−1 (x)}
7: K(x) ← {k1 (x), k2 (x), ..., kt−1 (x)}
8: if y is not leaf then
9: C(y) ← {ct+1 (x), ct+2 (x), ..., c2t (x)}
10: C(x) ← {c1 (x), c2 (x), ..., ct (x)}
The following example Python program implements this child splitting algorithm.
def split_child(node, i):
    t = node.t
    x = node.children[i]
    y = BTree(t)
    node.keys.insert(i, x.keys[t-1])
    node.children.insert(i+1, y)
    y.keys = x.keys[t:]
    x.keys = x.keys[:t-1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]
Where the function is_leaf tests if a node is a leaf.

def is_leaf(t):
    return t.children == []

After splitting, a key is pushed up to the parent node. It is quite possible that the parent
node is already full, so that pushing the key up would violate the B-tree property.
To solve this problem, we can check each node along the insertion path from the root down
to the leaf, and split any full node we encounter. Since the parent of such a node has
already been examined, it is ensured that it holds fewer than 2t − 1 keys, so pushing up one
key cannot make the parent full. This approach needs only a single pass down the tree
without any back-tracking.
If the root needs splitting, a new node is created as the new root, with no keys and the
previous root as its only child. After that, splitting is performed top-down, and finally we
can insert the new key.
1: function Insert(T, k)
2: r←T
3: if r is full then ▷ root is full
4: s ← CREATE-NODE
5: C(s) ← {r}
6: Split-Child(s, 1)
7: r←s
8: return Insert-Nonfull(r, k)
The algorithm Insert-Nonfull assumes the node passed in is not full. If it is a leaf
node, the new key is inserted at the proper position based on the order; otherwise, the
algorithm finds the proper child node into which the new key will be inserted. If this
child is full, splitting is performed first.
1: function Insert-Nonfull(T, k)
2: if T is leaf then
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: Insert(K(T ), i, k)
7: else
8: i ← |K(T )|
9: while i > 1 ∧ k < ki (T ) do
10: i←i−1
11: if ci (T ) is full then
12: Split-Child(T, i)
13: if k > ki (T ) then
14: i←i+1
15: Insert-Nonfull(ci (T ), k)
16: return T
This algorithm is recursive. In a B-tree, the minimum degree t is typically chosen according
to the magnetic disk structure. Even a small depth can support a huge amount of data (with
t = 10, a tree of height 10 is enough to hold more than 10 billion entries). The recursion
can also be eliminated; this is left as an exercise to the reader.
Figure 7.5 shows the results of continuously inserting the keys G, M, P, X, A, C, D, E, J,
K, N, O, R, S, T, U, V, Y, Z into an empty tree. The first result is the 2-3-4 tree (t = 2).
The second shows how it varies when t = 3.
The following example Python program implements this algorithm.
Figure 7.5: Insertion results. (a) The 2-3-4 tree (t = 2). (b) The tree with t = 3.

def insert(tr, key):
    root = tr
    if is_full(root):
        s = BTree(root.t)
        s.children.insert(0, root)
        split_child(s, 0)
        root = s
    return insert_nonfull(root, key)

And the insertion to non-full node is implemented as the following.


def insert_nonfull(tr, key):
    if is_leaf(tr):
        ordered_insert(tr.keys, key)
    else:
        i = len(tr.keys)
        while i > 0 and key < tr.keys[i-1]:
            i = i - 1
        if is_full(tr.children[i]):
            split_child(tr, i)
            if key > tr.keys[i]:
                i = i + 1
        insert_nonfull(tr.children[i], key)
    return tr

Where function ordered_insert is used to insert an element to an ordered list.


Function is_full tests if a node contains 2t − 1 keys.
def ordered_insert(lst, x):
    i = len(lst)
    lst.append(x)
    while i > 0 and lst[i] < lst[i-1]:
        (lst[i-1], lst[i]) = (lst[i], lst[i-1])
        i = i - 1

def is_full(node):
    return len(node.keys) >= 2 * node.t - 1

For an array based collection, appending at the tail is much more efficient than inserting
at any other position, because the latter takes O(n) time for a collection of length n. The
ordered_insert program therefore first appends the new element at the end of the collection,
then iterates from the last element backwards, checking whether each pair of neighboring
elements is ordered; if not, the two elements are swapped.
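As a usage sketch (added for illustration, not part of the original text), the whole
imperative insertion procedure can be driven as follows, assuming the BTree class and the
insert function defined above; the driver name from_list is hypothetical.

def from_list(keys, t=2):
    # build a B-tree of minimum degree t by repeated insertion (illustrative sketch)
    tr = BTree(t)
    for k in keys:
        tr = insert(tr, k)
    return tr

# should reproduce the 2-3-4 tree of figure 7.5 (a)
tree = from_list("GMPXACDEJKNORSTUVYZ", t=2)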

Insert then fixing


In a functional setting, B-tree insertion can be realized in a way similar to the red-black
tree. When inserting a key into a red-black tree, it is first inserted as in a normal binary
search tree, then recursive fixing is performed to restore the balance of the tree. A B-tree
can be viewed as an extension of the binary search tree in which each node contains multiple
keys and children. We can first insert the key without considering whether the node is full,
and then perform fixing to satisfy the minimum degree constraint.

insert(T, k) = f ix(ins(T, k)) (7.5)

Function ins(T, k) traverses the B-tree T from the root to find a proper position where key
k can be inserted. After that, function fix is applied to restore the B-tree properties.
Denote the B-tree in the form T = (K, C, t), where K represents the keys, C the children,
and t the minimum degree.
Below is the Haskell definition of B-tree.
data BTree a = Node{ keys :: [a]
, children :: [BTree a]
, degree :: Int} deriving (Eq)

The insertion function can be provided based on this definition.


insert tr x = fixRoot $ ins tr x

There are two cases when realizing the ins(T, k) function. If the tree T is a leaf, k is
inserted into its keys; otherwise, if T is a branch node, we recursively insert k into the
proper child.
Figure 7.6 shows the branch case. The algorithm first locates the position: for a certain
key ki, if the new key k to be inserted satisfies ki−1 < k < ki, then we need to recursively
insert k into child ci.
This position divides the node into 3 parts: the left part, the child ci, and the right part.

Figure 7.6: Insert a key into a branch node. (a) Locate the child C[i] to insert into, where
K[i-1] < k < K[i]. (b) Recursively insert k into C[i].


    ins(T, k) = (K′ ∪ {k} ∪ K″, ϕ, t)                  : C = ϕ, (K′, K″) = divide(K, k)
              = make((K′, C1), ins(c, k), (K″, C2′))    : otherwise, (C1, C2) = split(|K′|, C)
                                                                                      (7.6)
The first clause deals with the leaf case. Function divide(K, k) divides the keys into two
parts: all keys in the first part are not greater than k, and all the rest are not less than k.

    K = K′ ∪ K″  ∧  ∀k′ ∈ K′, k″ ∈ K″ ⇒ k′ ≤ k ≤ k″

The second clause handles the branch case. Function split(n, C) splits the children into
two parts, C1 and C2. C1 contains the first n children, and C2 contains the rest. Among C2,
the first child is denoted as c and the others as C2′.
Here the key k needs to be recursively inserted into child c. Function make takes 3
parameters. The first and the third are pairs of keys and children; the second parameter is
a child node. It examines whether a B-tree node made from these keys and children would
violate the minimum degree constraint, and performs fixing if necessary.
    make((K′, C′), c, (K″, C″)) = fixFull((K′, C′), c, (K″, C″))   : full(c)
                                = (K′ ∪ K″, C′ ∪ {c} ∪ C″, t)      : otherwise        (7.7)

Where function full(c) tests if the child c is full. Function fixFull splits the child c and
forms a new B-tree node with the pushed-up key.

    fixFull((K′, C′), c, (K″, C″)) = (K′ ∪ {k′} ∪ K″, C′ ∪ {c1, c2} ∪ C″, t)          (7.8)

Where (c1, k′, c2) = split(c). During splitting, the first t − 1 keys and t children are
extracted into one new child, and the last t − 1 keys and t children form another child.
The t-th key k′ is pushed up.
With all the above functions defined, we can realize fix(T) to complete the functional
B-tree insertion algorithm. It first checks whether the root contains too many keys. If it
exceeds the limit, splitting is applied. The split result is used to make a new node, so the
total height of the tree increases by one.

    fix(T) = c                         : T = (ϕ, {c}, t)
           = ({k′}, {c1, c2}, t)       : full(T), (c1, k′, c2) = split(T)             (7.9)
           = T                         : otherwise

The following Haskell example code implements the B-tree insertion.


import qualified Data.List as L

ins (Node ks [] t) x = Node (L.insert x ks) [] t

ins (Node ks cs t) x = make (ks', cs') (ins c x) (ks'', cs'')
    where
      (ks', ks'') = L.partition (< x) ks
      (cs', (c:cs'')) = L.splitAt (length ks') cs

fixRoot (Node [] [tr] _) = tr −− shrink height


fixRoot tr = if full tr then Node [k] [c1, c2] (degree tr)
else tr
where
(c1, k, c2) = split tr

make (ks', cs') c (ks'', cs'')


| full c = fixFull (ks', cs') c (ks'', cs'')
| otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)

fixFull (ks', cs') c (ks'', cs'') = Node (ks'++[k]++ks'')


(cs'++[c1,c2]++cs'') (degree c)

where
(c1, k, c2) = split c

full tr = (length $ keys tr) > 2 * (degree tr) - 1

Figure 7.7 shows the results of building B-trees by continuously inserting the keys
"GMPXACDEJKNORSTUVYZ".

Figure 7.7: Insert-then-fixing results. (a) The resulting 2-3-4 tree. (b) The resulting
B-tree with t = 3.

Comparing with the imperative insertion results in figure 7.5, we can see that the trees are
different. However, both are valid, because all the B-tree properties are satisfied.

7.3 Deletion
Deleting a key from a B-tree may violate the balance properties: except for the root, a node
shouldn't contain fewer than t − 1 keys, where t is the minimum degree.
Similar to the approaches for insertion, we can either do some preparation so that the node
from which the key is deleted contains enough keys, or do some fixing after the deletion if
the node ends up with too few keys.

7.3.1 Merge before delete method


We start from the easiest case. If the key k to be deleted is located in node x, and x is a
leaf node, we can directly remove k from x. If x is the root (the only node of the tree), we
needn't worry about having too few keys after the deletion. This case is named case 1 below.
In most cases, we start from the root and walk along a path to locate the node that contains
k. If k is located in an internal node x, there are three sub-cases.

• Case 2a: if the child y that precedes k contains enough keys (at least t), we replace k
  in node x with k′, the predecessor of k in child y, and recursively remove k′ from y.
  The predecessor of k can be easily located as the last key of child y.
  This is shown in figure 7.8.

Figure 7.8: Replace and delete from predecessor.

• Case 2b: if y doesn't contain enough keys, while the child z that follows k contains at
  least t keys, we replace k in node x with k″, the successor of k in child z, and
  recursively remove k″ from z.
  The successor of k can be easily located as the first key of child z.
  This sub-case is illustrated in figure 7.9.
• Case 2c: otherwise, if neither y nor z contains enough keys, we can merge y, k, and z
  into one new node, so that this new node contains 2t − 1 keys. After that, we recursively
  do the removal from it.
  Note that after the merge, if the current node doesn't contain any keys any more (which
  means k was the only key in x, and y and z were its only two children), we need to shrink
  the tree height by one.

Figure 7.10 illustrates this sub-case.


The last case states that if k cannot be located in node x, the algorithm needs to find the
child ci of x such that the sub-tree ci contains k. Before the deletion is recursively
applied to ci, we must make sure that there are at least t keys in ci. If there are not
enough keys, the following adjustment is performed.

• Case 3a: we check the two siblings of ci, which are ci−1 and ci+1. If either one contains
  enough keys (at least t), we move one key from x down to ci, and move one key from that
  sibling up to x. We also need to move the corresponding child from the sibling to ci.
  This operation makes ci contain enough keys for deletion, so we can then try to delete k
  from ci recursively.
  Figure 7.11 illustrates this case.
• Case 3b: in case neither of the two siblings contains enough keys, we merge ci, a key
  from x, and one of the siblings into a new node, and then do the deletion on this new node.

Figure 7.9: Replace and delete from successor.

Figure 7.10: Merge and delete.



Figure 7.11: Borrow from the right sibling.

Figure 7.12 shows this case.


Before defining the B-tree delete algorithm, we need to provide some auxiliary functions.
Function Can-Del tests whether a node contains enough keys for deletion.
1: function Can-Del(T )
2: return |K(T )| ≥ t
Procedure Merge-Children(T, i) merges child ci (T ), key ki (T ), and child ci+1 (T )
into one big node.
1: procedure Merge-Children(T, i) ▷ Merge ci (T ), ki (T ), and ci+1 (T )
2: x ← ci (T )
3: y ← ci+1 (T )
4: K(x) ← K(x) ∪ {ki (T )} ∪ K(y)
5: C(x) ← C(x) ∪ C(y)
6: Remove-At(K(T ), i)
7: Remove-At(C(T ), i + 1)
Procedure Merge-Children merges the i-th child, the i-th key, and the (i+1)-th child of
node T into one big child, and removes the i-th key and the (i+1)-th child from T after merging.
With these functions defined, the B-tree deletion algorithm can be given by realizing
the above 3 cases.
1: function Delete(T, k)
2: i←1
3: while i ≤ |K(T )| do
4: if k = ki (T ) then
5: if T is leaf then ▷ case 1
6: Remove(K(T ), k)
7: else ▷ case 2
8: if Can-Del(ci (T )) then ▷ case 2a

Figure 7.12: Merge ci , k, and ci+1 to a new node.

9: ki (T ) ← Last-Key(ci (T ))
10: Delete(ci (T ), ki (T ))
11: else if Can-Del(ci+1 (T )) then ▷ case 2b
12: ki (T ) ← First-Key(ci+1 (T ))
13: Delete(ci+1 (T ), ki (T ))
14: else ▷ case 2c
15: Merge-Children(T, i)
16: Delete(ci (T ), k)
17: if K(T ) = N IL then
18: T ← ci (T ) ▷ Shrinks height
19: return T
20: else if k < ki (T ) then
21: Break
22: else
23: i←i+1

24: if T is leaf then


25: return T ▷ k doesn’t exist in T .
26: if ¬ Can-Del(ci (T )) then ▷ case 3
27: if i > 1∧ Can-Del(ci−1 (T )) then ▷ case 3a: left sibling
28: Insert(K(ci (T )), ki−1 (T ))
29: ki−1 (T ) ← Pop-Back(K(ci−1 (T )))
30: if ci (T ) isn’t leaf then
31: c ← Pop-Back(C(ci−1 (T )))
32: Insert(C(ci (T )), c)
33: else if i < |C(T )| ∧ Can-Del(ci+1 (T )) then ▷ case 3a: right sibling
34: Append(K(ci (T )), ki (T ))

35: ki (T ) ← Pop-Front(K(ci+1 (T )))


36: if ci (T ) isn’t leaf then
37: c ← Pop-Front(C(ci+1 (T )))
38: Append(C(ci (T )), c)
39: else ▷ case 3b
40: if i > 1 then
41: Merge-Children(T, i − 1), i ← i − 1 ▷ k is now in the merged child
42: else
43: Merge-Children(T, i)
44: Delete(ci (T ), k) ▷ recursive delete
45: if K(T ) = N IL then ▷ Shrinks height
46: T ← c1 (T )
47: return T
Figure 7.13, 7.14, and 7.15 show the deleting process step by step. The nodes modified
are shaded.

Figure 7.13: Results of B-tree deleting (1). (a) A B-tree before deleting; (b) after deleting
key 'F' (case 1).
Figure 7.14: Results of B-tree deleting (2). (a) after deleting key 'M' (case 2a); (b) after
deleting key 'G' (case 2c).
Figure 7.15: Results of B-tree deleting (3). (a) after deleting key 'D' (case 3b, the height
shrinks); (b) after deleting key 'B' (case 3a, borrow from the right sibling); (c) after
deleting key 'U' (case 3a, borrow from the left sibling).

The following example Python program implements the B-tree deletion algorithm.
def can_remove(tr):
    return len(tr.keys) >= tr.t

def replace_key(tr, i, k):
    tr.keys[i] = k
    return k

def merge_children(tr, i):
    # merge children[i], keys[i], and children[i+1] into one node
    tr.children[i].keys += [tr.keys[i]] + tr.children[i+1].keys
    tr.children[i].children += tr.children[i+1].children
    tr.keys.pop(i)
    tr.children.pop(i+1)

def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i > 0:
        if key == tr.keys[i-1]:
            if is_leaf(tr):  # case 1 in CLRS
                tr.keys.remove(key)
            else:  # case 2 in CLRS
                if can_remove(tr.children[i-1]):  # case 2a
                    key = replace_key(tr, i-1, tr.children[i-1].keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif can_remove(tr.children[i]):  # case 2b
                    key = replace_key(tr, i-1, tr.children[i].keys[0])
                    B_tree_delete(tr.children[i], key)
                else:  # case 2c
                    merge_children(tr, i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys == []:  # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i - 1
    # case 3
    if is_leaf(tr):
        return tr  # key doesn't exist at all
    if not can_remove(tr.children[i]):
        if i > 0 and can_remove(tr.children[i-1]):  # case 3a: left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not is_leaf(tr.children[i]):
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i < len(tr.keys) and can_remove(tr.children[i+1]):  # case 3a: right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i] = tr.children[i+1].keys.pop(0)
            if not is_leaf(tr.children[i]):
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else:  # case 3b
            if i > 0:
                merge_children(tr, i-1)
                i = i - 1  # the key is now in the merged child
            else:
                merge_children(tr, i)
    B_tree_delete(tr.children[i], key)
    if tr.keys == []:  # tree shrinks in height
        tr = tr.children[0]
    return tr
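A small usage sketch (added for illustration; the driver below is hypothetical and assumes
the insert and B_tree_delete programs above):

tr = BTree(2)
for k in "GMPXACDEJKNORSTUVYZ":     # build a 2-3-4 tree by repeated insertion
    tr = insert(tr, k)
for k in "EGAM":                     # then delete a few existing keys
    tr = B_tree_delete(tr, k)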

7.3.2 Delete and fix method


The merge-before-delete algorithm is a bit complex: there are several cases, and each case
has sub-cases to deal with.
Another approach to designing the deletion algorithm is to perform fixing after the
deletion. It is similar to the insert-then-fix strategy.

    delete(T, k) = fix(del(T, k))                                                  (7.10)

When deleting a key from the B-tree, we first locate the node that contains it, traversing
from the root towards the leaves until we find the key in some node.
If this node is a leaf, we can remove the key, and then examine whether the deletion leaves
the node with too few keys to satisfy the B-tree balance properties.
If it is a branch node, removing the key breaks the node into two parts, which need to be
merged together again. The merging is a recursive process, as shown in figure 7.16.

Figure 7.16: Delete a key from a branch node. Removing ki breaks the node into 2 parts.
Merging these 2 parts is a recursive process. When the two parts are leaves, the merging
terminates.

When merging, if the two nodes are not leaves, we merge the keys together, and recursively
merge the last child of the left part and the first child of the right part into one new
node. Otherwise, if they are leaves, we merely concatenate all the keys.
Up to now, the deletion is performed in a straightforward way. However, deleting decreases
the number of keys in a node, and it may violate the B-tree balance properties. The solution
is to perform fixing along the path traversed from the root.
During the recursive deletion, a branch node is broken into 3 parts: the left part contains
all keys less than k, namely k1, k2, ..., ki−1, together with children c1, c2, ..., ci−1;
the right part contains all keys greater than k, namely ki, ki+1, ..., kn, together with
children ci+1, ci+2, ..., cn+1. Then key k is recursively deleted from child ci; denote the
result as ci′. We need to make a new node from these 3 parts, as shown in figure 7.17.
At this point, we must examine whether ci′ contains enough keys. If it has too few keys
(fewer than t − 1, not t, in contrast to the merge-before-delete approach), we can borrow a
key-child pair from the left or the right part and do the inverse of splitting. Figure 7.18
shows an example of borrowing from the left part.
If both the left part and the right part are empty, we can simply push ci′ up.
Denote the B-tree as T = (K, C, t), where K and C are the keys and children. The del(T, k)
function deletes key k from the tree.

    del(T, k) = (delete(K, k), ϕ, t)                      : C = ϕ
              = merge((K1, C1, t), (K2, C2, t))            : k = ki ∈ K
              = make((K1′, C1′), del(c, k), (K2′, C2′))    : k ∉ K                 (7.11)

If the children C = ϕ (empty), T is a leaf and k is deleted from the keys directly.
Otherwise, T is an internal node. If k ∈ K, removing it separates the keys and children into
two parts, (K1, C1) and (K2, C2), which are then recursively merged.

Figure 7.17: After delete key k from node ci , denote the result as c′i . The fixing makes a
new node from the left part, c′i and the right part.

Figure 7.18: Borrow a key-child pair from left part and un-split to a new child.

K1 = {k1 , k2 , ..., ki−1 }


K2 = {ki+1 , ki+2 , ..., km }
C1 = {c1 , c2 , ..., ci }
C2 = {ci+1 , ci+2 , ..., cm+1 }

If k ∉ K, we need to locate a child c and recursively delete k from it.

    (K1′, K2′) = ({k′ | k′ ∈ K, k′ < k}, {k′ | k′ ∈ K, k < k′})
    (C1′, {c} ∪ C2′) = splitAt(|K1′|, C)

The recursive merge function is defined as follows. When merging two trees T1 = (K1, C1, t)
and T2 = (K2, C2, t), if both are leaves, we create a new leaf by concatenating the keys.
Otherwise, the last child in C1 and the first child in C2 are recursively merged, and the
make function is called to form the new tree. When C1 and C2 are not empty, denote the last
child of C1 as c1,m and the rest as C1′; denote the first child of C2 as c2,1 and the rest
as C2′. The equation below defines the merge function.

    merge(T1, T2) = (K1 ∪ K2, ϕ, t)                                  : C1 = C2 = ϕ
                  = make((K1, C1′), merge(c1,m, c2,1), (K2, C2′))     : otherwise
                                                                                    (7.12)
The make function defined above only handles the case where a node contains too many keys
due to insertion. Deletion may instead leave a node with too few keys, so we need to test
and fix this situation as well.

    make((K′, C′), c, (K″, C″)) = fixFull((K′, C′), c, (K″, C″))    : full(c)
                                = fixLow((K′, C′), c, (K″, C″))     : low(c)
                                = (K′ ∪ K″, C′ ∪ {c} ∪ C″, t)       : otherwise     (7.13)

Where low(T) checks whether there are too few keys (fewer than t − 1). Function
fixLow(Pl, c, Pr) takes three arguments: the left pair of keys and children, a child node,
and the right pair of keys and children. If the left part isn't empty, we borrow a key-child
pair from it and do un-splitting to make the child contain enough keys, then recursively
call make; if the right part isn't empty, we borrow a pair from the right; and if both sides
are empty, we return the child node itself as the result. In this last case, the height of
the tree shrinks.
Denote the left part Pl = (Kl, Cl). If Kl isn't empty, its last key and child are written
kl,m and cl,m respectively, and the rest become Kl′ and Cl′. Similarly, the right part is
Pr = (Kr, Cr); if Kr isn't empty, its first key and child are kr,1 and cr,1, and the rest
are Kr′ and Cr′. The equation below defines fixLow.

    fixLow(Pl, c, Pr) = make((Kl′, Cl′), unsplit(cl,m, kl,m, c), (Kr, Cr))    : Kl ≠ ϕ
                      = make((Kl, Cl), unsplit(c, kr,1, cr,1), (Kr′, Cr′))    : Kr ≠ ϕ
                      = c                                                     : otherwise
                                                                                    (7.14)
Function unsplit(T1, k, T2) is the inverse of splitting. It forms a new B-tree node from two
small nodes and a key.

unsplit(T1 , k, T2 ) = (K1 ∪ {k} ∪ K2 , C1 ∪ C2 , t) (7.15)

The following example Haskell program implements the B-tree deletion algorithm.
import qualified Data.List as L

delete tr x = fixRoot $ del tr x

del:: (Ord a) ⇒ BTree a → a → BTree a



del (Node ks [] t) x = Node (L.delete x ks) [] t


del (Node ks cs t) x =
    case L.elemIndex x ks of
      Just i → merge (Node (take i ks) (take (i+1) cs) t)
                     (Node (drop (i+1) ks) (drop (i+1) cs) t)
      Nothing → make (ks', cs') (del c x) (ks'', cs'')
  where
    (ks', ks'') = L.partition (<x) ks
    (cs', (c:cs'')) = L.splitAt (length ks') cs

merge (Node ks [] t) (Node ks' [] _) = Node (ks++ks') [] t


merge (Node ks cs t) (Node ks' cs' _) = make (ks, init cs)
(merge (last cs) (head cs'))
(ks', tail cs')

make (ks', cs') c (ks'', cs'')


| full c = fixFull (ks', cs') c (ks'', cs'')
| low c = fixLow (ks', cs') c (ks'', cs'')
| otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)

low tr = (length $ keys tr) < (degree tr)-1

fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
(unsplit (last cs') (last ks') c)
(ks'', cs'')
fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
(unsplit c (head ks'') (head cs''))
(tail ks'', tail cs'')
fixLow _ c _ = c

unsplit c1 k c2 = Node ((keys c1)++[k]++(keys c2))


((children c1)++(children c2)) (degree c1)

When deleting the same keys with the delete-then-fixing approach, the results differ from
those of the merge-before-delete method. However, both satisfy the B-tree properties, so
they are all valid.

Figure 7.19: Results of delete-then-fixing (1). (a) The B-tree before deleting; (b) after
deleting key 'E'.
Figure 7.20: Results of delete-then-fixing (2). (a) after deleting key 'G'; (b) after
deleting key 'A'.
Figure 7.21: Results of delete-then-fixing (3). (a) after deleting key 'M'; (b) after
deleting key 'U'.

7.4 Searching
Searching in B-tree can be considered as the generalized tree search extended from binary
search tree.
When searching in the binary tree, there are only 2 different directions, the left and
the right. However, there are multiple directions in B-tree.
1: function Search(T, k)
2: loop
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: if i ≤ |K(T )| ∧ k = ki (T ) then
7: return (T, i)
8: if T is leaf then
9: return N IL ▷ k doesn’t exist
10: else
11: T ← ci (T )
Starting from the root, this program examines the keys one by one from the smallest to the
biggest. If it finds a matching key, it returns the current node and the index of this key.
Otherwise, if it finds a position i such that ki < k < ki+1, the program searches the child
node ci+1 for the key. If it reaches a leaf node and fails to find the key, the empty value
is returned to indicate that this key doesn't exist in the tree.
The following example Python program implements the search algorithm.
def B_tree_search(tr, key):
    while True:
        for i in range(len(tr.keys)):
            if key <= tr.keys[i]:
                break
        if key == tr.keys[i]:
            return (tr, i)
        if is_leaf(tr):
            return None
        else:
            if key > tr.keys[-1]:
                i = i + 1
            tr = tr.children[i]

The search algorithm can also be realized by recursion. When searching for key k in B-tree
T = (K, C, t), we partition the keys with k.

K1 = {k ′ |k ′ < k}
K2 = {k ′ |k ≤ k ′ }

Thus K1 contains all the keys less than k, and K2 holds the rest. If the first element of K2
equals k, we have found the key. Otherwise, we recursively search for the key in child
c|K1|+1.

    search(T, k) = (T, |K1| + 1)           : k ∈ K2
                 = ϕ                       : C = ϕ
                 = search(c|K1|+1, k)      : otherwise                              (7.16)

Below example Haskell program implements this algorithm.


search :: (Ord a)⇒ BTree a → a → Maybe (BTree a, Int)
search tr@(Node ks cs _) k

| matchFirst k $ drop len ks = Just (tr, len)


| otherwise = if null cs then Nothing
else search (cs !! len) k
where
matchFirst x (y:_) = x==y
matchFirst x _ = False
len = length $ filter (<k) ks

7.5 Notes and short summary


In this chapter, we explained the B-tree data structure as a kind of extension of the binary
search tree. The background knowledge of magnetic disk access is skipped; readers can refer
to [4] for details. For the three main operations, insertion, deletion, and searching, both
imperative and functional algorithms were given. They all traverse from the root to a leaf,
so they run in time proportional to the height of the tree. Because the B-tree always
maintains its balance properties, the performance is guaranteed to be bound to O(lg n) time,
where n is the number of keys in the B-tree.

Exercise 7.1

• When insert a key, we need find a position, where all keys on the left are less than
it, while all the others on the right are greater than it. Modify the algorithm so
that the elements stored in B-tree only need support less-than and equality test.

• We assume the element being inserted doesn’t exist in the tree. Modify the algo-
rithm so that duplicated elements can be stored in a linked-list.
• Eliminate the recursion in imperative B-tree insertion algorithm.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[2] B-tree, Wikipedia. [Link]
[3] Chris Okasaki. “FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting”.
J. Functional Programming. 1998

Chapter 8

Binary Heaps

8.1 Introduction
Heaps are one of the most widely used data structures–used to solve practical problems
such as sorting, prioritized scheduling and in implementing graph algorithms, to name a
few[40].
Most popular implementations of heaps use a kind of implicit binary heap using arrays,
which is described in [4]. Examples include C++/STL heap and Python heapq. The most
efficient heap sort algorithm is also realized with binary heap as proposed by R. W. Floyd
[41] [42].
However, heaps are general and can be realized with various other data structures besides
arrays. In this chapter, explicit binary trees are used. This leads to Leftist heaps, Skew
heaps, and Splay heaps, which are suitable for purely functional implementation, as shown by
Okasaki[3].
A heap is a data structure that satisfies the following heap property.

• Top operation always returns the minimum (maximum) element;

• Pop operation removes the top element from the heap while the heap property
should be kept, so that the new top element is still the minimum (maximum) one;

• Insert a new element to heap should keep the heap property. That the new top is
still the minimum (maximum) element;

• Other operations including merge etc should all keep the heap property.

This is a recursive definition, and it doesn't limit the underlying data structure.
We call a heap with the minimum element on top a min-heap; if the top keeps the maximum
element, we call it a max-heap.

8.2 Implicit binary heap by array


Considering the heap definition in previous section, one option to implement heap is by
using trees. A straightforward solution is to store the minimum (maximum) element in
the root of the tree, so for ‘top’ operation, we simply return the root as the result. And
for ‘pop’ operation, we can remove the root and rebuild the tree from the children.
If binary tree is used to implement the heap, we can call it binary heap. This chapter
explains three different realizations for binary heap.


8.2.1 Definition
The first realization is the implicit binary tree. Consider the problem of representing a
complete binary tree with an array (for example, in a programming language that doesn't
support structure or record data types, where only arrays can be used). One solution is to
pack all elements from the top level (root) down to the bottom level (leaves).
Figure 8.1 shows a complete binary tree and its corresponding array representation.

Figure 8.1: Mapping between a complete binary tree and an array. The tree with elements
16, 14, 10, 8, 7, 9, 3, 2, 4, 1 (level by level) maps to the array {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.

This mapping between the tree and the array can be defined as the following equations (the
array index starts from 1).
1: function Parent(i)
2: return ⌊i/2⌋

3: function Left(i)
4: return 2i

5: function Right(i)
6: return 2i + 1
For a given tree node represented as the i-th element of the array, since the tree is
complete, its parent node is the ⌊i/2⌋-th element, its left child is element 2i, and its
right child is element 2i + 1. If the index of a child exceeds the length of the array, the
node does not have that child (it is a leaf, for example).
In real implementation, this mapping can be calculated fast with bit-wise operation
like the following example ANSI C code. Note that, the array index starts from zero in
C like languages.
#define PARENT(i) ((((i) + 1) >> 1) - 1)

#define LEFT(i) (((i) << 1) + 1)

#define RIGHT(i) (((i) + 1) << 1)
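For comparison, here is a minimal Python sketch of the same 0-based mapping (added for
illustration; it is not code from the original text):

def parent(i):
    return (i + 1) // 2 - 1   # 0-based parent index

def left(i):
    return 2 * i + 1          # 0-based left child index

def right(i):
    return 2 * i + 2          # 0-based right child index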

8.2.2 Heapify
The most important thing for heap algorithm is to maintain the heap property, that the
top element should be the minimum (maximum) one.

For the implicit binary heap by array, this means that for a given node, represented by
index i, we can check whether both of its children are not less than the parent. If there is
a violation, we swap the parent and child recursively [4]. Note that here we assume both
sub-trees are already valid heaps.
Below algorithm shows the iterative solution to enforce the min-heap property from a
given index of the array.
1: function Heapify(A, i)
2: n ← |A|
3: loop
4: l ← Left(i)
5: r ← Right(i)
6: smallest ← i
7: if l < n ∧ A[l] < A[i] then
8: smallest ← l
9: if r < n ∧ A[r] < A[smallest] then
10: smallest ← r
11: if smallest 6= i then
12: Exchange A[i] ↔ A[smallest]
13: i ← smallest
14: else
15: return
For array A and a given index i, none of the children of A[i] should be less than it. If
there is a violation, we pick the smallest element and swap it into position i, moving the
previous A[i] down to the child. The algorithm traverses the tree top-down to fix the heap
property until it either reaches a leaf or finds no further violation.
The Heapify algorithm takes O(lg n) time, where n is the number of elements, because the
number of loop iterations is proportional to the height of the complete binary tree.
When implementing this algorithm, the comparison method can be passed as a parameter, so
that both min-heap and max-heap are supported. The following ANSI C example code uses this
approach.

typedef int (∗Less)(Key, Key);


int less(Key x, Key y) { return x < y; }
int notless(Key x, Key y) { return !less(x, y); }

void heapify(Key∗ a, int i, int n, Less lt) {


int l, r, m;
while (1) {
l = LEFT(i);
r = RIGHT(i);
m = i;
if (l < n && lt(a[l], a[i]))
m = l;
if (r < n && lt(a[r], a[m]))
m = r;
if (m ̸= i) {
swap(a, i, m);
i = m;
} else
break;
}
}
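Several later Python examples in this chapter call helpers such as heapify and build_heap
without showing them. The sketch below is an assumed Python counterpart of the C function
above (the names MIN_HEAP and MAX_HEAP are taken to be comparison predicates; that is an
assumption, not a definition from the original text).

MIN_HEAP = lambda a, b: a < b    # assumed: 'less than' comparison for a min-heap
MAX_HEAP = lambda a, b: a > b    # assumed: comparison for a max-heap

def heapify(x, i, n, less_p = MIN_HEAP):
    # enforce the heap property top-down from index i over x[0..n-1] (0-based)
    while True:
        l, r, m = 2 * i + 1, 2 * i + 2, i
        if l < n and less_p(x[l], x[m]):
            m = l
        if r < n and less_p(x[r], x[m]):
            m = r
        if m == i:
            return
        x[i], x[m] = x[m], x[i]
        i = m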

Figure 8.2 illustrates the steps when Heapify processes the array
{16, 4, 10, 14, 7, 9, 3, 2, 8, 1} from the second index. The array changes to
{16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, a max-heap.
Figure 8.2: Heapify example for a max-heap. (a) Step 1: 14 is the biggest among 4, 14, and
7, so 4 is swapped with the left child; (b) Step 2: 8 is the biggest among 2, 4, and 8, so 4
is swapped with the right child; (c) 4 reaches a leaf and the process terminates.

8.2.3 Build a heap


With the Heapify algorithm defined, it is easy to build a heap from an arbitrary array.
Observe that the numbers of nodes on each level of a complete binary tree form the sequence
1, 2, 4, 8, ..., 2^i, ...
The only exception is the last level. Since the tree may not be full (a complete binary tree
need not be a full binary tree), the last level contains at most 2^{p−1} nodes, where p is
the number of levels and 2^{p−1} ≤ n, with n the length of the array.
The Heapify algorithm has no effect on a leaf node, so all leaves already satisfy the heap
property and can be skipped. We only need to start checking and maintaining the heap
property from the last branch node, whose index is no greater than ⌊n/2⌋.
Based on this fact, we can build a heap with the following algorithm. (Assume the
heap is min-heap).
1: function Build-Heap(A)
2: n ← |A|
3: for i ← bn/2c down to 1 do
4: Heapify(A, i)
Although the complexity of Heapify is O(lg n), the running time of Build-Heap is
not bound to O(n lg n) but O(n). It is a linear time algorithm. This can be deduced as
the following:
The heap is built by skipping all leaves. Given n nodes, there are at most n/4 nodes
being compared and moved down 1 time; at most n/8 nodes being compared and moved
down 2 times; at most n/16 nodes being compared and moved down 3 times,... Thus the
upper bound of total comparison and moving time is:
    S = n(1/4 + 2/8 + 3/16 + ...)                                                   (8.1)

Multiplying both sides by 2:

    2S = n(1/2 + 2/4 + 3/8 + ...)                                                   (8.2)

Subtracting equation (8.1) from (8.2):

    S = n(1/2 + 1/4 + 1/8 + ...) = n
Below ANSI C example program implements this heap building function.
void build_heap(Key∗ a, int n, Less lt) {
int i;
for (i = (n-1) >> 1; i ≥ 0; --i)
heapify(a, i, n, lt);
}
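A matching Python sketch of build_heap (again an assumed illustration, reusing the heapify
sketch given earlier):

def build_heap(x, less_p = MIN_HEAP):
    # apply heapify bottom-up, starting from the last branch node
    n = len(x)
    for i in reversed(range(n // 2)):
        heapify(x, i, n, less_p)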

Figures 8.3, 8.4 and 8.5 show the steps of building a max-heap from the array
{4, 1, 3, 2, 16, 9, 10, 14, 8, 7}. The node in black is the one to which Heapify is being
applied; the nodes in gray are swapped in order to keep the heap property.

Figure 8.3: Build a heap from an arbitrary array (1). (a) The array in arbitrary order;
(b) Step 1: the array is mapped to a binary tree and the first branch node to process, 16,
is examined; (c) Step 2: 16 is the largest element of its sub-tree; next the node with value
2 is checked.
Figure 8.4: Build a heap from an arbitrary array (2). (a) Step 3: 14 is the largest value in
its sub-tree, so 14 and 2 are swapped; next the node with value 3 is checked; (b) Step 4:
10 is the largest value in its sub-tree, so 10 and 3 are swapped; next the node with value 1
is checked.
Figure 8.5: Build a heap from an arbitrary array (3). (a) Step 5: 16 is the largest value in
its sub-tree, so 16 and 1 are swapped, then 1 and 7; next the root node with value 4 is
checked; (b) Step 6: 4 and 16 are swapped, then 4 and 14, then 4 and 8, and the build
process finishes.

8.2.4 Basic heap operations

The generic definition of the heap (not necessarily the binary heap) demands that we provide
basic operations for accessing and modifying data.

The most important operations include accessing the top element (finding the minimum or
maximum), popping the top element from the heap, finding the top k elements, decreasing a
key (for a min-heap; it is increasing a key for a max-heap), and insertion.
For the binary tree realization, most operations are bound to O(lg n) in the worst case;
some of them, such as top, take O(1) constant time.

Access the top element


For the binary tree realization, the root stores the minimum (maximum) value, which is the
first element of the array.
1: function Top(A)
2: return A[1]
This operation is trivial and takes O(1) time. We skip the error handling for the empty case
here; if the heap is empty, one option is to raise an error.

Heap Pop
Pop operation is more complex than accessing the top, because the heap property has to
be maintained after the top element is removed.
The solution is to apply Heapify algorithm to the next element after the root is
removed.
One simple but slow method based on this idea looks like the following.
1: function Pop-Slow(A)
2: x ← Top(A)
3: Remove(A, 1)
4: if A is not empty then
5: Heapify(A, 1)
6: return x
This algorithm first records the top element in x, then removes the first element from the
array, reducing its size by one. After that, if the array isn't empty, Heapify is applied to
the new array from the first element (previously the second one).
Removing the first element from an array takes O(n) time, where n is the length of the
array, because all the remaining elements must be shifted one by one. This bottleneck slows
the whole algorithm down to linear time.
In order to solve this problem, one alternative is to swap the first element with the
last one in the array, then shrink the array size by one.
1: function Pop(A)
2: x ← Top(A)
3: n ← Heap-Size(A)
4: Exchange A[1] ↔ A[n]
5: Remove(A, n)
6: if A is not empty then
7: Heapify(A, 1)
8: return x
Removing the last element from the array takes only constant O(1) time, and Heapify
is bound to O(lg n). Thus the whole algorithm performs in O(lg n) time. The following
example ANSI C program implements this algorithm1 .
1 This program does not actually remove the last element; it reuses the last cell to store
the popped result.

Key pop(Key∗ a, int n, Less lt) {


swap(a, 0, --n);
heapify(a, 0, n, lt);
return a[n];
}
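A possible Python counterpart of pop, used by the top_k and heap_sort examples below (an
assumed sketch built on the same swap-then-heapify idea and on the heapify sketch given
earlier):

def heap_pop(x, less_p = MIN_HEAP):
    # swap the top with the last element, shrink the array, then restore the heap property
    top = x[0]
    x[0] = x[-1]
    x.pop()
    if x != []:
        heapify(x, 0, len(x), less_p)
    return top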

Find the top k elements


With pop defined, it is easy to find the top k elements of an array: build a heap from the
array (a max-heap for the k largest, or a min-heap for the k smallest), then perform the pop
operation k times.
1: function Top-k(A, k)
2: R←ϕ
3: Build-Heap(A)
4: for i ← 1 to Min(k, |A|) do
5: Append(R, Pop(A))
6: return R
If k is greater than the length of the array, we need return the whole array as the
result. That’s why it calls the Min function to determine the number of loops.
Below example Python program implements the top-k algorithm.
def top_k(x, k, less_p = MIN_HEAP):
    build_heap(x, less_p)
    return [heap_pop(x, less_p) for _ in range(min(k, len(x)))]

Decrease key
A heap can be used to implement a priority queue, so it is important to support key
modification. One typical operation is to increase the priority of a task so that it can be
performed earlier.
Here we present the decrease key operation for a min-heap. The corresponding opera-
tion is increase key for max-heap. Figure 8.6 and 8.7 illustrate such a case for a max-heap.
The key of the 9-th node is increased from 4 to 15.
Once a key is decreased in a min-heap, it may make the node conflict with the heap
property, that the key may be less than some ancestor. In order to maintain the invariant,
the following auxiliary algorithm is defined to resume the heap property.
1: function Heap-Fix(A, i)
2: while i > 1 ∧ A[i] < A[ Parent(i) ] do
3: Exchange A[i] ↔ A[ Parent(i) ]
4: i ← Parent(i)
This algorithm repeatedly compares the keys of the current node and its parent, and swaps
them if the current node holds the smaller key. The process moves from the current node
towards the root, and stops once the parent holds a key that is not greater than the current
one.
With this auxiliary algorithm, decrease key can be realized as below.
1: function Decrease-Key(A, i, k)
2: if k < A[i] then
3: A[i] ← k
4: Heap-Fix(A, i)
Figure 8.6: Example of increasing a key in a max-heap. (a) The 9-th node with key 4 will be
modified; (b) the key is changed to 15, which is greater than its parent; (c) according to
the max-heap property, 8 and 15 are swapped.
Figure 8.7: (continued) Since 15 is greater than its parent 14, they are swapped; because 15
is less than 16, the process terminates.

This algorithm is only triggered when the new key is less than the original one. The
performance is bound to O(lg n). The following example ANSI C program implements the
algorithm.

void heap_fix(Key∗ a, int i, Less lt) {


while (i > 0 && lt(a[i], a[PARENT(i)])) {
swap(a, i, PARENT(i));
i = PARENT(i);
}
}

void decrease_key(Key∗ a, int i, Key k, Less lt) {


if (lt(k, a[i])) {
a[i] = k;
heap_fix(a, i, lt);
}
}
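The heap_insert Python example below relies on a heap_fix helper; a hedged Python sketch
mirroring the C version above (an assumption, not code from the original) could be:

def heap_fix(x, i, less_p = MIN_HEAP):
    # bubble x[i] up while it should be ordered before its parent
    while i > 0:
        p = (i + 1) // 2 - 1    # parent index (0-based)
        if not less_p(x[i], x[p]):
            break
        x[i], x[p] = x[p], x[i]
        i = p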

Insertion
Insertion can be implemented using Decrease-Key [4]. A new node with ∞ as its key is
created; according to the min-heap property, it should be placed as the last element of the
underlying array. After that, the key is decreased to the value to be inserted, and
Decrease-Key is called to fix any violation of the heap property.
Alternatively, we can reuse Heap-Fix to implement insertion. The new key is directly
appended at the end of the array, and the Heap-Fix is applied to this new node.
1: function Heap-Push(A, k)
2: Append(A, k)
3: Heap-Fix(A, |A|)
The following example Python program implements the heap insertion algorithm.
def heap_insert(x, key, less_p = MIN_HEAP):
    i = len(x)
    x.append(key)
    heap_fix(x, i, less_p)

8.2.5 Heap sort


Heap sort is an interesting application of the heap. According to the heap property, the
minimum (maximum) element can be accessed at the top of the heap. A straightforward way to
sort a list of values is therefore to build a heap from them, then continuously pop the
smallest element until the heap is empty.
The algorithm based on this idea can be defined like below.
1: function Heap-Sort(A)
2: R←ϕ
3: Build-Heap(A)
4: while A 6= ϕ do
5: Append(R, Heap-Pop(A))
6: return R
The following Python example program implements this definition.
def heap_sort(x, less_p = MIN_HEAP):
    res = []
    build_heap(x, less_p)
    while x != []:
        res.append(heap_pop(x, less_p))
    return res

When sorting n elements, Build-Heap is bound to O(n). Since pop is O(lg n) and it is called
n times, the overall sorting takes O(n lg n) time. Because we use another list to hold the
result, the space requirement is O(n).
Robert W. Floyd found a faster implementation of heap sort. The idea is to build a max-heap
instead of a min-heap, so the first element is the biggest one. The biggest element is then
swapped with the last element of the array, so that it ends up in its final position after
sorting. As the last element becomes the new top, it may violate the heap property, so we
shrink the heap size by one and perform Heapify to restore it. This process is repeated
until only one element is left in the heap.
1: function Heap-Sort(A)
2: Build-Max-Heap(A)
3: while |A| > 1 do
4: Exchange A[1] ↔ A[n]
5: |A| ← |A| − 1
6: Heapify(A, 1)
This is an in-place algorithm; it doesn't need any extra space to hold the result. The
following ANSI C example code implements it.
void heap_sort(Key∗ a, int n) {
build_heap(a, n, notless);
while(n > 1) {
swap(a, 0, --n);
heapify(a, 0, n, notless);
}
}
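The same in-place strategy can be sketched in Python (an assumed illustration reusing the
heapify and build_heap sketches above, with the max-heap comparison):

def heap_sort_inplace(x):
    # Floyd's method: build a max-heap, then repeatedly move the top to the end
    build_heap(x, MAX_HEAP)
    n = len(x)
    while n > 1:
        n = n - 1
        x[0], x[n] = x[n], x[0]
        heapify(x, 0, n, MAX_HEAP)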

Exercise 8.1

• Somebody considers one alternative to realize in-place heap sort. Take sorting the
array in ascending order as example, the first step is to build the array as a minimum
heap A, but not the maximum heap like the Floyd’s method. After that the first
element a1 is in the correct place. Next, treat the rest {a2 , a3 , ..., an } as a new heap,
and perform Heapify to them from a2 for these n − 1 elements. Repeating this
advance and Heapify step from left to right would sort the array. The following
example ANSI C code illustrates this idea. Is this solution correct? If yes, prove it;
if not, why?
void heap_sort(Key∗ a, int n) {
build_heap(a, n, less);
while(--n)
heapify(++a, 0, n, less);
}

• Because of the same reason, can we perform Heapify from left to right k times to
realize in-place top-k algorithm like below ANSI C code?
int tops(int k, Key∗ a, int n, Less lt) {
build_heap(a, n, lt);
for (k = MIN(k, n) - 1; k; --k)
heapify(++a, 0, --n, lt);
return k;
}
8.3 Leftist heap and Skew heap, the explicit binary heaps

Instead of using an implicit binary tree in an array, it is natural to ask why we can't use
an explicit binary tree to realize the heap.
There are some problems to solve if we use an explicit binary tree as the underlying data
structure.
The first problem is about the Heap-Pop (or Delete-Min) operation. Consider a binary tree
represented by its left child, key, and right child as (L, k, R), as shown in figure 8.8.

Figure 8.8: A binary tree (L, k, R); all elements in the children are not less than k.

If k is the top element, all elements in the left and right children are not less than k in
a min-heap. After k is popped, only the left and right children remain, and they have to be
merged into a new tree. Since the heap property must be maintained after the merge, the new
root is still the smallest element.
Because both the left and right children are binary trees conforming to the heap property,
two trivial cases can be defined immediately.

    merge(H1, H2) = H2    : H1 = ϕ
                  = H1    : H2 = ϕ
                  = ?     : otherwise

Where ϕ means the empty heap.
If neither the left nor the right child is empty, since both satisfy the heap property,
their roots are their respective minimum elements. We can compare the two roots and select
the smaller one as the new root of the merged heap.
For instance, let L = (A, x, B) and R = (A′ , y, B ′ ), where A, A′ , B, and B ′ are all
sub trees. If x < y, x will be the new root. We can either keep A, and recursively merge
B and R; or keep B, and merge A and R, so the new heap can be one of the following.

• (merge(A, R), x, B)

• (A, x, merge(B, R))

Both are correct. One simplified solution is to always merge into the right sub-tree. The
Leftist tree provides a systematic approach based on this idea.

8.3.1 Definition
The heap implemented with a Leftist tree is called a Leftist heap. The Leftist tree was
first introduced by C. A. Crane in 1972[43].

Rank (S-value)
In a Leftist tree, a rank value (or S-value) is defined for each node. The rank is the
distance to the nearest external node, where an external node is the NIL concept extended
from a leaf node.
For example, in figure 8.9 the rank of NIL is defined as 0. Consider the root node 4: the
nearest external node is the child of node 8, so the rank of the root is 2. Because node 6
and node 8 both contain only NIL children, their rank values are 1. Although node 5 has a
non-NIL left child, its right child is NIL, so its rank, the minimum distance to NIL, is
still 1.

Figure 8.9: Example rank values: rank(4) = 2, rank(6) = rank(8) = rank(5) = 1.

Leftist property
With rank defined, we can give a merge strategy.

• Every time we merge, we always merge into the right child; denote the rank of the new
  right sub-tree as rr.
• Compare the ranks of the left and right children: if the rank of the left sub-tree is rl
  and rl < rr, swap the left and right children.

We call this the 'Leftist property'. In general, a Leftist tree always has the shortest path
to some external node on the right.
A Leftist tree tends to be very unbalanced; however, it ensures the important property
stated in the following theorem.
Theorem 8.3.1. If a Leftist tree T contains n internal nodes, the path from the root to the
rightmost external node contains at most ⌊log(n + 1)⌋ nodes.
We skip the proof here; readers can refer to [44] and [51] for more information. With this
theorem, algorithms that operate along this path are all bound to O(lg n).
We can reuse the binary tree definition and augment it with a rank field to define the
Leftist tree, for example in the form (r, k, L, R) for the non-empty case. The Haskell code
below defines the Leftist tree.

data LHeap a = E −− Empty


| Node Int a (LHeap a) (LHeap a) −− rank, element, left, right

For empty tree, the rank is defined as zero. Otherwise, it’s the value of the augmented
field. A rank(H) function can be given to cover both cases.
    rank(H) = 0    : H = ϕ
            = r    : otherwise, H = (r, k, L, R)                                    (8.3)

Here is the example Haskell rank function.


rank E = 0
rank (Node r _ _ _) = r

In the rest of this section, we denote rank(H) as rH

8.3.2 Merge
In order to realize merge, we first develop an auxiliary function that compares the ranks
and swaps the children if necessary.

    mk(k, A, B) = (rA + 1, k, B, A)    : rA < rB
                = (rB + 1, k, A, B)    : otherwise                                  (8.4)

This function takes three arguments: a key and two sub-trees A and B. If the rank of A is
smaller, it builds a bigger tree with B as the left child and A as the right child, and sets
rA + 1 as the rank of the new tree. Otherwise, if B holds the smaller rank, A becomes the
left child and B the right, and the resulting rank is rB + 1.
The rank increases by one because a new key is added on top of the tree, which makes the
right spine one node longer.
Denote the keys and the left and right children of H1 and H2 as k1, L1, R1 and k2, L2, R2
respectively. The merge(H1, H2) function can then be completed using this auxiliary tool:

    merge(H1, H2) = H2                             : H1 = ϕ
                  = H1                             : H2 = ϕ
                  = mk(k1, L1, merge(R1, H2))      : k1 < k2
                  = mk(k2, L2, merge(H1, R2))      : otherwise                      (8.5)

The merge function is always recursively called on the right side, and the Leftist
property is maintained. These facts ensure the performance being bound to O(lg n).
The following Haskell example code implements the merge program.
merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
if x < y then makeNode x l (merge r h2)
else makeNode y l' (merge h1 r')

makeNode x a b = if rank a < rank b then Node (rank a + 1) x b a


else Node (rank b + 1) x a b

Merge operation in implicit binary heap by array


The implicit binary heap in an array performs very fast in most cases, and it fits modern
computers with cache technology well. However, its merge operation is bound to O(n) time.
The typical realization is to concatenate the two arrays and build a heap over the result
[50].
1: function Merge-Heap(A, B)
2: C ← Concat(A, B)
3: Build-Heap(C)
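A minimal Python sketch of this O(n) merge (an assumed illustration, reusing the build_heap
sketch from the previous section):

def merge_heap(a, b, less_p = MIN_HEAP):
    c = a + b              # concatenate the two arrays
    build_heap(c, less_p)  # rebuild the heap property over the combined array
    return c

This linear cost of merging is one motivation for the explicit tree heaps below, whose merge
is bound to O(lg n).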

8.3.3 Basic heap operations


Most of the basic heap operations can be implemented with merge algorithm defined
above.

Top and pop


Because the smallest element is always held in root, it’s trivial to find the minimum
value. It’s constant O(1) operation. Below equation extracts the root from non-empty
heap H = (r, k, L, R). The error handling for empty case is skipped here.

top(H) = k (8.6)

For pop operation, firstly, the top element is removed, then left and right children are
merged to a new heap.

pop(H) = merge(L, R) (8.7)

Because it calls merge directly, the pop operation on Leftist heap is bound to O(lg n).

Insertion
To insert a new element, one solution is to create a single leaf node with the element, and
then merge this leaf node to the existing Leftist tree.

insert(H, k) = merge(H, (1, k, ϕ, ϕ)) (8.8)

It is O(lg n) algorithm since insertion also calls merge directly.


There is a convenient way to build the Leftist heap from a list. We can continuously
insert the elements one by one to the empty heap. This can be realized by folding.

build(L) = f old(insert, ϕ, L) (8.9)

Figure 8.10 shows one example Leftist tree built in this way.
The following example Haskell code gives reference implementation for the Leftist tree
operations.
insert h x = merge (Node 1 x E E) h

findMin (Node _ x _ _) = x

deleteMin (Node _ _ l r) = merge l r

fromList = foldl insert E


Figure 8.10: A Leftist tree built from the list {9, 4, 16, 7, 10, 2, 14, 3, 8, 1}.

8.3.4 Heap sort by Leftist Heap


With all the basic operations defined, it’s straightforward to implement heap sort. We can
firstly turn the list into a Leftist heap, then continuously extract the minimum element
from it.

sort(L) = heapSort(build(L)) (8.10)


    heapSort(H) = ϕ                               : H = ϕ
                = {top(H)} ∪ heapSort(pop(H))     : otherwise                       (8.11)

Because pop is a logarithmic operation and it is called n times, this algorithm takes
O(n lg n) time in total. The following Haskell example program implements heap sort with the
Leftist tree.
heapSort = hsort ◦ fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)

8.3.5 Skew heaps


The Leftist heap sometimes leads to quite an unbalanced structure. Figure 8.11 shows one
example: the Leftist tree built by folding over the list {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.
The Skew heap (or self-adjusting heap) simplifies the Leftist heap realization and aims to
solve this balance issue[46] [47].
When constructing the Leftist heap, we swap the left and right children during merge if the
rank on the left side is less than that on the right. This compare-and-swap strategy doesn't
work well when a sub-tree has only one child, because in that case the rank of the sub-tree
is always 1 no matter how big it is. A 'brute-force' approach is to swap the left and right
children every time we merge. This idea leads to the Skew heap.

Definition of Skew heap


Skew heap is the heap realized with Skew tree. Skew tree is a special binary tree. The
minimum element is stored in root. Every sub tree is also a skew tree.
Figure 8.11: A very unbalanced Leftist tree built from the list {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.

It needn’t keep the rank (or S-value) field. We can reuse the binary tree definition
for Skew heap. The tree is either empty, or in a pre-order form (k, L, R). Below Haskell
code defines Skew heap like this.
data SHeap a = E −− Empty
| Node a (SHeap a) (SHeap a) −− element, left, right

Merge
The merge algorithm tends to be very simple. When merging two non-empty Skew trees, we
compare the roots, and pick the smaller one as the new root; then the other tree, which
contains the bigger element, is merged onto one sub tree; finally, the two children are swapped.
Denote H1 = (k1, L1, R1) and H2 = (k2, L2, R2) if they are not empty. If k1 < k2, for
instance, select k1 as the new root. We can either merge H2 to L1, or merge H2 to R1.
Without loss of generality, let's merge to R1. After swapping the two children, the
final result is (k1, merge(R1, H2), L1). Taking the edge cases into account, the merge
algorithm is defined as the following.

merge(H1, H2) = H1                            : H2 = ϕ
              = H2                            : H1 = ϕ
              = (k1, merge(R1, H2), L1)       : k1 < k2
              = (k2, merge(H1, R2), L2)       : otherwise        (8.12)

All the rest of the operations, including insert, top and pop, are realized in the same way as for
the Leftist heap by using merge, except that we don't need the rank any more.
Translating the above algorithm into Haskell yields the following example program.
merge E h = h
merge h E = h
merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'

insert h x = merge (Node x E E) h

findMin (Node x _ _) = x

deleteMin (Node _ l r) = merge l r
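As with the Leftist heap, a fromList and a heap sort can be assembled from these operations; a minimal sketch (the names fromList and skewSort are assumed here):

fromList :: Ord a => [a] -> SHeap a
fromList = foldl insert E

skewSort :: Ord a => [a] -> [a]
skewSort = hsort . fromList where
  hsort E = []
  hsort h = findMin h : hsort (deleteMin h)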

Different from the Leftist heap, if we feed ordered list to Skew heap, it can build a
fairly balanced binary tree as illustrated in figure 8.12.

Figure 8.12: The Skew tree is still balanced even when the input is an ordered list {1, 2, ..., 10}.

8.4 Splay heap


The Leftist heap and Skew heap show that it's quite possible to realize the heap data
structure with an explicit binary tree. The Skew heap gives one method to solve the tree balance
problem. The Splay heap, on the other hand, uses another method to keep the tree balanced.
The binary trees used in the Leftist heap and Skew heap are not Binary Search Trees
(BST). If we turn the underlying data structure into a binary search tree, the minimum (or
maximum) element is no longer the root. It takes O(lg n) time to find the minimum (or
maximum) element.
A binary search tree becomes inefficient if it isn't well balanced. Most operations degrade
to O(n) in the worst case. Although a red-black tree could be used to realize the binary heap,
it's overkill. The Splay tree provides a light weight implementation with an acceptable dynamic
balancing result.

8.4.1 Definition
The Splay tree uses a cache-like approach. It keeps rotating the currently accessed node close to the
top, so that the node can be accessed quickly next time. It defines such kind of operation
as 'splaying'. For an unbalanced binary search tree, after several splay operations, the tree
tends to become more and more balanced. Most basic operations of the Splay tree perform in
amortized O(lg n) time. The Splay tree was invented by Daniel Dominic Sleator and Robert
Endre Tarjan in 1985 [48] [49].

Splaying
There are two methods for splaying. The first one needs to deal with many different
cases, but can be implemented fairly easily with pattern matching. The second one has a
uniform form, but the implementation is complex.

Denote the node currently being accessed as X, the parent node as P, and the grand
parent node as G (if they exist). There are 3 steps for splaying. Each step contains 2
symmetric cases. For illustration purposes, only one case is shown for each step.

• Zig-zig step. As shown in figure 8.13, in this case, X and P are children on the
same side of G, either both on left or right. By rotating 2 times, X becomes the
new root.

Figure 8.13: Zig-zig case. (a) X and P are both left children or both right children. (b) X becomes the new root after rotating 2 times.

• Zig-zag step. As shown in figure 8.14, in this case, X and P are children on different
sides. X is on the left, P is on the right. Or X is on the right, P is on the left.
After rotation, X becomes the new root, P and G are siblings.

• Zig step. As shown in figure 8.15, in this case, P is the root, we rotate the tree, so
that X becomes new root. This is the last step in splay operation.

Although there are 6 different cases, they can be handled concisely in environments that
support pattern matching. Denote the non-empty binary tree in the form T = (L, k, R).
When accessing key Y in tree T, the splay operation can be defined as below.

splay(T, Y) = (a, X, (b, P, (c, G, d)))   : T = (((a, X, b), P, c), G, d), X = Y
            = (((a, G, b), P, c), X, d)   : T = (a, G, (b, P, (c, X, d))), X = Y
            = ((a, P, b), X, (c, G, d))   : T = ((a, P, (b, X, c)), G, d), X = Y
            = ((a, G, b), X, (c, P, d))   : T = (a, G, ((b, X, c), P, d)), X = Y
            = (a, X, (b, P, c))           : T = ((a, X, b), P, c), X = Y
            = ((a, P, b), X, c)           : T = (a, P, (b, X, c)), X = Y
            = T                           : otherwise        (8.13)

The first two clauses handle the 'zig-zig' cases; the next two clauses handle the 'zig-zag'
cases; the last two clauses handle the zig cases. The tree isn't changed in all other
situations.
The following Haskell program implements this splay function.
data STree a = E −− Empty
| Node (STree a) a (STree a) −− left, key, right
−− zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t

Figure 8.14: Zig-zag case. (a) X and P are children on different sides. (b) X becomes the new root; P and G are siblings.

Figure 8.15: Zig case. (a) P is the root. (b) Rotate the tree to make X the new root.



splay t@(Node a g (Node b p (Node c x d))) y =
if x == y then Node (Node (Node a g b) p c) x d else t
−− zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
if x == y then Node (Node a g b) x (Node c p d) else t
−− zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x c else t
−− otherwise
splay t _ = t

With the splay operation defined, every time we insert a new key, we call the splay
function to adjust the tree. If the tree is empty, the result is a leaf; otherwise we compare
the key with the root: if it is less than the root, we recursively insert it into the left child,
and perform splaying after that; otherwise the key is inserted into the right child.

insert(T, x) = (ϕ, x, ϕ)                          : T = ϕ
             = splay((insert(L, x), k, R), x)     : T = (L, k, R), x < k
             = splay((L, k, insert(R, x)), x)     : otherwise        (8.14)

The following Haskell program implements this insertion algorithm.


insert E y = Node E y E
insert (Node l x r) y
| x > y = splay (Node (insert l y) x r) y
| otherwise = splay (Node l x (insert r y)) y

Figure 8.16 shows the result of using this function. It inserts the ordered elements
{1, 2, ..., 10} one by one into the empty tree. With a normal binary search tree, this would
build a very poor result which degrades into a linked-list. The splay method creates a more
balanced result.
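In code, this experiment is just a fold over insert (a small usage sketch; the binding t is ours):

t :: STree Int
t = foldl insert E [1..10]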

Figure 8.16: Splaying helps improve the balance.

Okasaki found a simple rule for Splaying [3]. Whenever we follow two left branches,
or two right branches continuously, we rotate the two nodes.
Based on this rule, splaying can be realized in the following way. When we access a node for
a key x (during the process of inserting a node, looking up a node, or deleting a node), if
we traverse two left branches or two right branches, we partition the tree in two parts L
and R, where L contains all nodes smaller than x, and R contains all the rest. We can then
create a new tree (for instance in insertion), with x as the root, L as the left child, and R
as the right child. The partition process is recursive, because it splays its children as well.

partition(T, p) = (ϕ, ϕ)                     : T = ϕ
                = (T, ϕ)                     : T = (L, k, R), k < p, R = ϕ
                = (((L, k, L′), k′, A), B)   : T = (L, k, (L′, k′, R′)), k < p, k′ < p, (A, B) = partition(R′, p)
                = ((L, k, A), (B, k′, R′))   : T = (L, k, (L′, k′, R′)), k < p ≤ k′, (A, B) = partition(L′, p)
                = (ϕ, T)                     : T = (L, k, R), p ≤ k, L = ϕ
                = (A, (B, k′, (R′, k, R)))   : T = ((L′, k′, R′), k, R), p ≤ k, p ≤ k′, (A, B) = partition(L′, p)
                = ((L′, k′, A), (B, k, R))   : T = ((L′, k′, R′), k, R), k′ ≤ p ≤ k, (A, B) = partition(R′, p)
                                                                                          (8.15)

Function partition(T, p) takes a tree T, and a pivot p as arguments. The first clause is
the edge case: the partition result for an empty tree is a pair of empty left and right trees.
Otherwise, denote the tree as (L, k, R). We need to compare the pivot p and the root k.
If k < p, there are two sub-cases. One is the trivial case that R is empty. According to the
property of the binary search tree, all elements are less than p, so the result pair is (T, ϕ).
For the other case, R = (L′, k′, R′), we need to further compare k′ with the pivot p. If
k′ < p is also true, we recursively partition R′ with the pivot; all the elements less than p
in R′ are held in tree A, and the rest are in tree B. The result pair can be composed of two
trees: one is ((L, k, L′), k′, A); the other is B. If the key of the right sub tree is not less
than the pivot, we recursively partition L′ with the pivot to give the intermediate pair
(A, B); the final pair of trees can then be composed of (L, k, A) and (B, k′, R′). There are
symmetric cases for p ≤ k. They are handled in the last three clauses.
Translating the above algorithm into Haskell yields the following partition program.
partition E _ = (E, E)
partition t@(Node l x r) y
| x < y =
case r of
E → (t, E)
Node l' x' r' →
if x' < y then
let (small, big) = partition r' y in
(Node (Node l x l') x' small, big)
else
let (small, big) = partition l' y in
(Node l x small, Node big x' r')
| otherwise =
case l of
E → (E, t)
Node l' x' r' →
if y < x' then
let (small, big) = partition l' y in
(small, Node big x' (Node r' x r))
else
let (small, big) = partition r' y in
(Node l' x' small, Node big x r)

Alternatively, insertion can be realized with the partition algorithm. When inserting a new
element k into the splay heap T, we can first partition the heap into two trees, L and R,
where L contains all nodes smaller than k, and R contains the rest. We then construct
a new node, with k as the root and L, R as the children.

insert(T, k) = (L, k, R), where (L, R) = partition(T, k)        (8.16)
The corresponding Haskell example program is as the following.
insert t x = Node small x big where (small, big) = partition t x

Top and pop


Since the splay tree is just a special binary search tree, the minimum element is stored in
the left most node. We need to keep traversing the left child to realize the top operation.
Denote the non-empty tree T = (L, k, R); the top(T) function can be defined as below.

top(T) = k           : L = ϕ
       = top(L)      : otherwise        (8.17)
This is exactly the min(T ) algorithm for binary search tree.
For the pop operation, the algorithm needs to remove the minimum element from the tree.
Whenever two left nodes are traversed, the splaying operation should be performed.

pop(T) = R                                : T = (ϕ, k, R)
       = (R′, k, R)                       : T = ((ϕ, k′, R′), k, R)
       = (pop(L′), k′, (R′, k, R))        : T = ((L′, k′, R′), k, R)        (8.18)
Note that the third clause performs splaying without explicitly calling the partition
function. It utilizes the property of the binary search tree directly.
Both the top and pop algorithms are bound to O(lg n) time because the splay tree is
balanced.
The following Haskell example programs implement the top and pop operations.
findMin (Node E x _) = x
findMin (Node l x _) = findMin l

deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r' x r)

Merge
Merge is another basic operation for heaps, as it is widely used in graph algorithms. By
using the partition algorithm, merge can be realized in O(lg n) time.
When merging two splay trees, for the non-trivial case, we can take the root of the first
tree as the new root, then partition the second tree with this new root as the pivot.
After that we recursively merge the children of the first tree with the partition results. This
algorithm is defined as the following.

merge(T1, T2) = T2                                  : T1 = ϕ
              = (merge(L, A), k, merge(R, B))       : T1 = (L, k, R), (A, B) = partition(T2, k)        (8.19)

If the first heap is empty, the result is definitely the second heap. Otherwise, denote
the first splay heap as (L, k, R), we partition T2 with k as the pivot to yield (A, B), where
A contains all the elements in T2 which are less than k, and B holds the rest. We next
recursively merge A with L; and merge B with R as the new children for T1 .
Translating the definition to Haskell gives the following example program.
merge E t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
where (l', r') = partition t x

8.4.2 Heap sort


Since the heap sort algorithm only depends on the heap interface, and the internal
implementation of the Splay heap is completely hidden behind that interface, the heap sort
algorithm can be reused. It means that the heap sort algorithm is generic, no matter what
the underlying data structure is.
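For instance, here is a minimal sketch of heap sort written only in terms of the Splay heap operations given above (the name splaySort is an assumption made here); swapping in the Leftist heap or Skew heap operations yields the same algorithm unchanged.

-- Heap sort via the splay heap interface: build, then repeatedly pop the minimum.
splaySort :: Ord a => [a] -> [a]
splaySort = hsort . foldl insert E where
  hsort E = []
  hsort t = findMin t : hsort (deleteMin t)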

8.5 Notes and short summary


In this chapter, we define the binary heap more generally, so that as long as the heap property is
maintained, any binary tree representation can be used to implement the binary heap.
This definition isn't limited to the popular array based binary heap, but also extends
to the explicit binary heaps, including the Leftist heap, Skew heap and Splay heap. The array
based binary heap is particularly convenient for the imperative implementation, because
it intensely uses random index access, which can be mapped to a complete binary tree.
It's hard to find a direct functional counterpart in this way.
However, by using an explicit binary tree, a functional implementation can be achieved.
Most of the operations have O(lg n) worst case performance, and some of them even reach O(1)
amortized time. Okasaki gives a detailed analysis of these data structures in [3].
In this chapter, only the purely functional realizations of the Leftist heap, Skew heap, and
Splay heap are explained; they can all be realized in imperative approaches as well.
It’s very natural to extend the concept from binary tree to k-ary (k-way) tree, which
leads to other useful heaps such as Binomial heap, Fibonacci heap and pairing heap. They
are introduced in the following chapters.

Exercise 8.2

• Realize the imperative Leftist heap, Skew heap, and Splay heap.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[2] Heap (data structure), Wikipedia. [Link]
[3] Heapsort, Wikipedia. [Link]
[4] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[5] Sorting algorithms/Heapsort. Rosetta Code. [Link]
[6] Leftist Tree, Wikipedia. [Link]
[7] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Pat-
terns in Java. [Link]
[8] Donald E. Knuth. “The Art of Computer Programming. Volume 3: Sorting and
Searching.”. Addison-Wesley Professional; 2nd Edition (October 15, 1998). ISBN-13:
978-0201485417. Section 5.2.3 and 6.2.3

[9] Skew heap, Wikipedia. [Link]


[10] Sleator, Daniel Dominic; Tarjan, Robert Endre. “Self-adjusting heaps”. SIAM Journal
on Computing 15(1):52-69. doi:10.1137/0215004. ISSN 0097-5397 (1986)
[11] Splay tree, Wikipedia. [Link]

[12] Sleator, Daniel D.; Tarjan, Robert E. (1985), “Self-Adjusting Binary Search Trees”,
Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835
[13] NIST, “binary heap”. [Link]

Chapter 9

From grape to the world cup,


the evolution of selection sort

9.1 Introduction
We have introduced the 'hello world' sorting algorithm, insertion sort. In this short
chapter, we explain another straightforward sorting method, selection sort. The basic
version of selection sort doesn't perform as well as the divide and conquer methods,
e.g. quick sort and merge sort. We'll use the same approach as in the chapter about insertion
sort to analyze why it's slow, and try to improve it through various attempts, until we reach
the best bound of comparison based sorting, O(n lg n), by evolving it into heap sort.
The idea of selection sort can be illustrated by a real life story. Consider a kid eating
a bunch of grapes. There are two types of children according to my observation. One is
optimistic type, that the kid always eats the biggest grape he/she can ever find; the other
is pessimistic, that he/she always eats the smallest one.
The first type of kids actually eat the grape in an order that the size decreases mono-
tonically; while the other eat in a increase order. The kid sorts the grapes in order of size
in fact, and the method used here is selection sort.

Figure 9.1: Always picking the smallest grape.

Based on this idea, the algorithm of selection sort can be directly described as the
following.
In order to sort a series of elements:


• The trivial case, if the series is empty, then we are done, the result is also empty;

• Otherwise, we find the smallest element, append it to the tail of the result, and continue sorting the rest of the elements in the same way;

Note that this algorithm sorts the elements in increasing order; it's easy to sort in
decreasing order by picking the biggest element instead. We'll introduce passing a
comparator as a parameter later on.
This description can be formalized as an equation.

sort(A) = ϕ                   : A = ϕ
        = {m} ∪ sort(A′)      : otherwise        (9.1)

Where m is the minimum element among collection A, and A′ is all the rest elements
except m:

m = min(A)
A′ = A − {m}

We don't limit the data structure of the collection here. Typically, A is an array in
an imperative environment, and a list (a singly linked-list particularly) in a functional
environment; it can even be some other data structure, which will be introduced later.
The algorithm can also be given in imperative manner.
function Sort(A)
X←ϕ
while A ≠ ϕ do
x ← Min(A)
A ← Del(A, x)
X ← Append(X, x)
return X
Figure 9.2 depicts the process of this algorithm.

Figure 9.2: The left part is sorted data; continuously pick the minimum element in the rest and append it to the result.

We just translated the very original idea of 'eating grapes' line by line without considering
any expense of time and space. This realization stores the result in X, and when a
selected element is appended to X, we delete the same element from A. This indicates
that we can change it to 'in-place' sorting to reuse the space in A.
The idea is to store the minimum element in the first cell in A (we use the term 'cell' if A
is an array, and say 'node' if A is a list); then store the second minimum element in the
next cell, then the third cell, ...
One solution to realize this sorting strategy is swapping. When we select the i-th
minimum element, we swap it with the element in the i-th cell:
function Sort(A)
for i ← 1 to |A| do
m ← Min(A[i...])
Exchange A[i] ↔ m

Denote A = {a1 , a2 , ..., an }. At any time, when we process the i-th element, all
elements before i, as {a1 , a2 , ..., ai−1 } have already been sorted. We locate the minimum
element among the {ai , ai+1 , ..., an }, and exchange it with ai , so that the i-th cell contains
the right value. The process is repeatedly executed until we arrived at the last element.
This idea can be illustrated by figure 9.3.

Figure 9.3: The left part is sorted data; continuously pick the minimum element in the rest and put it to the right position.

9.2 Finding the minimum


We haven't completely realized the selection sort, because we took the operation of finding
the minimum (or the maximum) element as a black box. It's a puzzle how a kid
locates the biggest or the smallest grape, and this is an interesting topic for computer
algorithms.
The easiest but not so fast way to find the minimum in a collection is to perform a
scan. There are several ways to interpret this scan process. Consider that we want to
pick the biggest grape. We start from any grape, compare it with another one, and pick
the bigger one; then we take the next grape and compare it with the one we selected so
far, pick the bigger one and go on with the take-and-compare process, until there aren't any
grapes we haven't compared.
It's easy to get lost in real practice if we don't mark which grape has been compared.
There are two ways to solve this problem, which are suitable for different
data-structures respectively.

9.2.1 Labeling
Method 1 is to label each grape with a number: {1, 2, ..., n}, and we systematically perform
the comparison in the order of this sequence of labels. That we first compare grape number
1 and grape number 2, pick the bigger one; then we take grape number 3, and do the
comparison, ... We repeat this process until arrive at grape number n. This is quite
suitable for elements stored in an array.
function Min(A)
m ← A[1]
for i ← 2 to |A| do
if A[i] < m then
m ← A[i]
return m
With Min defined, we can complete the basic version of selection sort (or naive version
without any optimization in terms of time and space).
However, this algorithm returns the value of the minimum element instead of its
location (or the label of the grape), which needs a bit tweaking for the in-place version.
Some languages such as ISO C++, support returning the reference as result, so that the
swap can be achieved directly as below.

template<typename T>
T& min(T∗ from, T∗ to) {
T∗ m;
for (m = from++; from ̸= to; ++from)
if (∗from < ∗m)
m = from;
return ∗m;
}

template<typename T>
void ssort(T∗ xs, int n) {
for (int i = 0; i < n; ++i)
std::swap(xs[i], min(xs+i, xs+n));
}

In environments without reference semantics, the solution is to return the location of


the minimum element instead of the value:
function Min-At(A)
m ← First-Index(A)
for i ← m + 1 to |A| do
if A[i] < A[m] then
m←i
return m
Note that since we pass A[i...] to Min-At as the argument, we assume the first element
A[i] as the smallest one, and examine all elements A[i+1], A[i+2], ... one by one. Function
First-Index() is used to retrieve i from the input parameter.
The following Python example program, for example, completes the basic in-place
selection sort algorithm based on this idea. It explicitly passes the range information to
the function of finding the minimum location.
def ssort(xs):
n = len(xs)
for i in range(n):
m = min_at(xs, i, n)
(xs[i], xs[m]) = (xs[m], xs[i])
return xs

def min_at(xs, i, n):


m = i;
for j in range(i+1, n):
if xs[j] < xs[m]:
m = j
return m

9.2.2 Grouping
Another method is to group all grapes in two parts: the group we have examined, and
the rest we haven’t. We denote these two groups as A and B; All the elements (grapes)
as L. At the beginning, we haven’t examine any grapes at all, thus A is empty (ϕ),
and B contains all grapes. We can select arbitrary two grapes from B, compare them,
and put the loser (the smaller one for example) to A. After that, we repeat this process
by continuously picking arbitrary grapes from B, and compare with the winner of the
previous time until B becomes empty. At this time being, the final winner is the minimum
element. And A turns to be L − {min(L)}, which can be used for the next time minimum
finding.
There is an invariant of this method, that at any time, we have L = A ∪ {m} ∪ B,
where m is the winner so far we hold.

This approach doesn’t need the collection of grapes being indexed (as being labeled
in method 1). It’s suitable for any traversable data structures, including linked-list etc.
Suppose b1 is an arbitrary element in B if B isn't empty, and B′ is the rest of the elements
with b1 removed. This method can be formalized as the below auxiliary function.

min′(A, m, B) = (m, A)                     : B = ϕ
              = min′(A ∪ {m}, b1, B′)      : b1 < m
              = min′(A ∪ {b1}, m, B′)      : otherwise        (9.2)

In order to pick the minimum element, we call this auxiliary function by passing an
empty A, and use an arbitrary element (for instance, the first one) to initialize m:

extractM in(L) = min′ (ϕ, l1 , L′ ) (9.3)

Where L′ is all elements in L except for the first one l1. The algorithm extractMin
not only finds the minimum element, but also returns the updated collection which
doesn't contain this minimum. Combining this minimum extracting algorithm with the
basic selection sort definition, we can create a complete functional sorting program, for
example as this Haskell code snippet.
example as this Haskell code snippet.
sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs

extractMin (x:xs) = min' [] x xs where


min' ys m [] = (m, ys)
min' ys m (x:xs) = if m < x then min' (x:ys) m xs else min' (m:ys) x xs

The first line handles the trivial edge case: the sorting result for an empty list is
obviously empty. The second clause ensures that there is at least one element; that's why
the extractMin function needs no other pattern-matching.
One may think the second clause of the min' function should be written like below:
min' ys m (x:xs) = if m < x then min' (ys ++ [x]) m xs
                            else min' (ys ++ [m]) x xs

Otherwise it will produce the updated list in reverse order. Actually, it's necessary to use
'cons' instead of appending here. This is because appending is a linear operation which is
proportional to the length of part A, while 'cons' is a constant O(1) time operation. In fact,
we needn't keep the relative order of the list to be sorted, as it will be re-arranged anyway
during sorting.
It's quite possible to keep the relative order during sorting (this is known as stable sort),
while ensuring that the performance of finding the minimum element doesn't degrade to
quadratic. The following equation defines a solution.

extractMin(L) = (l1, ϕ)             : |L| = 1
              = (l1, L′)            : l1 < m
              = (m, {l1} ∪ L′′)     : otherwise
where (m, L′′) = extractMin(L′)                    (9.4)

If L is a singleton, the minimum is the only element it contains. Otherwise, denote
l1 as the first element in L, and let L′ contain the rest of the elements except for l1, that is
L′ = {l2, l3, ...}. The algorithm recursively finds the minimum element in L′, which yields
the intermediate result (m, L′′), where m is the minimum element in L′, and L′′ contains
all the rest of the elements except for m. Comparing l1 with m, we can determine which of them
is the final minimum result.
The following Haskell program implements this version of selection sort.

sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs

extractMin [x] = (x, [])


extractMin (x:xs) = if x < m then (x, xs) else (m, x:xs') where
(m, xs') = extractMin xs

Note that only the 'cons' operation is used; we don't need appending at all because the
algorithm actually examines the list from right to left. However, it's not free, as this
program needs to book-keep the context (via the call stack, typically). The relative order is
ensured by the nature of the recursion. Please refer to the appendix about tail recursion
for a detailed discussion.

9.2.3 performance of the basic selection sorting


Both the labeling method and the grouping method need to examine all the elements to
pick the minimum in every round; and we pick the minimum element n times in total.
Thus the performance is around n + (n − 1) + (n − 2) + ... + 1 comparisons, which
is n(n + 1)/2. Selection sort is a quadratic algorithm bound to O(n²) time.
Compared to insertion sort, which we introduced previously, selection sort performs
the same in its best case, worst case and average case. Insertion sort, however, performs well
in its best case (when the list is reverse ordered and stored in a linked-list), which is O(n),
while its worst case performance is O(n²).
In the next sections, we’ll examine, why selection sort performs poor, and try to
improve it step by step.

Exercise 9.1
• Implement the basic imperative selection sort algorithm (the non-in-place version)
in your favorite programming language. Compare it with the in-place version, and
analyze the time and space effectiveness.

9.3 Minor Improvement


9.3.1 Parameterize the comparator
Before any improvement in terms of performance, let’s make the selection sort algorithm
general enough to handle different sorting criteria.
We've seen two opposite examples so far: one may need to sort the elements in
ascending order or in descending order. For the former case, we repeatedly find
the minimum, while for the latter, we find the maximum instead. These are just two
special cases. In real world practice, one may want to sort things by various criteria, e.g.
in terms of size, weight, age, ...
One solution to handle them all is to pass the criteria as a comparison function to the
basic selection sort algorithm. For example:

sort(c, L) = ϕ                     : L = ϕ
           = {m} ∪ sort(c, L′′)    : otherwise, (m, L′′) = extract(c, L)        (9.5)

And the algorithm extract(c, L) is defined as below.

extract(c, L) = (l1, ϕ)            : |L| = 1
              = (l1, L′)           : c(l1, m)
              = (m, {l1} ∪ L′′)    : ¬c(l1, m)
where (m, L′′) = extract(c, L′)                    (9.6)

Where c is a comparator function: it takes two elements, compares them, and returns
which one precedes the other. Passing the 'less than' operator (<) turns
this algorithm into the version we introduced in the previous section.
Some environments require passing a total ordering comparator, which returns one of
'less than', 'equal', and 'greater than'. We don't need such a strong condition here;
c only tests whether 'less than' is satisfied. However, as the minimum requirement, the
comparator should satisfy strict weak ordering, as follows [52]:

• Irreflexivity, for all x, it’s not the case that x < x;

• Asymmetric, For all x and y, if x < y, then it’s not the case y < x;

• Transitivity, For all x, y, and z, if x < y, and y < z, then x < z;

The following Scheme/Lisp program translates this generic selection sorting algorithm.
The reason why we choose Scheme/Lisp here is because the lexical scope can simplify the
needs to pass the ‘less than’ comparator for every function calls.
(define (sel-sort-by ltp? lst)
(define (ssort lst)
(if (null? lst)
lst
(let ((p (extract-min lst)))
(cons (car p) (ssort (cdr p))))))
(define (extract-min lst)
(if (null? (cdr lst))
lst
(let ((p (extract-min (cdr lst))))
(if (ltp? (car lst) (car p))
lst
(cons (car p) (cons (car lst) (cdr p)))))))
(ssort lst))

Note that, both ssort and extract-min are inner functions, so that the ‘less than’
comparator ltp? is available to them. Passing ‘<’ to this function yields the normal
sorting in ascending order:
(sel-sort-by < '(3 1 2 4 5 10 9))
; Value 16: (1 2 3 4 5 9 10)
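For comparison, a direct Haskell transcription of equations (9.5) and (9.6) might look like the sketch below; the names selSortBy and extractBy are assumptions, not from the original text.

-- Selection sort parameterized by a 'less than' predicate.
selSortBy :: (a -> a -> Bool) -> [a] -> [a]
selSortBy _  [] = []
selSortBy lt xs = m : selSortBy lt xs' where
  (m, xs') = extractBy lt xs

-- Extract the preceding element, keeping the relative order of the rest.
extractBy :: (a -> a -> Bool) -> [a] -> (a, [a])
extractBy _  [x] = (x, [])
extractBy lt (x:xs)
  | lt x m    = (x, xs)
  | otherwise = (m, x:xs')
  where (m, xs') = extractBy lt xs

Passing (<) gives the ascending order: selSortBy (<) [3,1,2,4,5,10,9] evaluates to [1,2,3,4,5,9,10].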

It's possible to pass various comparators to the imperative selection sort as well. This is
left as an exercise to the reader.
For the sake of brevity, we only consider sorting elements in ascending order in the
rest of this chapter, and we'll not pass a comparator as a parameter unless it's necessary.

9.3.2 Trivial fine tune


The basic in-place imperative selection sorting algorithm iterates over all the elements, and picks
the minimum by traversing as well. It can be written in a compact way by inlining
the minimum finding part as an inner loop.
procedure Sort(A)
for i ← 1 to |A| do
m ← i
for j ← i + 1 to |A| do
if A[j] < A[m] then
m ← j
Exchange A[i] ↔ A[m]

Observe that, when we are sorting n elements, after the first n − 1 minimum ones are
selected, the only one left is definitely the largest, so we need NOT find
the minimum if there is only one element left in the list. This indicates that the outer loop
can iterate to n − 1 instead of n.
Another place we can fine tune is that we needn't swap the elements if the i-th
minimum one is already A[i]. The algorithm can be modified accordingly as below:
procedure Sort(A)
for i ← 1 to |A| − 1 do
m ← i
for j ← i + 1 to |A| do
if A[j] < A[m] then
m ← j
if m ≠ i then
Exchange A[i] ↔ A[m]
Definitely, these modifications won't affect the performance in terms of big-O.

9.3.3 Cock-tail sort


Knuth gave an alternative realization of selection sort in [51]. Instead of selecting the
minimum each time, we can select the maximum element, and put it to the last position.
This method can be illustrated by the following algorithm.
procedure Sort'(A)
for i ← |A| down-to 2 do
m ← i
for j ← 1 to i − 1 do
if A[m] < A[j] then
m ← j
Exchange A[i] ↔ A[m]
As shown in figure 9.4, at any time, the elements on the right most side are sorted. The
algorithm scans all the unsorted ones, and locates the maximum. Then it puts it to the tail of
the unsorted range by swapping.

Figure 9.4: Select the maximum every time and put it to the end.

This version reveals the fact that selecting the maximum element can sort the elements
in ascending order as well. What's more, we can find both the minimum and the maximum
elements in one pass of traversing, putting the minimum at the first location, while putting
the maximum at the last position. This approach can speed up the sorting slightly (it halves
the number of outer loop iterations). This method is called 'cock-tail sort'.
procedure Sort(A)
for i ← 1 to ⌊|A|/2⌋ do
min ← i
max ← |A| + 1 − i
if A[max] < A[min] then
Exchange A[min] ↔ A[max]

for j ← i + 1 to |A| − i do
if A[j] < A[min] then
min ← j
if A[max] < A[j] then
max ← j
Exchange A[i] ↔ A[min]
Exchange A[|A| + 1 − i] ↔ A[max]
This algorithm can be illustrated as in figure 9.5, at any time, the left most and right
most parts contain sorted elements so far. That the smaller sorted ones are on the left,
while the bigger sorted ones are on the right. The algorithm scans the unsorted ranges,
located both the minimum and the maximum positions, then put them to the head and
the tail position of the unsorted ranges by swapping.

Figure 9.5: Select both the minimum and maximum in one pass, and put them to the proper positions.

Note that it's necessary to swap the left most and right most elements before the
inner loop if they are not in the correct order. This is because we scan the range excluding
these two elements. Another method is to initialize the first element of the unsorted
range as both the maximum and the minimum before the inner loop. However, since we
need two swapping operations after the scan, it's possible that the first swap moves
the maximum or the minimum away from the position we just found, which would make the second
swap malfunction. How to solve this problem is left as an exercise to the reader.
The following Python example program implements this cock-tail sort algorithm.
def cocktail_sort(xs):
n = len(xs)
for i in range(n // 2):
(mi, ma) = (i, n - 1 -i)
if xs[ma] < xs[mi]:
(xs[mi], xs[ma]) = (xs[ma], xs[mi])
for j in range(i+1, n - 1 - i):
if xs[j] < xs[mi]:
mi = j
if xs[ma] < xs[j]:
ma = j
(xs[i], xs[mi]) = (xs[mi], xs[i])
(xs[n - 1 - i], xs[ma]) = (xs[ma], xs[n - 1 - i])
return xs

It’s possible to realize cock-tail sort in functional approach as well. An intuitive


recursive description can be given like this:

• Trivial edge case: If the list is empty, or there is only one element in the list, the
sorted result is obviously the origin list;

• Otherwise, we select the minimum and the maximum, put them in the head and
tail positions, then recursively sort the rest elements.

This algorithm description can be formalized by the following equation.

sort(L) = L                                     : |L| ≤ 1
        = {lmin} ∪ sort(L′′) ∪ {lmax}           : otherwise        (9.7)

Where the minimum and the maximum are extracted from L by a function select(L).

(lmin , L′′ , lmax ) = select(L)

Note that the minimum is actually linked to the front of the recursive sort result. Its
semantics is a constant O(1) time 'cons' (refer to the appendix of this book for details).
The maximum, however, is appended to the tail, which is typically an expensive linear O(n) time
operation. We'll optimize it later.
Function select(L) scans the whole list to find both the minimum and the maximum.
It can be defined as below:

select(L) = (min(l1, l2), ϕ, max(l1, l2))    : L = {l1, l2}
          = (l1, {lmin} ∪ L′′, lmax)         : l1 < lmin
          = (lmin, {lmax} ∪ L′′, l1)         : lmax < l1
          = (lmin, {l1} ∪ L′′, lmax)         : otherwise        (9.8)

Where (lmin, L′′, lmax) = select(L′) and L′ is the rest of the list except for the first
element l1. If there are only two elements in the list, we pick the smaller as the minimum,
and the bigger as the maximum. After extracting them, the list becomes empty. This is the
trivial edge case. Otherwise, we take the first element l1 out, then recursively perform
selection on the rest of the list. After that, we compare if l1 is less than the minimum or
greater than the maximum candidate, so that we can finalize the result.
Note that for all the cases, there is no appending operation to form the result. However,
since selection must scan all the elements to determine the minimum and the maximum,
it is bound to O(n) linear time.
The complete example Haskell program is given as the following.
csort [] = []
csort [x] = [x]
csort xs = mi : csort xs' ++ [ma] where
(mi, xs', ma) = extractMinMax xs

extractMinMax [x, y] = (min x y, [], max x y)


extractMinMax (x:xs) | x < mi = (x, mi:xs', ma)
| ma < x = (mi, ma:xs', x)
| otherwise = (mi, x:xs', ma)
where (mi, xs', ma) = extractMinMax xs

We mentioned that the appending operation is expensive in this intuitive version. It
can be improved, and this can be achieved in two steps. The first step is to convert the
cock-tail sort into a tail-recursive form. Denote the sorted small ones as A, and the sorted big
ones as B in figure 9.5. We use A and B as accumulators. The new cock-tail sort is
defined as the following.

sort′(A, L, B) = A ∪ L ∪ B                                : L = ϕ ∨ |L| = 1
               = sort′(A ∪ {lmin}, L′′, {lmax} ∪ B)       : otherwise        (9.9)

Where lmin, lmax and L′′ are defined the same as before. And we start sorting by
passing empty A and B: sort(L) = sort′(ϕ, L, ϕ).
Besides the edge case, observe that the appending operation only happens on A ∪ {lmin},
while lmax is only linked to the head of B. This appending occurs in every
recursive call. To eliminate it, we can store A in reverse order, so that lmin can be
linked to the head with 'cons' instead of being appended. Denote cons(x, L) = {x} ∪ L and
append(L, x) = L ∪ {x}; we have the below equation.

append(L, x) = reverse(cons(x, reverse(L)))        (9.10)

Finally, we perform a reverse to turn the reversed accumulator back into A. Based on this
idea, the algorithm can be improved one more step as the following.

sort′(A, L, B) = reverse(A) ∪ B                               : L = ϕ
               = reverse({l1} ∪ A) ∪ B                        : |L| = 1
               = sort′({lmin} ∪ A, L′′, {lmax} ∪ B)           : otherwise        (9.11)
This algorithm can be implemented by Haskell as below.


csort' xs = cocktail [] xs [] where
cocktail as [] bs = reverse as ++ bs
cocktail as [x] bs = reverse (x:as) ++ bs
cocktail as xs bs = let (mi, xs', ma) = extractMinMax xs
in cocktail (mi:as) xs' (ma:bs)
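A quick check of this version (the binding test is ours):

test :: Bool
test = csort' [3, 1, 4, 1, 5, 9, 2, 6] == [1, 1, 2, 3, 4, 5, 6, 9]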

Exercise 9.2

• Realize the imperative basic selection sort algorithm, which can take a comparator
as a parameter. Please try both dynamic typed language and static typed language.
How to annotate the type of the comparator as general as possible in a static typed
language?
• Implement Knuth’s version of selection sort in your favorite programming language.
• An alternative to realize cock-tail sort is to assume the i-th element both the min-
imum and the maximum, after the inner loop, the minimum and maximum are
found, then we can swap the the minimum to the i-th position, and the maximum
to position |A| + 1 − i. Implement this solution in your favorite imperative language.
Please note that there are several special edge cases should be handled correctly:
– A = {max, min, ...};
– A = {..., max, min};
– A = {max, ..., min}.
Please don’t refer to the example source code along with this chapter before you
try to solve this problem.
• Realize the function select(L) by folding.

9.4 Major improvement


Although cock-tail sort halves the number of loops, the performance is still bound to
quadratic time. It means that the method we developed so far handles big data poorly
compared to other divide and conquer sorting solutions.
To improve selection based sort essentially, we must analyze where the bottleneck is.
In order to sort the elements by comparison, we must examine all the elements for
ordering. Thus the outer loop of selection sort is necessary. However, must it scan all the

elements every time to select the minimum? Note that when we pick the smallest one the
first time, we actually traverse the whole collection, so that we already know partially which
ones are relatively big, and which ones are relatively small.
The problem is that, when we select the subsequent minimum elements, instead of re-using
the ordering information we obtained previously, we drop it all, and blindly start a
new traversal.
So the key point to improve selection based sort is to re-use the previous result. There
are several approaches, we’ll adopt an intuitive idea inspired by football match in this
chapter.

9.4.1 Tournament knock out


The football world cup is held every four years. There are 32 teams from different
continents playing in the final games. Before 1982, there were 16 teams competing for the
tournament finals [53].
For simplification purposes, let's go back to 1978 and imagine a way to determine the
champion: in the first round, the teams are grouped into 8 pairs to play the games; after
that, there will be 8 winners, and 8 teams will be out. Then in the second round, these 8
teams are grouped into 4 pairs. This time there will be 4 winners after the second round
of games; then the top 4 teams are divided into 2 pairs, so that there will be only two
teams left for the final game.
The champion is determined after 4 rounds of games in total, and there are actually
8 + 4 + 2 + 1 = 15 games. Now we have the world cup champion; however, the world cup
game won't finish at this stage, as we need to determine which is the silver medal team.
Readers may argue that isn’t the team beaten by the champion at the final game the
second best? This is true according to the real world cup rule. However, it isn’t fair
enough in some sense.
We often hear about the so-called 'group of death'. Let's suppose that the Brazil team is
grouped with the German team at the very beginning. Although both teams are quite strong,
one of them must be knocked out. It's quite possible that the team which loses that game
could beat all the other teams except for the champion. Figure 9.6 illustrates such a case.

Figure 9.6: The element 15 is knocked out in the first round.

Imagine that every team has a number. The bigger the number, the stronger the team.
Suppose that the stronger team always beats the team with smaller number, although
this is not true in real world. But this simplification is fair enough for us to develop the
tournament knock out solution. This maximum number which represents the champion
is 16. Definitely, team with number 14 isn’t the second best according to our rules. It
should be 15, which is knocked out at the first round of comparison.
The key question here is to find an effective way to locate the second maximum number
in this tournament tree. After that, what we need is to apply the same method to select
the third, the fourth, ..., to accomplish the selection based sort.

One idea is to assign the champion a very small number (for instance, −∞), so that
it won't be selected next time, and the second best one becomes the new champion.
However, suppose there are 2^m teams for some natural number m; it still takes
2^(m−1) + 2^(m−2) + ... + 2 + 1 = 2^m − 1 comparisons to determine the new champion,
which is as slow as the first time.
Actually, we needn't perform a full bottom-up comparison at all, since the tournament
tree stores plenty of ordering information. Observe that the second best team must have
been beaten by the champion at some point, otherwise it would be the final winner. So we
can track the path from the root of the tournament tree to the leaf of the champion, and
examine all the teams along this path to find the second best team.
In figure 9.6, this path is marked in gray color, the elements to be examined are
{14, 13, 7, 15}. Based on this idea, we refine the algorithm like below.

1. Build a tournament tree from the elements to be sorted, so that the champion (the
maximum) becomes the root;

2. Extract the root from the tree, perform a top-down pass and replace the maximum
with −∞;

3. Perform a bottom-up back-track along the path, determine the new champion and
make it as the new root;

4. Repeat step 2 until all elements have been extracted.

Figure 9.7, 9.8, and 9.9 show the steps of applying this strategy.

Figure 9.7: Extract 16, replace it with −∞, 15 sifts up to root.

Figure 9.8: Extract 15, replace it with −∞, 14 sifts up to root.

We can reuse the binary tree definition given in the first chapter of this book to
represent tournament tree. In order to back-track from leaf to the root, every node
should hold a reference to its parent (concept of pointer in some environment such as
ANSI C):

Figure 9.9: Extract 14, replace it with −∞, 13 sifts up to root.

struct Node {
Key key;
struct Node ∗left, ∗right, ∗parent;
};

To build a tournament tree from a list of elements (suppose the number of elements
are 2m for some m), we can first wrap each element as a leaf, so that we obtain a list of
binary trees. We take every two trees from this list, compare their keys, and form a new
binary tree with the bigger key as the root; the two trees are set as the left and right
children of this new binary tree. Repeat this operation to build a new list of trees. The
height of each tree is increased by 1. Note that the size of the tree list halves after such a
pass, so that we can keep reducing the list until there is only one tree left. And this tree
is the finally built tournament tree.
function Build-Tree(A)
T ←ϕ
for each x ∈ A do
t ← Create-Node
Key(t) ← x
Append(T, t)
while |T | > 1 do
T′ ← ϕ
for every t1 , t2 ∈ T do
t ← Create-Node
Key(t) ← Max(Key(t1 ), Key(t2 ))
Left(t) ← t1
Right(t) ← t2
Parent(t1 ) ← t
Parent(t2 ) ← t
Append(T ′ , t)
T ← T′
return T [1]
Suppose the length of the list A is n. This algorithm firstly traverses the list to build the
leaves, which takes linear O(n) time. Then it repeatedly compares pairs, which loops
proportionally to n + n/2 + n/4 + ... + 2 ≈ 2n. So the total performance is bound to O(n) time.
The following ANSI C program implements this tournament tree building algorithm.
struct Node∗ build(const Key∗ xs, int n) {
int i;
struct Node ∗t, ∗∗ts = (struct Node∗∗) malloc(sizeof(struct Node∗) ∗ n);
for (i = 0; i < n; ++i)
ts[i] = leaf(xs[i]);

for (; n > 1; n /= 2)
for (i = 0; i < n; i += 2)
ts[i/2] = branch(max(ts[i]→key, ts[i+1]→key), ts[i], ts[i+1]);
t = ts[0];
free(ts);
return t;
}

The type of key can be defined somewhere, for example:


typedef int Key;

Function leaf(x) creates a leaf node with value x as the key, and sets all its fields,
left, right and parent, to NIL. Function branch(key, left, right) creates a
branch node, and links the newly created node as the parent of its two children if they are not
empty. For the sake of brevity, we skip their details. They are left as an exercise to
the reader, and the complete program can be downloaded along with this book.
Some programming environments, such as Python provides tool to iterate every two
elements at a time, for example:
for x, y in zip(∗[iter(ts)]∗2):

We skip such language specific feature, readers can refer to the Python example pro-
gram along with this book for details.
When the maximum element is extracted from the tournament tree, we replace it with
−∞, and repeatedly replace all these values from the root to the leaf. Next, we back-track
to root through the parent field, and determine the new maximum element.
function Extract-Max(T )
m ← Key(T )
Key(T ) ← −∞
while ¬ Leaf?(T ) do ▷ The top down pass
if Key(Left(T )) = m then
T ← Left(T )
else
T ← Right(T )
Key(T ) ← −∞
while Parent(T ) ≠ ϕ do ▷ The bottom up pass
T ← Parent(T )
Key(T ) ← Max(Key(Left(T )), Key(Right(T )))
return m
This algorithm returns the extracted maximum element, and modifies the tournament
tree in-place. Because we can't represent −∞ in a real program with a limited word length,
one approach is to define a relatively big negative number, which is less than all the elements
in the tournament tree. For example, suppose all the elements are greater than -65535; we
can define negative infinity as below:
#define N_INF -65535

We can implement this algorithm as the following ANSI C example program.
Key pop(struct Node∗ t) {
Key x = t→key;
t→key = N_INF;
while (!isleaf(t)) {
t = t→left→key == x ? t→left : t→right;
t→key = N_INF;
}

while (t→parent) {
t = t→parent;
t→key = max(t→left→key, t→right→key);
}
return x;
}

The behavior of Extract-Max is quite similar to the pop operation for some data
structures, such as queue, and heap, thus we name it as pop in this code snippet.
Algorithm Extract-Max process the tree in two passes, one is top-down, then a
bottom-up along the path that the ‘champion team wins the world cup’. Because the
tournament tree is well balanced, the length of this path, which is the height of the tree,
is bound to O(lg n), where n is the number of the elements to be sorted (which are equal
to the number of leaves). Thus the performance of this algorithm is O(lg n).
It’s possible to realize the tournament knock out sort now. We build a tournament
tree from the elements to be sorted, then continuously extract the maximum. If we want
to sort in monotonically increase order, we put the first extracted one to the right most,
then insert the further extracted elements one by one to left; Otherwise if we want to sort
in decrease order, we can just append the extracted elements to the result. Below is the
algorithm sorts elements in ascending order.
procedure Sort(A)
T ← Build-Tree(A)
for i ← |A| down to 1 do
A[i] ← Extract-Max(T )
Translating it to ANSI C example program is straightforward.
void tsort(Key∗ xs, int n) {
struct Node∗ t = build(xs, n);
while(n)
xs[--n] = pop(t);
release(t);
}

This algorithm firstly takes O(n) time to build the tournament tree, then performs n
pops to select the maximum elements so far left in the tree. Since each pop operation is
bound to O(lg n), thus the total performance of tournament knock out sorting is O(n lg n).

Refine the tournament knock out


It's possible to design the tournament knock out algorithm in a purely functional approach.
And we'll see that the two passes in the pop operation (first top-down, replacing the champion
with −∞, then bottom-up, determining the new champion) can be combined in a recursive
manner, so that we don't need the parent field any more. We can re-use the functional binary
tree definition, as in the following example Haskell code.
data Tr a = Empty | Br (Tr a) a (Tr a)

Thus a binary tree is either empty or a branch node containing a key, a left sub tree and
a right sub tree. Both children are again binary trees.
We've used a hard coded big negative number to represent −∞. However, this solution
is ad-hoc, and it forces all the elements to be sorted to be greater than this pre-defined magic
number. Some programming environments support algebraic data types, so that we can define
negative infinity explicitly. For instance, the below Haskell program sets up the concept
of infinity. (The order of the definitions of 'NegInf', the regular number, and 'Inf' is significant
if we want to derive the default, correct comparing behavior of 'Ord'. It's also possible to
specify the detailed order by making it an instance of 'Ord'; however, this is a language
specific feature out of the scope of this book; please refer to other Haskell textbooks.)

data Infinite a = NegInf | Only a | Inf deriving (Eq, Ord)

From now on, we switch back to use the min() function to determine the winner, so
that the tournament selects the minimum instead of the maximum as the champion.
Denote function key(T ) returns the key of the tree rooted at T . Function wrap(x)
wraps the element x into a leaf node. Function tree(l, k, r) creates a branch node, with k
as the key, l and r as the two children respectively.
The knock out process, can be represented as comparing two trees, picking the smaller
key as the new key, and setting these two trees as children:
branch(T1 , T2 ) = tree(T1 , min(key(T1 ), key(T2 )), T2 ) (9.12)
This can be implemented in Haskell word by word:
branch t1 t2 = Br t1 (min (key t1) (key t2)) t2

There is a limitation in our tournament sorting algorithm so far: it only accepts a collection
of elements whose size is 2^m, otherwise we can't build a complete binary tree. This can be
actually solved in the tree building process. Recall that we pick two trees every time,
compare them and pick the winner. This is perfect if there is always an even number of trees.
Consider a case in a football match where one team is absent for some reason (a severe flight
delay or whatever), so that one team is left without a challenger. One option is to
make this team the winner, so that it will attend the further games. We can use
a similar approach here.
To build the tournament tree from a list of elements, we wrap every element into a
leaf, then start the building process.

build(L) = build′({wrap(x) | x ∈ L})        (9.13)

The build′(T) function terminates when there is only one tree left in T, which is the
champion. This is the trivial edge case. Otherwise, it groups every two trees in a pair to
determine the winners. When there is an odd number of trees, it just makes the last tree
the winner, attending the next level of the tournament, and recursively repeats the building
process.

build′(T) = ϕ                   : T = ϕ
          = T1                  : T = {T1}
          = build′(pair(T))     : otherwise        (9.14)

Note that this algorithm also handles another special case: the list to be sorted is empty,
and the result is obviously empty.
Denote T = {T1, T2, ...} if there are at least two trees, and let T′ represent the rest of the
trees after removing the first two. Function pair(T) is defined as the following.

pair(T) = {branch(T1, T2)} ∪ pair(T′)      : |T| ≥ 2
        = T                                : otherwise        (9.15)
The complete tournament tree building algorithm can be implemented as the below
example Haskell program.
fromList :: (Ord a) ⇒ [a] → Tr (Infinite a)
fromList = build ◦ (map wrap) where
build [] = Empty
build [t] = t
build ts = build $ pair ts
pair (t1:t2:ts) = (branch t1 t2):pair ts
pair ts = ts


When extracting the champion (the minimum) from the tournament tree, we examine
whether the left child sub-tree or the right one has the same key as the root, and
recursively extract from that tree until we arrive at a leaf node. Denote the left sub-tree of
T as L, the right sub-tree as R, and K as its key. We can define this popping algorithm as
the following.

pop(T) = tree(ϕ, ∞, ϕ)                              : L = ϕ ∧ R = ϕ
       = tree(L′, min(key(L′), key(R)), R)          : K = key(L), L′ = pop(L)
       = tree(L, min(key(L), key(R′)), R′)          : K = key(R), R′ = pop(R)        (9.16)
It’s straightforward to translate this algorithm into example Haskell code.
pop (Br Empty _ Empty) = Br Empty Inf Empty
pop (Br l k r) | k == key l = let l' = pop l in Br l' (min (key l') (key r)) r
| k == key r = let r' = pop r in Br l (min (key l) (key r')) r'

Note that this algorithm only removes the current champion without returning it. So
it’s necessary to define a function to get the champion at the root node.
top(T ) = key(T ) (9.17)
With these functions defined, tournament knock out sorting can be formalized by
using them.
sort(L) = sort′ (build(L)) (9.18)
Where sort′(T) continuously pops the minimum element to form the result list:
sort′(T) = { ϕ : T = ϕ ∨ key(T) = ∞
             {top(T)} ∪ sort′(pop(T)) : otherwise }    (9.19)
The rest of the Haskell code is given below to complete the implementation.
top = only ◦ key

tsort :: (Ord a) ⇒ [a] → [a]


tsort = sort' ◦ fromList where
sort' Empty = []
sort' (Br _ Inf _) = []
sort' t = (top t) : (sort' $ pop t)

The auxiliary functions only, key, and wrap, with explicit infinity support, are listed
as follows.
only (Only x) = x
key (Br _ k _ ) = k
wrap x = Br Empty (Only x) Empty

Exercise 9.3

• Implement the helper functions leaf(), branch(), max(), isleaf(), and release()
to complete the imperative tournament tree program.
• Implement the imperative tournament tree in a programming language that supports GC
(garbage collection).
• Why can our tournament tree knock-out sort algorithm handle duplicated elements
(elements with the same value)? We say a sorting algorithm is stable if it keeps the
original order of elements with the same value. Is the tournament tree knock-out sort
stable?

• Design an imperative tournament tree knock-out sort algorithm, which satisfies the
following:

– Can handle an arbitrary number of elements;

– Doesn't use a hard coded negative infinity, so that it can take elements with
any value.

• Compare the tournament tree knock-out sort algorithm and the binary tree sort algo-
rithm, and analyze their efficiency both in time and space.
• Compare the heap sort algorithm and the binary tree sort algorithm, and do the same
analysis for them.

9.4.2 Final improvement by using heap sort


We managed to improve the performance of selection based sorting to O(n lg n) by using
the tournament knock-out. This is the limit of comparison based sort according to [51].
However, there is still room for improvement. After sorting, there remains a complete
binary tree in which all leaves and branches hold useless infinity values. This isn't space
efficient at all. Can we release the nodes when popping?
Another observation is that if there are n elements to be sorted, we actually allocate
about 2n tree nodes: n for leaves and n for branches. Is there a better way to halve
the space usage?
The final sorting structure described in equation 9.19 can easily be unified into a
more general one if we treat a tree whose root holds infinity as an empty tree:

sort′(T) = { ϕ : T = ϕ
             {top(T)} ∪ sort′(pop(T)) : otherwise }    (9.20)

This is exactly the same as the heap sort we gave in the previous chapter. A heap
always keeps the minimum (or the maximum) on top, and provides a fast pop operation.
The binary heap backed by an implicit array encodes the tree structure in the array indices,
so there is no extra space allocated except for the n array cells. The functional heaps, such
as the leftist heap and the splay heap, allocate n nodes as well. We'll introduce more heaps
in the next chapter which perform well in many aspects.

9.5 Short summary


In this chapter, we presented the evolution of selection based sort. Selection sort is
easy and commonly used as an example to teach students about nested loops. It has a
simple and straightforward structure, but its performance is quadratic. We saw that there
are ways to improve it, not only by fine tuning, but also by fundamentally changing the
data structure, which leads to tournament knock-out sort and heap sort.
Bibliography

[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. ISBN:0262032937. The MIT Press. 2001

[3] Wikipedia. “Strict weak order”. [Link]


[4] Wikipedia. “FIFA world cup”. [Link]

Chapter 10

Binomial heap, Fibonacci heap,


and pairing heap

10.1 Introduction
In the previous chapter, we mentioned that heaps can be generalized and implemented with
various data structures. However, we have only focused on binary heaps so far, whether
realized by explicit binary trees or implicit arrays.
It's quite natural to extend the binary tree to a K-ary tree [54]. In this chapter, we first
show Binomial heaps, which actually consist of a forest of K-ary trees. Binomial heaps
bound the performance of all operations to O(lg n), as well as keeping the finding of the
minimum element to O(1) time.
If we delay some operations in Binomial heaps by using a lazy strategy, they turn into
Fibonacci heaps.
All the binary heaps we have shown take no less than O(lg n) time for merging; we'll
show that it's possible to improve this to O(1) with the Fibonacci heap, which is quite helpful
for graph algorithms. Actually, the Fibonacci heap achieves a good amortized time bound
of O(1) for almost all operations, leaving only the heap pop at O(lg n).
Finally, we'll introduce the pairing heap. It has the best performance in practice,
although the proof of its time bound is still a conjecture for the time being.

10.2 Binomial Heaps


10.2.1 Definition
The binomial heap is more complex than most binary heaps. However, it has excellent
merge performance, bound to O(lg n) time. A binomial heap consists of a list of
binomial trees.

Binomial tree
In order to explain why the tree is named ‘binomial’, let's review the famous Pascal's
triangle (also known as Jia Xian's triangle, in memory of the Chinese mathematician Jia
Xian (1010–1070)) [55].

1
1 1


1 2 1
1 3 3 1
1 4 6 4 1
...
In each row, the numbers are all binomial coefficients. There are many ways to generate
a series of binomial coefficients. One of them is recursive composition. Binomial trees
can likewise be defined recursively as follows.
• A binomial tree of rank 0 has only one node, the root;
• A binomial tree of rank n consists of two rank n − 1 binomial trees; among these
two sub-trees, the one with the bigger root element is linked as the leftmost child of
the other.
We denote a binomial tree of rank 0 as B0 , and the binomial tree of rank n as Bn .
Figure 10.1 shows a B0 tree and how to link 2 Bn−1 trees to a Bn tree.

Figure 10.1: Recursive definition of binomial trees. (a) A B0 tree; (b) linking two Bn−1 trees yields a Bn tree.

With this recursive definition, it is easy to draw the forms of binomial trees of rank 0, 1,
2, ..., as shown in figure 10.2.
Observing the binomial trees reveals some interesting properties. For each rank n
binomial tree, if we count the number of nodes in each row, we find that they are the
binomial coefficients.
For instance, for the rank 4 binomial tree, there is 1 node as the root; in the second
level next to the root, there are 4 nodes; in the 3rd level, there are 6 nodes; in the 4th level,
there are 4 nodes; and in the 5th level, there is 1 node. They are exactly 1, 4, 6, 4, 1, which
is the 5th row in Pascal's triangle. That's why we call it a binomial tree.
Another interesting property is that the total number of nodes for a binomial tree of
rank n is 2^n. This can be proved either by the binomial theorem or from the recursive
definition directly.
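As a quick sanity check, the recursive definition can be encoded directly and the size property verified. The snippet below is only an illustration added here (the type SimpleTree and the names bTree and size are not used elsewhere in this chapter):

data SimpleTree = SimpleTree [SimpleTree]      -- a node and its children, keys omitted

bTree :: Int -> SimpleTree                     -- binomial tree of rank n
bTree 0 = SimpleTree []
bTree n = let t@(SimpleTree cs) = bTree (n - 1)
          in SimpleTree (t : cs)               -- link two B(n-1) trees: one becomes the leftmost child

size :: SimpleTree -> Int
size (SimpleTree cs) = 1 + sum (map size cs)

-- size (bTree n) == 2 ^ n holds for every n ≥ 0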

Binomial heap
With the binomial tree defined, we can introduce the definition of the binomial heap. A binomial
heap is a set of binomial trees (or a forest of binomial trees) that satisfies the following properties.

Figure 10.2: Forms of binomial trees with rank = 0, 1, 2, 3, 4, ... (a) B0 tree; (b) B1 tree; (c) B2 tree; (d) B3 tree; (e) B4 tree.




• Each binomial tree in the heap conforms to the heap property, that the key of a node
is equal to or greater than the key of its parent. Here the heap is actually a min-heap;
for a max-heap, it changes to 'equal to or less than'. In this chapter, we only discuss
the min-heap; the max-heap can be obtained equally by changing the comparison
condition.

• There is at most one binomial tree which has rank r. In other words, no two
binomial trees have the same rank.

This definition leads to an important result: for a binomial heap containing n elements,
if converting n to binary format yields a0, a1, a2, ..., am, where a0 is the LSB and
am is the MSB, then for each 0 ≤ i ≤ m, if ai = 0 there is no binomial tree of rank i, and
if ai = 1 there must be a binomial tree of rank i.
For example, if a binomial heap contains 5 elements, as 5 is '(LSB)101(MSB)', then
there are 2 binomial trees in this heap: one tree has rank 0, the other has rank 2.
Figure 10.3 shows a binomial heap which has 19 nodes; as 19 is '(LSB)11001(MSB)'
in binary format, there is a B0 tree, a B1 tree and a B4 tree.
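This correspondence can be illustrated with a small Haskell sketch (an illustration only; the helper names ranks and bits are assumptions and are not used elsewhere in this chapter): the ranks of the trees in an n-element heap are the positions of the 1 bits of n, LSB first.

ranks :: Int -> [Int]
ranks n = [i | (i, b) <- zip [0..] (bits n), b == 1]
  where
    bits 0 = []                             -- binary digits of n, LSB first
    bits m = m `mod` 2 : bits (m `div` 2)

-- ranks 5  == [0, 2]      : a B0 tree and a B2 tree
-- ranks 19 == [0, 1, 4]   : a B0 tree, a B1 tree and a B4 tree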

Figure 10.3: A binomial heap with 19 elements.

Data layout
There are two ways to define K-ary trees imperatively. One is the 'left-child, right-sibling'
approach [4]. It is compatible with the typical binary tree structure. Each node has two
fields, a left field and a right field. We use the left field to point to the first child of the
node, and the right field to point to its sibling. All siblings are represented as a singly
linked list. Figure 10.4 shows an example tree represented in this way.
The other way is to use a collection container defined by the library, such as an array or a list,
to represent all children of a node.
Since the rank of a tree plays a very important role, we also define it as a field.
For the 'left-child, right-sibling' method, we define the binomial tree as follows.¹
¹ C programs are also provided along with this book.

Figure 10.4: Example tree represented in the 'left-child, right-sibling' way. R is the root node;
it has no sibling, so its right field points to NIL. C1, C2, ..., Cn are the children of R. C1
is linked from the left side of R; the other siblings of C1 are linked one after another on
the right side of C1. C1′, C2′, ..., Cm′ are the children of C1.

class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None
        self.sibling = None

When initializing a tree with a key, we create a leaf node, set its rank to zero, and set all
other fields to NIL.
It is quite natural to utilize a pre-defined list to represent multiple children, as below.
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.children = []

For a purely functional setting, such as the Haskell language, the binomial tree is defined
as follows.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}

The binomial heap is defined as a list of binomial trees (a forest) with ranks
in monotonically increasing order. As another implicit constraint, no two
binomial trees have the same rank.
type BiHeap a = [BiTree a]

10.2.2 Basic heap operations


Linking trees
Before diving into the basic heap operations such as pop and insert, we'll first see
how to link two binomial trees of the same rank into a bigger one. According to the
definition of the binomial tree and the heap property that the root always contains the minimum
key, we first compare the two root values, select the smaller one as the new root, and
insert the other tree as the first child in front of all other children. Suppose functions

Key(T ), Children(T ), and Rank(T ) access the key, children and rank of a binomial tree
respectively.
link(T1, T2) = { node(r + 1, x, {T2} ∪ C1) : x < y
                 node(r + 1, y, {T1} ∪ C2) : otherwise }    (10.1)

Where

x = Key(T1 )
y = Key(T2 )
r = Rank(T1 ) = Rank(T2 )
C1 = Children(T1 )
C2 = Children(T2 )

Figure 10.5: Suppose x < y, insert y as the first child of x.

Note that the link operation is bound to O(1) time if the ∪ is a constant time operation.
It's easy to translate the link function into a Haskell program, as follows.
link t1@(Node r x c1) t2@(Node _ y c2) =
if x<y then Node (r+1) x (t2:c1)
else Node (r+1) y (t1:c2)

It's possible to realize the link operation in an imperative way. If we use the 'left child, right
sibling' approach, we just link the tree which has the bigger key to the left side of the
other, and link the original children of the other to its right side as siblings. Figure 10.6 shows the
result of one case.
1: function Link(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Sibling(T2 ) ← Child(T1 )
5: Child(T1 ) ← T2
6: Parent(T2 ) ← T1
7: Rank(T1 ) ← Rank(T1 ) + 1
8: return T1
And if we use a container to manage all children of a node, the algorithm is like below.
1: function Link’(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Parent(T2 ) ← T1
5: Insert-Before(Children(T1 ), T2 )
6: Rank(T1 ) ← Rank(T1 ) + 1
7: return T1

Figure 10.6: Suppose x < y, link y to the left side of x and link the original children of x
to the right side of y.

It's easy to translate both algorithms into real programs. Here we only show the Python
program of Link' for illustration purposes².
def link(t1, t2):
    if t2.key < t1.key:
        (t1, t2) = (t2, t1)
    t2.parent = t1
    t1.children.insert(0, t2)
    t1.rank = t1.rank + 1
    return t1

Exercise 10.1
Implement the tree-linking program in your favorite language with left-child, right-
sibling method.

We mentioned that linking is a constant time algorithm, and this is true when using the
left-child, right-sibling approach. However, if we use a container to manage the children,
the performance depends on the concrete implementation of the container. If it is a plain
array, the linking time will be proportional to the number of children. In this chapter, we
assume the time is constant. This is true if the container is implemented as a linked list.

Insert a new element to the heap (push)


As the ranks of the binomial trees in a forest are monotonically increasing, by using the link
function defined above it's possible to define an auxiliary function, so that we can insert
a new tree, with rank no bigger than the smallest one, into the heap (which is actually a
forest).
Denote the non-empty heap as H = {T1, T2, ..., Tn}; we define

insertT(H, T) = { {T} : H = ϕ
                  {T} ∪ H : Rank(T) < Rank(T1)
                  insertT(H′, link(T, T1)) : otherwise }    (10.2)

where

H ′ = {T2 , T3 , ..., Tn }

The idea is that for the empty heap, we set the new tree as the only element to create
a singleton forest; otherwise, we compare the ranks of the new tree and the first tree in
the forest. If they are the same, we link them together, and recursively insert the linked result
(a tree with rank increased by one) into the rest of the forest; if they are not the same, since
the pre-condition constrains the rank of the new tree, it must be the smallest, so we put
this new tree in front of all the other trees in the forest.
² The C and C++ programs are also available along with this book.
From the binomial properties mentioned above, there are at most O(lg n) binomial
trees in the forest, where n is the total number of nodes. Thus function insertT performs
at most O(lg n) linkings, which are all constant time operations. So the performance
of insertT is O(lg n).³
The corresponding Haskell program is given below.
insertTree [] t = [t]
insertTree ts@(t':ts') t = if rank t < rank t' then t:ts
else insertTree ts' (link t t')

With this auxiliary function, it’s easy to realize the insertion. We can wrap the new
element to be inserted as the only leaf of a tree, then insert this tree to the binomial heap.

insert(H, x) = insertT (H, node(0, x, ϕ)) (10.3)

And we can continuously build a heap from a series of elements by folding. For example,
the following Haskell code defines a helper function fromList.
fromList = foldl insert []

Since wrapping an element as a singleton tree takes O(1) time, the real work is done
in insertT; the performance of binomial heap insertion is bound to O(lg n).
The insertion algorithm can also be realized with an imperative approach.

Algorithm 1 Insert a tree with ’left-child-right-sibling’ method.


1: function Insert-Tree(H, T )
2: while H 6= ϕ∧ Rank(Head(H)) = Rank(T ) do
3: (T1 , H) ← Extract-Head(H)
4: T ← Link(T, T1 )
5: Sibling(T ) ← H
6: return T

Algorithm 1 continuously links the first tree in the heap with the new tree to be
inserted while they have the same rank. After that, it attaches the linked list of the
remaining trees as the sibling, and returns the new linked list.
If using a container to manage the children of a node, the algorithm can be given in
Algorithm 2.

Algorithm 2 Insert a tree with children managed by a container.


1: function Insert-Tree’(H, T )
2: while H 6= ϕ∧ Rank(H[0]) = Rank(T ) do
3: T1 ← Pop(H)
4: T ← Link(T, T1 )
5: Head-Insert(H, T )
6: return H

³ There is an interesting observation from comparing this operation with adding two binary numbers,
which leads to the topic of numeric representation [3].



In this algorithm, function Pop removes the first tree T1 = H[0] from the forest, and
function Head-Insert inserts a new tree before any other trees in the heap, so that it
becomes the first element in the forest.
With either Insert-Tree or Insert-Tree' defined, realizing the binomial heap insertion
is trivial.

Algorithm 3 Imperative insert algorithm


1: function Insert(H, x)
2: return Insert-Tree(H, Node(0, x, ϕ))

The following Python program implements the insert algorithm by using a container
to manage sub-trees. The 'left-child, right-sibling' program is left as an exercise.
def insert_tree(ts, t):
    while ts != [] and t.rank == ts[0].rank:
        t = link(t, ts.pop(0))
    ts.insert(0, t)
    return ts

def insert(h, x):
    return insert_tree(h, BinomialTree(x))

Exercise 10.2
Write the insertion program in your favorite imperative programming language by
using the ‘left-child, right-sibling’ approach.

Merge two heaps


When merging two binomial heaps, we actually try to merge two forests of binomial trees.
According to the definition, there can't be two trees with the same rank, and the ranks
are in monotonically increasing order. Our strategy is very similar to merge sort: in
every iteration, we take the first tree from each forest, compare their ranks, and move
the one with the smaller rank to the result heap; if the ranks are equal, we perform linking to get
a new tree, and recursively insert this new tree into the result of merging the remaining trees.
Figure 10.7 illustrates the idea of this algorithm. This method is different from the
one given in [4].
We can formalize this idea with a function. For the non-empty cases, we denote the two
heaps as H1 = {T1, T2, ...} and H2 = {T1′, T2′, ...}. Let H1′ = {T2, T3, ...} and H2′ =
{T2′, T3′, ...}.

merge(H1, H2) = { H1 : H2 = ϕ
                  H2 : H1 = ϕ
                  {T1} ∪ merge(H1′, H2) : Rank(T1) < Rank(T1′)
                  {T1′} ∪ merge(H1, H2′) : Rank(T1) > Rank(T1′)
                  insertT(merge(H1′, H2′), link(T1, T1′)) : otherwise }    (10.4)
To analyze the performance of merge, suppose there are m1 trees in H1 and m2 trees
in H2. There are at most m1 + m2 trees in the merged result. If no two trees have
the same rank, the merge operation is bound to O(m1 + m2). If linking is needed
for trees with the same rank, insertT performs at most O(m1 + m2) time. Consider the
fact that m1 = 1 + ⌊lg n1⌋ and m2 = 1 + ⌊lg n2⌋, where n1, n2 are the numbers of nodes
in each heap, and ⌊lg n1⌋ + ⌊lg n2⌋ ≤ 2⌊lg n⌋, where n = n1 + n2 is the total number of
nodes. The final performance of merging is O(lg n).
Translating this algorithm to Haskell yields the following program.

Figure 10.7: Merge two heaps. (a) Pick the tree with the smaller rank to the result; (b) if two trees have the same rank, link them into a new tree, and recursively insert it into the merge result of the rest.



merge ts1 [] = ts1


merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
| rank t1 < rank t2 = t1:(merge ts1' ts2)
| rank t1 > rank t2 = t2:(merge ts1 ts2')
| otherwise = insertTree (merge ts1' ts2') (link t1 t2)

The merge algorithm can also be described in an imperative way, as shown in Algorithm 4.

Algorithm 4 imperative merge two binomial heaps


1: function Merge(H1 , H2 )
2: if H1 = ϕ then
3: return H2
4: if H2 = ϕ then
5: return H1
6: H←ϕ
7: while H1 6= ϕ ∧ H2 6= ϕ do
8: T ←ϕ
9: if Rank(H1 ) < Rank(H2 ) then
10: (T, H1 ) ← Extract-Head(H1 )
11: else if Rank(H2 ) < Rank(H1 ) then
12: (T, H2 ) ← Extract-Head(H2 )
13: else ▷ Equal rank
14: (T1 , H1 ) ← Extract-Head(H1 )
15: (T2 , H2 ) ← Extract-Head(H2 )
16: T ← Link(T1 , T2 )
17: Append-Tree(H, T )
18: if H1 6= ϕ then
19: Append-Trees(H, H1 )
20: if H2 6= ϕ then
21: Append-Trees(H, H2 )
22: return H

Both heaps contain binomial trees with ranks in monotonically increasing order.
In each iteration, we pick the tree with the smallest rank and append it to the result heap.
If both trees have the same rank, we perform linking first. Consider the Append-Tree
algorithm: the rank of the new tree to be appended can't be less than that of any other tree in
the result heap according to our merge strategy; however, it might be equal to the rank of
the last tree in the result heap. This can happen if the last tree appended is the result
of linking, which increases the rank by one. In this case, we must link the new tree to
be inserted with the last tree. In the below algorithm, suppose function Last(H) refers to
the last tree in a heap, and Append(H, T) just appends a new tree at the end of a forest.
1: function Append-Tree(H, T )
2: if H 6= ϕ∧ Rank(T ) = Rank(Last(H)) then
3: Last(H) ← Link(T , Last(H))
4: else
5: Append(H, T )
Function Append-Trees repeatedly calls this function, so that it can append all trees
in one heap to the other heap.
1: function Append-Trees(H1 , H2 )
2: for each T ∈ H2 do

3: H1 ← Append-Tree(H1 , T )
The following Python program translates the merge algorithm.
def append_tree(ts, t):
    if ts != [] and ts[-1].rank == t.rank:
        ts[-1] = link(ts[-1], t)
    else:
        ts.append(t)
    return ts

from functools import reduce   # needed in Python 3; reduce is a builtin in Python 2

def append_trees(ts1, ts2):
    return reduce(append_tree, ts2, ts1)

def merge(ts1, ts2):
    if ts1 == []:
        return ts2
    if ts2 == []:
        return ts1
    ts = []
    while ts1 != [] and ts2 != []:
        t = None
        if ts1[0].rank < ts2[0].rank:
            t = ts1.pop(0)
        elif ts2[0].rank < ts1[0].rank:
            t = ts2.pop(0)
        else:
            t = link(ts1.pop(0), ts2.pop(0))
        ts = append_tree(ts, t)
    ts = append_trees(ts, ts1)
    ts = append_trees(ts, ts2)
    return ts

Exercise 10.3
The program given above uses a container to manage sub-trees. Implement the merge
algorithm in your favorite imperative programming language with ‘left-child, right-sibling’
approach.

Pop

Among the forest which forms the binomial heap, each binomial tree conforms to the heap
property that the root contains the minimum element in that tree. However, the order
relationship of these roots can be arbitrary. To find the minimum element in the heap,
we can select the smallest root of these trees. Since there are at most O(lg n) binomial trees, this
approach takes O(lg n) time.
However, after we locate the minimum element (which is also known as the top element
of a heap), we need to remove it from the heap and keep the binomial properties to accomplish
the heap-pop operation. Suppose the forest forming the binomial heap consists of trees
Bi, Bj, ..., Bp, ..., Bm, where Bk is a binomial tree of rank k, and the minimum element is
the root of Bp. If we delete it, there will be p children left, which are all binomial trees
with ranks p − 1, p − 2, ..., 0.
One tool at hand is the O(lg n) merge function we have defined. A possible approach
is to reverse the p children, so that their ranks change to monotonically increasing order,
and they form a binomial heap Hp. The rest of the trees is still a binomial heap, which we represent
it as H′ = H − Bp. Merging Hp and H′ gives the final result of pop. Figure 10.8 illustrates
this idea.
In order to realize this algorithm, we first need to define an auxiliary function, which

Figure 10.8: Pop the minimum element from a binomial heap.

can extract the tree containing the minimum element at its root from the forest.

extractMin(H) = { (T, ϕ) : H is a singleton {T}
                  (T1, H′) : Root(T1) < Root(T′)
                  (T′, {T1} ∪ H′′) : otherwise }    (10.5)
where
H = {T1 , T2 , ...} for the non-empty forest case;
H ′ = {T2 , T3 , ...} is the forest without the first tree;
(T ′ , H ′′ ) = extractM in(H ′ )
The result of this function is a tuple. The first part is the tree which has the minimum
element at its root; the second part is the rest of the trees after removing the first part from
the forest.
This function examines each of the trees in the forest and thus is bound to O(lg n) time.
The corresponding Haskell program is given below.
extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)
else (t', t:ts')
where
(t', ts') = extractMin ts

With this function defined, to return the minimum element is trivial.


findMin = root ◦ fst ◦ extractMin

Of course, it's possible to just traverse the forest and pick the minimum root without
removing the tree. The imperative algorithm below describes this with the 'left child,
right sibling' approach.

1: function Find-Minimum(H)
2: T ← Head(H)
3: min ← ∞
4: while T 6= ϕ do
5: if Key(T )< min then
6: min ← Key(T )
7: T ← Sibling(T )
8: return min
If we manage the children with collection containers, the linked list traversal is
abstracted as finding the minimum element in a list. The following Python program
shows this situation.
def find_min(ts):
    min_t = min(ts, key=lambda t: t.key)
    return min_t.key

Next we define the function to delete the minimum element from the heap by using
extractM in.

deleteMin(H) = merge(reverse(Children(T)), H′)    (10.6)

where

(T, H ′ ) = extractM in(H)

Translating the formula into a Haskell program is trivial.
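For reference, a minimal sketch of this translation might look as follows (it assumes the merge and extractMin functions and the children accessor of BiTree defined above):

deleteMin :: (Ord a) => BiHeap a -> BiHeap a
deleteMin h = merge (reverse (children t)) h'   -- reverse the children to restore increasing rank order
  where (t, h') = extractMin h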
Realizing the algorithm in a procedural way takes extra effort, including list reversing,
etc. We leave these details as an exercise to the reader. The following pseudo code illustrates
the imperative pop algorithm.
1: function Extract-Min(H)
2: (Tmin , H) ← Extract-Min-Tree(H)
3: H ← Merge(H, Reverse(Children(Tmin )))
4: return (Key(Tmin ), H)
With the pop operation defined, we can realize heap sort by creating a binomial heap from
a series of numbers, then keep popping the smallest number from the heap till it becomes
empty.

sort(xs) = heapSort(f romList(xs)) (10.7)

And the real work is done in function heapSort.


heapSort(H) = { ϕ : H = ϕ
                {findMin(H)} ∪ heapSort(deleteMin(H)) : otherwise }    (10.8)

Translating to Haskell yields the following program.


heapSort = hsort ◦ fromList where
hsort [] = []
hsort h = (findMin h):(hsort $ deleteMin h)

Function fromList can be defined by folding. Heap sort can also be expressed in a
procedural way. Please refer to the previous chapter about the binary heap for details.

Exercise 10.4

• Write the program to return the minimum element from a binomial heap in your
favorite imperative programming language with the 'left-child, right-sibling' approach.
• Realize the Extract-Min-Tree() algorithm.
• For the 'left-child, right-sibling' approach, reversing all children of a tree is actually
reversing a singly linked list. Write a program to reverse such a linked list in
your favorite imperative programming language.

More words about binomial heap


As we have shown, insertion and merge are bound to O(lg n) time. These results
all hold for the worst case. The amortized performance of insertion is O(1). We skip the proof
of this fact.

10.3 Fibonacci Heaps


It's interesting to ask why the name 'Fibonacci heap' was chosen. In fact, there is no direct
connection from the structure design to the Fibonacci series. The inventors of the Fibonacci
heap, Michael L. Fredman and Robert E. Tarjan, utilized the properties of the Fibonacci
series to prove the performance time bound, so they decided to use Fibonacci to name
this data structure [4].

10.3.1 Definition
The Fibonacci heap is essentially a lazily evaluated binomial heap. Note that this doesn't mean
implementing a binomial heap in a lazy evaluation setting, for instance Haskell, brings a
Fibonacci heap automatically. However, a lazy evaluation setting does help in the realization.
For example, [56] presents an elegant implementation.
The Fibonacci heap has excellent performance theoretically. All operations except for pop
are bound to amortized O(1) time. In this section, we'll give an algorithm different from
some popular textbooks [4]. Most of the ideas presented here are based on Okasaki's work [57].
Let's review and compare the performance of the binomial heap and the Fibonacci heap (more
precisely, the performance goal of the Fibonacci heap).
operation Binomial heap Fibonacci heap
insertion O(lg n) O(1)
merge O(lg n) O(1)
top O(lg n) O(1)
pop O(lg n) amortized O(lg n)
Consider where the bottleneck of inserting a new element x into a binomial heap is. We
actually wrap x as a singleton leaf and insert this tree into the heap, which is actually a
forest.
During this operation, we insert the tree in monotonically increasing order of rank,
and once the ranks are equal, recursive linking and inserting happen, which leads to
the O(lg n) time.
As a lazy strategy, we can postpone the ordered-rank insertion and merging operations.
Instead, we just put the singleton leaf into the forest. The problem
is that when we try to find the minimum element, for example in the top operation, the
performance will be bad, because we need to check all trees in the forest, and there are
no longer only O(lg n) trees.
In order to locate the top element in constant time, we must remember which tree
contains the minimum element at its root.

Based on this idea, we can reuse the definition of binomial tree and give the definition
of Fibonacci heap as the following Haskell program for example.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}

The Fibonacci heap is either empty or a forest of binomial trees with the minimum
element stored in a special one explicitly.
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}

For convenience, we also add a size field to record how many elements there are
in the heap.
The data layout can also be defined in an imperative way, as in the following ANSI C code.
struct node{
Key key;
struct node ∗next, ∗prev, ∗parent, ∗children;
int degree; /∗ As known as rank ∗/
int mark;
};

struct FibHeap{
struct node ∗roots;
struct node ∗minTr;
int n; /∗ number of nodes ∗/
};

For generality, Key can be a customized type; we use integer for illustration purposes.
typedef int Key;

In this chapter, we use the circular doubly linked list in the imperative setting to realize
the Fibonacci Heap, as described in [4]. It makes many operations easy and fast. Note
that there are two extra fields added. The degree, also known as the rank of a node, is the
number of children of this node; the flag mark is used only in the decreasing key operation. It
will be explained in detail in a later section.

10.3.2 Basic heap operations


As we mentioned, the Fibonacci heap is essentially a binomial heap implemented with a lazy
evaluation strategy, so we'll reuse many algorithms defined for the binomial heap.

Insert a new element to the heap


Recall the insertion algorithm of the binomial heap. It can be treated as a special case of
the merge operation, where one heap contains only a singleton tree.

insert(H, x) = merge(H, singleton(x)) (10.9)

where singleton is an auxiliary function to wrap an element to a one-leaf-tree.

singleton(x) = F ibHeap(1, node(1, x, ϕ), ϕ)

Note that function FibHeap() accepts three parameters: a size value, which is 1 for
this one-leaf tree; a special tree which contains the minimum element as root; and a list
of other binomial trees in the forest. The meaning of function node() is the same as before:
it creates a binomial tree from a rank, an element, and a list of children.
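A minimal functional sketch of singleton and insert, assuming the FibHeap and BiTree definitions above and the merge function given in the next sub-section (the full implementation may differ), could be:

singleton :: a -> FibHeap a
singleton x = FH 1 (Node 1 x []) []            -- size 1; the only tree is the recorded minimum tree

insert :: (Ord a) => FibHeap a -> a -> FibHeap a
insert h x = merge h (singleton x)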
Insertion can also be realized directly by appending the new node to the forest and
updating the record of the tree which contains the minimum element.
1: function Insert(H, k)
2: x ← Singleton(k) ▷ Wrap x to a node
3: append x to root list of H
4: if Tmin (H) = N IL ∨ k < Key(Tmin (H)) then
5: Tmin (H) ← x
6: n(H) ← n(H)+1
Where function Tmin () returns the tree which contains the minimum element at root.
The following C source snippet is a translation for this algorithm.
struct FibHeap∗ insert_node(struct FibHeap∗ h, struct node∗ x){
h = add_tree(h, x);
if(h→minTr == NULL | | x→key < h→minTr→key)
h→minTr = x;
h→n++;
return h;
}

Exercise 10.5
Implement the insert algorithm in your favorite imperative programming language
completely. This is also an exercise to circular doubly linked list manipulation.

Merge two heaps


Different from the merging algorithm of the binomial heap, we postpone the linking operations.
The idea is to just put all binomial trees from each heap together, and choose
the special tree which records the minimum element for the result heap.

merge(H1, H2) = { H1 : H2 = ϕ
                  H2 : H1 = ϕ
                  FibHeap(s1 + s2, T1min, {T2min} ∪ T1 ∪ T2) : root(T1min) < root(T2min)
                  FibHeap(s1 + s2, T2min, {T1min} ∪ T1 ∪ T2) : otherwise }    (10.10)
where s1 and s2 are the sizes of H1 and H2; T1min and T2min are the special trees
with the minimum element as root in H1 and H2 respectively; T1 = {T11, T12, ...} is a forest
containing all other binomial trees in H1; and T2 has the same meaning as T1 except that
it represents the forest in H2. Function root(T) returns the root element of a binomial
tree.
Note that as long as the ∪ operation takes constant time, this merge algorithm is
bound to O(1). The following Haskell program is the translation of this algorithm.
merge h E = h
merge E h = h
merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
| root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2++ts1)
| otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)

Merge algorithm can also be realized imperatively by concatenating the root lists of
the two heaps.
1: function Merge(H1 , H2 )
2: H←Φ

3: Root(H) ← Concat(Root(H1 ), Root(H2 ))


4: if Key(Tmin (H1 )) < Key(Tmin (H2 )) then
5: Tmin (H) ← Tmin (H1 )
6: else
7: Tmin (H) ← Tmin (H2 )
n(H) = n(H1 ) + n(H2 )
8: return H
This function assumes that neither H1 nor H2 is empty. It's easy to add handling for
these special cases, as in the following ANSI C program.
struct FibHeap∗ merge(struct FibHeap∗ h1, struct FibHeap∗ h2){
struct FibHeap∗ h;
if(is_empty(h1))
return h2;
if(is_empty(h2))
return h1;
h = empty();
h→roots = concat(h1→roots, h2→roots);
if(h1→minTr→key < h2→minTr→key)
h→minTr = h1→minTr;
else
h→minTr = h2→minTr;
h→n = h1→n + h2→n;
free(h1);
free(h2);
return h;
}

With merge function defined, the O(1) insertion algorithm is realized as well. And
we can also give the O(1) time top function as below.

top(H) = root(Tmin ) (10.11)
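In Haskell, a minimal sketch of this O(1) top, assuming the FibHeap definition above (the name top is used only for this illustration), is simply:

top :: FibHeap a -> a
top (FH _ t _) = root t                        -- the root of the recorded minimum tree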

Exercise 10.6
Implement the circular doubly linked list concatenation function in your favorite im-
perative programming language.

Extract the minimum element from the heap (pop)


The pop operation is the most complex one for the Fibonacci heap. Since we postponed the
tree consolidation in the merge algorithm, we have to compensate for it somewhere. Pop is the
only place left, as we have already defined insert, merge, and top.
There is an elegant procedural algorithm to do the tree consolidation by using an
auxiliary array[4]. We’ll show it later in imperative approach section.
In order to realize the purely functional consolidation algorithm, let’s first consider a
similar number puzzle.
Given a list of numbers, such as {2, 1, 1, 4, 8, 1, 1, 2, 4}, we want to add any two values
if they are the same, and repeat this procedure till all numbers are unique. The result for the
example list should be {8, 16}.
One solution to this problem is as follows.

consolidate(L) = f old(meld, ϕ, L) (10.12)

Where the fold() function iterates over all elements of a list, applying a specified
function to the intermediate result and each element. It is sometimes called reducing.
Please refer to Appendix A and the chapter about the binary search tree for it.

Table 10.1: Steps of consolidate numbers

number intermediate result result


2 2 2
1 1, 2 1, 2
1 (1+1), 2 4
4 (4+4) 8
8 (8+8) 16
1 1, 16 1, 16
1 (1+1), 16 2, 16
2 (2+2), 16 4, 16
4 (4+4), 16 8, 16

L = {x1, x2, ..., xn} denotes a list of numbers, and we'll use L′ = {x2, x3, ..., xn} to
represent the rest of the list with the first element removed. Function meld() is defined
as below.

meld(L, x) = { {x} : L = ϕ
               meld(L′, x + x1) : x = x1
               {x} ∪ L : x < x1
               {x1} ∪ meld(L′, x) : otherwise }    (10.13)

The consolidate() function works as follows. It maintains an ordered result list
L, containing only unique numbers, which is initialized as an empty list ϕ. Each time
it processes an element x, it first checks if the first element in L is equal to x; if so, it
adds them together (which yields 2x), and repeatedly checks if 2x is equal to the next
element in L. This process won't stop until either the element to be melded is not equal to
the head element of the rest of the list, or the list becomes empty. Table 10.1 illustrates
the process of consolidating the number sequence {2, 1, 1, 4, 8, 1, 1, 2, 4}. Column one lists the
numbers 'scanned' one by one; column two shows the intermediate result, where typically the
newly scanned number is compared with the first number in the result list, and if they are equal,
they are enclosed in a pair of parentheses; the last column is the result of meld, and it
is used as the input to the next step.
The Haskell program can be given accordingly.
consolidate = foldl meld [] where
meld [] x = [x]
meld (x':xs) x | x == x' = meld xs (x+x')
| x < x' = x:x':xs
| otherwise = x': meld xs x

We'll analyze the performance of consolidation as a part of the pop operation in a later
section.
The tree consolidation is very similar to this algorithm, except that it operates on
ranks. The only thing we need to do is to modify the meld() function a bit, so that it compares
ranks and performs linking instead of adding.

meld(L, x) = { {x} : L = ϕ
               meld(L′, link(x, x1)) : rank(x) = rank(x1)
               {x} ∪ L : rank(x) < rank(x1)
               {x1} ∪ meld(L′, x) : otherwise }    (10.14)

The final consolidate Haskell program changes to the below version.


consolidate = foldl meld [] where

meld [] t = [t]
meld (t':ts) t | rank t == rank t' = meld ts (link t t')
| rank t < rank t' = t:t':ts
| otherwise = t' : meld ts t

Figures 10.9 and 10.10 show the steps of consolidation when processing a Fibonacci
Heap containing trees of different ranks. Comparing with Table 10.1 reveals the similarity.

Figure 10.9: Steps of consolidation. (a) Before consolidation; (b) Step 1, 2; (c) Step 3, 'd' is firstly linked to 'c', then repeatedly linked to 'a'; (d) Step 4.

After we merge all the binomial trees in a Fibonacci heap, including the special tree
recording the minimum element at its root, the heap becomes a binomial heap, and we lose
the special tree which gives us the ability to return the top element in O(1) time.
It's necessary to perform an O(lg n) time search to restore the special tree. We can
reuse the function extractMin() defined for the binomial heap.
It's time to give the final pop function for the Fibonacci heap, as all the sub-problems
have been solved. Let Tmin denote the special tree in the heap recording the minimum
element at its root; let T denote the forest containing all the other trees except for the special
tree; let s represent the size of the heap; and let function children() return all sub-trees except
the root of a binomial tree.
deleteMin(H) = { ϕ : T = ϕ ∧ children(Tmin) = ϕ
                 FibHeap(s − 1, T′min, T′) : otherwise }    (10.15)

Where

(T′min, T′) = extractMin(consolidate(children(Tmin) ∪ T))

Figure 10.10: Steps of consolidation. (a) Step 5; (b) Step 6; (c) Step 7, 8, 'r' is firstly linked to 'q', then 's' is linked to 'q'.



Translating to Haskell yields the program below.


deleteMin (FH _ (Node _ x []) []) = E
deleteMin h@(FH sz minTr ts) = FH (sz-1) minTr' ts' where
(minTr', ts') = extractMin $ consolidate (children minTr ++ ts)

The main part of the imperative realization is similar. We cut all children of Tmin and
append them to the root list, then perform consolidation to merge all trees with the same
rank until all trees are unique in terms of rank.
1: function Delete-Min(H)
2: x ← Tmin (H)
3: if x 6= N IL then
4: for each y ∈ Children(x) do
5: append y to root list of H
6: Parent(y) ← N IL
7: remove x from root list of H
8: n(H) ← n(H) - 1
9: Consolidate(H)
10: return x
Algorithm Consolidate utilizes an auxiliary array A to do the merge job. Array
element A[i] is defined to store the tree with rank (degree) i. During the traversal of the root list, if
we meet another tree of rank i, we link them together to get a new tree of rank i + 1.
Next we clean A[i], and check if A[i + 1] is empty, performing further linking if necessary.
After we finish traversing all roots, array A stores all the result trees, and we can re-construct
the heap from it.
1: function Consolidate(H)
2: D ← Max-Degree(n(H))
3: for i ← 0 to D do
4: A[i] ← N IL
5: for each x ∈ root list of H do
6: remove x from root list of H
7: d ← Degree(x)
8: while A[d] 6= N IL do
9: y ← A[d]
10: x ← Link(x, y)
11: A[d] ← N IL
12: d←d+1
13: A[d] ← x
14: Tmin (H) ← N IL ▷ root list is NIL at the time
15: for i ← 0 to D do
16: if A[i] 6= N IL then
17: append A[i] to root list of H.
18: if Tmin = N IL∨ Key(A[i]) < Key(Tmin (H)) then
19: Tmin (H) ← A[i]
The only unclear sub algorithm is Max-Degree, which can determine the upper
bound of the degree of any node in a Fibonacci Heap. We’ll delay the realization of it to
the last sub section.
Feeding the Fibonacci Heap shown in Figure 10.9 to the above algorithm, Figures 10.11,
10.12 and 10.13 show the result trees stored in the auxiliary array A at every step.

Figure 10.11: Steps of consolidation. (a) Step 1, 2; (b) Step 3: since A[0] ≠ NIL, 'd' is firstly linked to 'c', and A[0] is cleared to NIL; again, as A[1] ≠ NIL, 'c' is linked to 'a' and the new tree is stored in A[2]; (c) Step 4.

Translating the above algorithm to ANSI C yields the below program.
void consolidate(struct FibHeap∗ h){

if(!h→roots)
return;
int D = max_degree(h→n)+1;
struct node ∗x, ∗y;
struct node∗∗ a = (struct node∗∗)malloc(sizeof(struct node∗)∗(D+1));
int i, d;
for(i=0; i ≤ D; ++i)
a[i] = NULL;
while(h→roots){
x = h→roots;
h→roots = remove_node(h→roots, x);
d= x→degree;
while(a[d]){
y = a[d]; /∗ Another node has the same degree as x ∗/
x = link(x, y);
a[d++] = NULL;
}
a[d] = x;
}
h→minTr = h→roots = NULL;
for(i=0; i ≤ D; ++i)
if(a[i]){
h→roots = append(h→roots, a[i]);
if(h→minTr == NULL | | a[i]→key < h→minTr→key)
h→minTr = a[i];
}
free(a);
}

Exercise 10.7
Implement the remove function for circular doubly linked list in your favorite imper-
ative programming language.

Figure 10.12: Steps of consolidation. (a) Step 5; (b) Step 6.



Figure 10.13: Steps of consolidation. (a) Step 7, 8: since A[0] ≠ NIL, 'r' is firstly linked to 'q', and the new tree is stored in A[1] (A[0] is cleared); then 's' is linked to 'q', and stored in A[2] (A[1] is cleared).

10.3.3 Running time of pop


In order to analyze the amortized performance of pop, we adopt the potential method. The reader
can refer to [4] for a formal definition. In this chapter, we only give an intuitive illustration.
Recall gravitational potential energy, which is defined as

E =M ·g·h

Suppose there is a complex process which moves an object of mass M up and
down, and the object finally stops at height h′. If the friction resistance does work Wf,
we say the process does the following amount of work.

W = M · g · (h′ − h) + Wf

Figure 10.14 illustrates this concept.


We treat the Fibonacci heap pop operation in a similar way. In order to evaluate the
cost, we first define the potential Φ(H) before extracting the minimum element. This
potential is accumulated by the insertion and merge operations executed so far. After
tree consolidation, when we get the result H′, we then calculate the new potential Φ(H′).
The difference between Φ(H′) and Φ(H), plus the contribution of the consolidate algorithm,
indicates the amortized performance of pop.
For pop operation analysis, the potential can be defined as

Φ(H) = t(H) (10.16)

Where t(H) is the number of trees in the Fibonacci heap forest. We have t(H) = 1 +
length(T) for any non-empty heap.
For an n-node Fibonacci heap, suppose there is an upper bound D(n) on the ranks of all trees.
After consolidation, it is ensured that the number of trees in the heap forest is at
most D(n) + 1.

Figure 10.14: Gravity potential energy.

Before consolidation, we actually did another important thing which also contributes
to the running time: we removed the root of the minimum tree, and concatenated all the children
left to the forest. So the consolidate operation processes at most D(n) + t(H) − 1 trees.
Summarizing all the above factors, we deduce the amortized cost as below.

T = Tconsolidation + Φ(H′) − Φ(H)
  = O(D(n) + t(H) − 1) + (D(n) + 1) − t(H)    (10.17)
  = O(D(n))

If only the insertion, merge, and pop functions are applied to the Fibonacci heap, we ensure
that all trees are binomial trees. It is easy to estimate the upper limit D(n) as O(lg n).
(Consider the extreme case that all nodes are in only one binomial tree.)
However, we'll show in the next sub-section that there is an operation which can violate the
binomial tree assumption.

Exercise 10.8
Why is the tree consolidation time proportional to the number of trees it processes?

10.3.4 Decreasing key


There is one special heap operation left. It only makes sense in imperative settings:
decreasing the key of a certain node. Decreasing a key plays an important role in some
graph algorithms such as the minimum spanning tree algorithm and Dijkstra's algorithm
[4]. In those cases, we hope that decreasing a key takes O(1) amortized time.
However, we can't define a function like Decrease(H, k, k′), which first locates a node
with key k, then decreases k to k′ by replacement, and then restores the heap properties.
This is because the locating phase is bound to O(n) time, since we don't have a
pointer to the target node.
In an imperative setting, we can define the algorithm as Decrease-Key(H, x, k), where
x is a node in heap H whose key we want to decrease to k. We needn't perform a
search, as we have x at hand. It's possible to give an amortized O(1) solution.
When we decrease the key of a node, if it's not a root, this operation may violate
the binomial tree property that the key of the parent is less than all keys of its children. So we
need to compare the decreased key with the parent node, and if this case happens, we can
cut this node and append it to the root list. (Recall the recursive swapping solution for
the binary heap, which leads to O(lg n).)

Figure 10.15: x < y, cut tree x from its parent, and add x to the root list.

Figure 10.15 illustrates this situation. After decreasing the key of node x, it is less than
that of y; we cut x off from its parent y, and 'paste' the whole tree rooted at x to the root list.
Although we recover the property that the parent is less than all children, the tree
is no longer a binomial tree after it loses some sub-tree. If a tree loses too many
of its children because of cutting, we can't ensure the performance of the mergeable heap
operations. The Fibonacci Heap adds another constraint to avoid such a problem:
If a node loses its second child, it is immediately cut from its parent and added to the root
list.
The final Decrease-Key algorithm is given as below.
1: function Decrease-Key(H, x, k)
2: Key(x) ← k
3: p ← Parent(x)
4: if p 6= N IL ∧ k < Key(p) then
5: Cut(H, x)
6: Cascading-Cut(H, p)
7: if k < Key(Tmin (H)) then
8: Tmin (H) ← x
Where function Cascading-Cut uses the mark to determine if the node is losing its
second child. The node is marked after it loses its first child, and the mark is cleared
in the Cut function.
1: function Cut(H, x)
2: p ← Parent(x)
3: remove x from p
4: Degree(p) ← Degree(p) - 1
5: add x to root list of H
6: Parent(x) ← N IL
7: Mark(x) ← F ALSE

During the cascading cut process, if x is marked, it means it has already lost one
child. We recursively perform cut and cascading cut on its parent till we reach the root.
1: function Cascading-Cut(H, x)
2: p ← Parent(x)
3: if p 6= N IL then
4: if Mark(x) = F ALSE then
5: Mark(x) ← T RU E
6: else
7: Cut(H, x)
8: Cascading-Cut(H, p)
The corresponding ANSI C decrease key program is given as follows.
void decrease_key(struct FibHeap∗ h, struct node∗ x, Key k){
struct node∗ p = x→parent;
x→key = k;
if(p && k < p→key){
cut(h, x);
cascading_cut(h, p);
}
if(k < h→minTr→key)
h→minTr = x;
}

void cut(struct FibHeap∗ h, struct node∗ x){


struct node∗ p = x→parent;
p→children = remove_node(p→children, x);
p→degree--;
h→roots = append(h→roots, x);
x→parent = NULL;
x→mark = 0;
}

void cascading_cut(struct FibHeap∗ h, struct node∗ x){


struct node∗ p = x→parent;
if(p){
if(!x→mark)
x→mark = 1;
else{
cut(h, x);
cascading_cut(h, p);
}
}
}

Exercise 10.9
Prove that Decrease-Key algorithm is amortized O(1) time.

10.3.5 The name of Fibonacci Heap


It's time to reveal why the data structure is named 'Fibonacci Heap'.
There is only one undefined algorithm so far, Max-Degree(n), which determines
the upper bound of the degree of any node in an n-node Fibonacci Heap. We'll give the proof
by using the Fibonacci series and finally realize the Max-Degree algorithm.

Lemma 10.3.1. For any node x in a Fibonacci Heap, denote k = degree(x), and |x| =
size(x), then
|x| ≥ Fk+2 (10.18)

Where Fk is the Fibonacci series defined as follows.

Fk = { 0 : k = 0
       1 : k = 1
       Fk−1 + Fk−2 : k ≥ 2 }

Proof. Consider all k children of node x. We denote them as y1, y2, ..., yk in the order of
the time when they were linked to x, where y1 is the oldest and yk is the youngest.
Obviously, |yi| ≥ 0. When we link yi to x, the children y1, y2, ..., yi−1 were already
there, and algorithm Link only links nodes with the same degree, which indicates that
at that time we had

degree(yi ) = degree(x) = i − 1

After that, node yi can lose at most 1 child (due to the decreasing key operation);
otherwise, it would be immediately cut off and appended to the root list after the second child
loss. Thus we conclude

degree(yi ) ≥ i − 2

For any i = 2, 3, ..., k.


Let sk be the minimum possible size of node x, where degree(x) = k. For trivial cases,
s0 = 1, s1 = 2, and we have

|x| ≥ sk
    = 2 + ∑_{i=2}^{k} s_{degree(yi)}
    ≥ 2 + ∑_{i=2}^{k} s_{i−2}

We next show that sk ≥ Fk+2. This can be proved by induction. For the trivial cases, we
have s0 = 1 ≥ F2 = 1, and s1 = 2 ≥ F3 = 2. For the induction case k ≥ 2, we have

|x| ≥ sk
    ≥ 2 + ∑_{i=2}^{k} s_{i−2}
    ≥ 2 + ∑_{i=2}^{k} Fi
    = 1 + ∑_{i=0}^{k} Fi

At this point, we need to prove that

Fk+2 = 1 + ∑_{i=0}^{k} Fi    (10.19)

This can also be proved by using induction:


• Trivial case, F2 = 1 + F0 = 1

• Induction case,

Fk+2 = Fk+1 + Fk
     = 1 + ∑_{i=0}^{k−1} Fi + Fk
     = 1 + ∑_{i=0}^{k} Fi

Summarizing all the above, we have the final result.

n ≥ |x| ≥ Fk+2    (10.20)


Recall the result of AVL tree, that Fk ≥ ϕk , where ϕ = 1+2 5 is the golden ratio. We
also proved that pop operation is amortized O(lg n) algorithm.
Based on this result, we can define the function MaxDegree as follows.

MaxDegree(n) = 1 + ⌊log_ϕ n⌋    (10.21)
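Assuming floating point precision is acceptable, the closed form (10.21) can be sketched directly in Haskell (this helper is an illustration added here, not the book's implementation):

maxDegree :: Int -> Int
maxDegree n = 1 + floor (logBase phi (fromIntegral n))
  where phi = (1 + sqrt 5) / 2 :: Double       -- the golden ratio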

The imperative Max-Degree algorithm can also be realized by using Fibonacci se-
quences.
1: function Max-Degree(n)
2: F0 ← 0
3: F1 ← 1
4: k←2
5: repeat
6: Fk ← Fk−1 + Fk−2
7: k ← k + 1
8: until Fk−1 ≥ n
9: return k − 2
Translating the algorithm to ANSI C gives the following program.
int max_degree(int n){
int k, F;
int F2 = 0;
int F1 = 1;
for(F=F1+F2, k=2; F<n; ++k){
F2 = F1;
F1 = F;
F = F1 + F2;
}
return k-2;
}

10.4 Pairing Heaps


Although Fibonacci Heaps provide excellent performance theoretically, they are complex to
realize. People find that the constant factor behind the big-O is large. Actually, the Fibonacci
Heap is more significant in theory than in practice.
In this section, we'll introduce another solution, the pairing heap, which is one of the best
performing heaps ever known in practice. Most operations, including insertion, finding the
minimum element (top), and merging, are all bound to O(1) time, while deleting the minimum
element (pop) is conjectured to take amortized O(lg n) time [58] [3]. Note that this has
remained a conjecture for 15 years by the time I write this chapter. Nobody has proven it,
although there is much experimental data supporting the O(lg n) amortized result.
Besides that, pairing heap is simple. There exist both elegant imperative and func-
tional implementations.

10.4.1 Definition
Both Binomial Heaps and Fibonacci Heaps are realized with forests, while a pairing heap
is essentially a single K-ary tree. The minimum element is stored at the root. All other elements
are stored in sub-trees.
The following Haskell program defines pairing heap.
data PHeap a = E | Node a [PHeap a]

This is a recursive definition: a pairing heap is either empty or a K-ary tree, which
consists of a root node and a list of sub-trees.
The pairing heap can also be defined in procedural languages, for example in ANSI C as
below. For illustration purposes, all heaps we mention later are min-heaps, and we
assume the type of the key is integer⁴. We use the same linked-list based left-child, right-sibling
approach (a.k.a. the binary tree representation [4]).
typedef int Key;

struct node{
Key key;
struct node ∗next, ∗children, ∗parent;
};

Note that the parent field only makes sense for the decreasing key operation, which
will be explained later on. We can omit it for the time being.

10.4.2 Basic heap operations


In this section, we first give the merging operation for the pairing heap, which can be used to
realize insertion. Merging, insertion, and finding the minimum element are relatively trivial
compared to the operation of extracting the minimum element.

Merge, insert, and find the minimum element (top)


The idea of merging is similar to the linking algorithm we shown previously for Binomial
heap. When we merge two pairing heaps, there are two cases.

• Trivial case: one heap is empty, we simply return the other heap as the result;

• Otherwise, we compare the root elements of the two heaps, and make the heap with the
bigger root element a new child of the other.

Let H1 and H2 denote the two heaps, and let x and y be the root elements of H1 and H2
respectively. Function Children() returns the children of a K-ary tree. Function Node()
⁴ We can parametrize the key type with a C++ template, but this is beyond our scope; please refer to
the example programs along with this book.



can construct a K-ary tree from a root element and a list of children.

merge(H1, H2) = { H1 : H2 = ϕ
                  H2 : H1 = ϕ
                  Node(x, {H2} ∪ Children(H1)) : x < y
                  Node(y, {H1} ∪ Children(H2)) : otherwise }    (10.22)

Where x = Root(H1) and y = Root(H2).
It's obvious that the merging algorithm is bound to O(1) time⁵. The merge equation
can be translated into the following Haskell program.
merge h E = h
merge E h = h
merge h1@(Node x hs1) h2@(Node y hs2) =
if x < y then Node x (h2:hs1) else Node y (h1:hs2)

Merge can also be realized imperatively. With the left-child, right-sibling approach, we
can just link the heap (which is in fact a K-ary tree) with the larger key as the first new child
of the other. This is a constant time operation, as described below.
1: function Merge(H1 , H2 )
2: if H1 = NIL then
3: return H2
4: if H2 = NIL then
5: return H1
6: if Key(H2 ) < Key(H1 ) then
7: Exchange(H1 ↔ H2 )
8: Insert H2 in front of Children(H1 )
9: Parent(H2 ) ← H1
10: return H1
Note that we also update the parent field accordingly. The ANSI C example program
is given as the following.
struct node∗ merge(struct node∗ h1, struct node∗ h2) {
if (h1 == NULL)
return h2;
if (h2 == NULL)
return h1;
if (h2→key < h1→key)
swap(&h1, &h2);
h2→next = h1→children;
h1→children = h2;
h2→parent = h1;
h1→next = NULL; /∗Break previous link if any∗/
return h1;
}

Where function swap() is defined in a similar way as Fibonacci Heap.


With merge defined, insertion can be realized the same way as for the Fibonacci heap in
equation 10.9; it is definitely an O(1) operation. As the minimum element is always stored
at the root, finding it is trivial.
top(H) = Root(H)                                              (10.23)
Like the other two operations above, it is bound to O(1) time.
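As a minimal sketch (these two functions are not listed in the text, and build directly on the merge function above), insert wraps the new element into a singleton tree and merges it in, while top simply reads the root:

insert :: Ord a ⇒ PHeap a → a → PHeap a
insert h x = merge h (Node x [])   -- O(1): merge a singleton tree into the heap

top :: PHeap a → a
top (Node x _) = x                 -- the root always holds the minimum element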
5 Assume ∪ is a constant time operation; this is true in linked-list settings, including the ’cons’-like operation
in functional programming languages.



Decrease key of a node


There is another operation, decreasing the key of a given node, which only makes sense in
imperative settings, as we explained in the Fibonacci heap section.
The solution is simple: we cut the node with the new, smaller key from its
parent, together with all its children, and then merge it back into the heap. The only special case
is when the given node is the root; in that case we can directly set the new key without doing
anything else.
The following algorithm describes this procedure for a given node x, with new key k.
1: function Decrease-Key(H, x, k)
2: Key(x) ← k
3: if Parent(x) ≠ NIL then
4: Remove x from Children(Parent(x)), Parent(x) ← NIL
5: return Merge(H, x)
6: return H
The following ANSI C program translates this algorithm.
struct node∗ decrease_key(struct node∗ h, struct node∗ x, Key key) {
x→key = key; /∗ Assume key ≤ x→key ∗/
if(x→parent) {
x→parent→children = remove_node(x→parent→children, x);
x→parent = NULL;
return merge(h, x);
}
return h;
}

Exercise 10.10
Implement the program that removes a node from the children of its parent in your
favorite imperative programming language. Consider how we can ensure that the overall
performance of decreasing the key is O(1) time. Is the left-child, right-sibling approach enough?

Delete the minimum element from the heap (pop)


Since the minimum element is always stored at the root, after deleting it during popping, what
remains are the sub-trees. These trees can be merged into one big tree.
pop(H) = mergePairs(Children(H))                              (10.24)
The pairing heap uses a special approach: it merges every two sub-trees from left to
right in pairs, then merges these paired results from right to left to form the final result
tree. The name ‘pairing heap’ comes from this pair-merging characteristic.
Figure 10.16 and 10.17 illustrate the procedure of pair-merging.
The recursive pair-merging solution is quite similar to the bottom-up merge sort [3].
Denote the children of a pairing heap as A, which is a list of trees {T1, T2, T3, ..., Tm}
for example. The mergePairs() function can be given as below.

mergePairs(A) =                                               (10.25)
    ϕ : A = ϕ
    T1 : A = {T1}
    merge(merge(T1, T2), mergePairs(A′)) : otherwise

where A′ = {T3, T4, ..., Tm}
is the rest of the children without the first two trees.
The relative Haskell program of popping is given as the following.

[Figure 10.16: Remove the root element, and merge the children in pairs.
(a) A pairing heap before pop.
(b) After the root element 2 is removed, there are 9 sub-trees left.
(c) Merge every two trees in pairs; note that there is an odd number of trees, so the last one needn't merge.]

[Figure 10.17: Steps of merging from right to left.
(a) Merge the tree with root 9 and the tree with root 6.
(b) Merge the tree with root 7 to the result.
(c) Merge the tree with root 3 to the result.
(d) Merge the tree with root 4 to the result.]



deleteMin (Node _ hs) = mergePairs hs where


mergePairs [] = E
mergePairs [h] = h
mergePairs (h1:h2:hs) = merge (merge h1 h2) (mergePairs hs)

The popping operation can also be explained in the following procedural algorithm.
1: function Pop(H)
2: L ← NIL
3: for every 2 trees Tx , Ty ∈ Children(H) from left to right do
4: Extract x, and y from Children(H)
5: T ← Merge(Tx , Ty )
6: Insert T at the beginning of L
7: H ← Children(H) ▷ H is either NIL or one tree.
8: for ∀T ∈ L from left to right do
9: H ← Merge(H, T )
10: return H
Note that L is initialized as an empty linked-list. The algorithm then iterates over the
children of the K-ary tree, two trees at a time from left to right, and performs merging;
each result is inserted at the beginning of L. Because we insert at the front, when we
traverse L later on we actually process from right to left. There may be an odd number of
sub-trees in H; in that case one tree is left over after pair-merging. We handle it by
starting the right-to-left merging from this leftover tree.
Below is the ANSI C program to this algorithm.
struct node∗ pop(struct node∗ h) {
struct node ∗x, ∗y, ∗lst = NULL;
while ((x = h→children) ̸= NULL) {
if ((h→children = y = x→next) ̸= NULL)
h→children = h→children→next;
lst = push_front(lst, merge(x, y));
}
x = NULL;
while((y = lst) ̸= NULL) {
lst = lst→next;
x = merge(x, y);
}
free(h);
return x;
}

The pairing heap pop operation is conjectured to be amortized O(lg n) time [58].

Exercise 10.11
Write a program to insert a tree at the beginning of a linked-list in your favorite
imperative programming language.

Delete a node
We didn't mention delete for the binomial heap or the Fibonacci heap. Deletion can be realized
by first decreasing the key to minus infinity (−∞), then performing pop. In this section, we
present another solution for deleting a node.
The algorithm defines the function delete(H, x), where x is a node in a pairing
heap H 6 .
6 Here the semantic of x is a reference to a node.

If x is the root, we can just perform a pop operation. Otherwise, we can cut x from H,
perform a pop on x, and then merge the pop result back into H. This can be described as
follows.

delete(H, x) =                                                (10.26)
    pop(H) : x is the root of H
    merge(cut(H, x), pop(x)) : otherwise

As the delete algorithm uses pop, the performance is conjectured to be amortized O(lg n)
time.

Exercise 10.12

• Write procedural pseudo code for delete algorithm.


• Write the delete operation in your favorite imperative programming language

• Consider how to realize delete in purely functional setting.

10.5 Notes and short summary


In this chapter, we extend the heap implementation from the binary tree to more generic
approaches. The binomial heap and the Fibonacci heap use a forest of K-ary trees as the underlying
data structure, while the pairing heap uses a single K-ary tree to represent the heap. Postponing
some expensive operations is a good technique, so that the overall amortized performance is ensured.
Although the Fibonacci heap gives good performance in theory, its implementation is a bit
complex, and it has been removed from some recent textbooks. We also presented the pairing heap,
which is easy to realize and has good performance in practice.
The elementary tree based data structures have all been introduced in this book. There are
still many tree based data structures which we can't cover, and we skip them here. We
encourage the reader to refer to other textbooks about them. From the next chapter, we'll
introduce generic sequence data structures: array and queue.
Bibliography

[1] K-ary tree, Wikipedia. [Link]

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502

[4] Wikipedia, “Pascal’s triangle”. [Link]


[5] Hackage. “An alternate implementation of a priority queue based on
a Fibonacci heap.”, [Link]
mtl/1.0.7/doc/html/src/[Link]

[6] Chris Okasaki. “Fibonacci Heaps.” [Link]


[7] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan.
“The Pairing Heap: A New Form of Self-Adjusting Heap” Algorithmica (1986) 1:
111-129.

Chapter 11

Queue, not so simple as it was thought

11.1 Introduction
It seems that queues are relatively simple. A queue provides FIFO (first-in, first-out) data
manipulation support. There are many options to realize a queue, including singly linked-list,
doubly linked-list, circular buffer etc. However, we'll show that it's not so easy to realize
a queue in purely functional settings if it must satisfy the abstract queue properties.
In this chapter, we'll present several different approaches to implement a queue. A queue
is a FIFO data structure satisfying the following performance constraints.

• Elements can be added to the tail of the queue in O(1) constant time;
• Elements can be removed from the head of the queue in O(1) constant time.

These two properties must be satisfied, and it's common to add some extra goals,
such as dynamic memory allocation etc.
Of course such an abstract queue interface can be implemented trivially with a doubly-linked
list, but this is an overkill solution. We can even implement an imperative queue with a
singly linked-list or a plain array. However, our main question here is how to realize
a purely functional queue as well.
We'll first review the typical queue solutions realized by singly linked-list
and circular buffer in the first section; then we give a simple and straightforward functional
solution in the second section. While its performance is ensured in terms of amortized
constant time, we need to find a real-time solution (or worst-case solution) for some special
cases. Such solutions will be described in the third and the fourth sections. Finally, we'll
show a very simple real-time queue which depends on lazy evaluation.
Most of the functional contents are based on Chris Okasaki's great work in [3]. There
are more than 16 different types of purely functional queues given in that material.

11.2 Queue by linked-list and circular buffer


11.2.1 Singly linked-list solution
A queue can be implemented with a singly linked-list. It's easy to add and remove elements
at the front end of a linked-list in O(1) time. However, in order to keep the FIFO order,
if we execute one operation on the head, we must perform the inverse operation on the tail.


In order to operate on the tail of a plain singly linked-list, we must traverse the whole list
before adding or removing. Traversing is bound to O(n) time, where n is the length of
the list. This doesn't match the abstract queue properties.
The solution is to use an extra record to store the tail of the linked-list. A sentinel
is often used to simplify the boundary handling. The following ANSI C 1 code defines a
queue realized by a singly linked-list.
typedef int Key;

struct Node{
Key key;
struct Node∗ next;
};

struct Queue{
struct Node ∗head, ∗tail;
};

Figure 11.1 illustrates an empty queue: both head and tail point to the sentinel NIL node.

[Figure 11.1: The empty queue; both head and tail point to the sentinel node.]

We summarize the abstract queue interface as the following.


function Empty ▷ Create an empty queue
function Empty?(Q) ▷ Test if Q is empty
function Enqueue(Q, x) ▷ Add a new element x to queue Q
function Dequeue(Q) ▷ Remove element from queue Q
function Head(Q) ▷ get the next element in queue Q in FIFO order
Note the difference between Dequeue and Head: Head only retrieves the next element
in FIFO order without removing it, while Dequeue removes it.
In some programming languages, such as Haskell and most object-oriented languages,
the above abstract queue interface can be enforced by a definition. For example, the
following Haskell code specifies the abstract queue.
class Queue q where
empty :: q a
isEmpty :: q a → Bool
push :: q a → a → q a −− Or named as ’snoc’, append, push_back
pop :: q a → q a −− Or named as ’tail’, pop_front
front :: q a → a −− Or named as ’head’

1 It’s possible to parameterize the type of the key with C++ template. ANSI C is used here for

illustration purpose.

To ensure constant time Enqueue and Dequeue, we add the new element to the tail
and remove elements from the head.2
function Enqueue(Q, x)
p ← Create-New-Node
Key(p) ← x
Next(p) ← N IL
Next(Tail(Q)) ← p
Tail(Q) ← p
Note that, as we use the sentinel node, there is always at least one node (the sentinel) in the
queue. That's why we needn't check the validity of the tail before we append the
newly created node p to it.
function Dequeue(Q)
x ← Head(Q)
Next(Head(Q)) ← Next(x)
if x = Tail(Q) then ▷ Q gets empty
Tail(Q) ← Head(Q)
return Key(x)
As we always put the sentinel node in front of all the other nodes, function Head
actually returns the next node to the sentinel.
Figure 11.2 illustrates Enqueue and Dequeue process with sentinel node.
Translating the pseudo code to ANSI C program yields the below code.
struct Queue∗ enqueue(struct Queue∗ q, Key x) {
struct Node∗ p = (struct Node∗)malloc(sizeof(struct Node));
p→key = x;
p→next = NULL;
q→tail→next = p;
q→tail = p;
return q;
}

Key dequeue(struct Queue∗ q) {


struct Node∗ p = head(q); /∗gets the node next to sentinel∗/
Key x = key(p);
q→head→next = p→next;
if(q→tail == p)
q→tail = q→head;
free(p);
return x;
}

This solution is simple and robust. It's easy to extend it even to a
concurrent environment (e.g. multicores): we can assign one lock to the head and
another lock to the tail. The sentinel helps us avoid deadlock in the empty
case [59] [60].

Exercise 11.1

• Realize the Empty? and Head algorithms for linked-list queue.

• Implement the singly linked-list queue in your favorite imperative programming
language. Note that you need to provide functions to initialize and destroy the queue.

2 It's possible to add the new element to the head and remove elements from the tail instead, but those operations are
more complex with a singly linked-list.



[Figure 11.2: Enqueue and Dequeue on the linked-list queue.
(a) Before Enqueue x to the queue.
(b) After Enqueue x to the queue.
(c) Before Dequeue.
(d) After Dequeue.]



11.2.2 Circular buffer solution


Another typical solution to realize a queue is to use a plain array as a circular buffer (also
known as a ring buffer). Opposed to the linked-list, an array supports appending to the tail in
constant O(1) time if there is still space (of course we need to re-allocate space if the
array is fully occupied). However, the array performs poorly, in O(n) time, when removing an
element from the head and packing the space, because we need to shift all the remaining elements
one cell ahead. The idea of the circular buffer is to reuse the free cells before the first valid
element after we remove elements from the head.
The idea of the circular buffer is described in figures 11.3 and 11.4.
If we set a maximum size for the buffer instead of dynamically allocating memory, the
queue can be defined with the ANSI C code below.
struct QueueBuf{
Key∗ buf;
int head, cnt, size;
};

When initializing the queue, we are explicitly asked to provide the maximum size as the
parameter.
struct QueueBuf∗ createQ(int max){
struct QueueBuf∗ q = (struct QueueBuf∗)malloc(sizeof(struct QueueBuf));
q→buf = (Key∗)malloc(sizeof(Key)∗max);
q→size = max;
q→head = q→cnt = 0;
return q;
}

With the counter variable, we can compare it with zero and the capacity to test if the
queue is empty or full.
function Empty?(Q)
return Count(Q) = 0
To realize Enqueue and Dequeue, an easy way is to calculate the index modulo the buffer size,
as follows. Note that the new element is written before the counter is increased.
function Enqueue(Q, x)
if ¬ Full?(Q) then
tail ← (Head(Q) + Count(Q)) mod Size(Q)
Buffer(Q)[tail] ← x
Count(Q) ← Count(Q) + 1
function Head(Q)
if ¬ Empty?(Q) then
return Buffer(Q)[Head(Q)]
function Dequeue(Q)
if ¬ Empty?(Q) then
Head(Q) ← (Head(Q) + 1) mod Size(Q)
Count(Q) ← Count(Q) - 1
However, the modulo operation is expensive and slow in some settings, so one may replace
it with a simple adjustment, as in the ANSI C program below (a possible definition of the
helpers offset() and fullQ() is sketched after the listing).
void enQ(struct QueueBuf∗ q, Key x){
if(!fullQ(q)){
q→buf[offset(q→head + q→cnt, q→size)] = x;
q→cnt++;
}
}
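The helpers offset(), fullQ() and emptyQ() are not defined in the text; the following is one possible sketch. Since the index passed to offset() is always less than twice the buffer size here, a single subtraction is enough to avoid the modulo operation.

/∗ A sketch of the helper functions assumed by enQ and deQ. ∗/
static int offset(int i, int size) {
    return i < size ? i : i - size; /∗ wrap the index without using mod ∗/
}

static int fullQ(struct QueueBuf∗ q) {
    return q→cnt == q→size;
}

static int emptyQ(struct QueueBuf∗ q) {
    return q→cnt == 0;
}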

[Figure 11.3: A queue realized with a ring buffer.
(a) Continuously add some elements.
(b) After removing some elements from the head, there are free cells.
(c) Go on adding elements till the boundary of the array.
(d) The next element is added to the first free cell at the head.
(e) All cells are occupied; the queue is full.]



Figure 11.4: The circular buffer.

Key headQ(struct QueueBuf∗ q) {
    return q→buf[q→head]; /∗ assume the queue isn't empty ∗/
}

Key deQ(struct QueueBuf∗ q) {
    Key x = headQ(q);
    q→head = offset(q→head + 1, q→size);
    q→cnt--;
    return x;
}

Exercise 11.2
The circular buffer is allocated with a maximum size parameter. Can we test whether the
queue is empty or full with only the head and tail pointers? Note that the head can be either
before or after the tail.

11.3 Purely functional solution


11.3.1 Paired-list queue
We can't just use a single list to implement the queue, or we can't satisfy the abstract queue properties.
This is because the singly linked-list, which is the back-end data structure in most functional
settings, performs well on the head in constant O(1) time, while it takes linear O(n)
time on the tail, where n is the length of the list. Either dequeue or enqueue would take
time proportional to the number of elements stored in the list, as shown in figure 11.5.
Neither can we add a pointer to record the tail position of the list, as we have
done in the imperative settings like in the ANSI C program, because of the purely
functional nature.
Chris Okasaki mentioned a simple and straightforward functional solution in [3]. The
idea is to maintain two linked-lists as a queue, and concatenate these two lists in a tail-
to-tail manner. The shape of the queue looks like a horseshoe magnet, as shown in figure 11.6.

[Figure 11.5: DeQueue and EnQueue can't both perform in constant O(1) time with a single list.
(a) DeQueue performs poorly.
(b) EnQueue performs poorly.]

[Figure 11.6: A queue with front and rear lists, shaped like a horseshoe magnet.
(a) A horseshoe magnet.
(b) Concatenate two lists tail-to-tail.]

With this setup, we push a new element to the head of the rear list, which is guaranteed to
be O(1) constant time; on the other hand, we pop elements from the head of the front list,
which is also O(1) constant time. So the abstract queue properties can be satisfied.
The definition of such paired-list queue can be expressed in the following Haskell code.
type Queue a = ([a], [a])

empty = ([], [])

Suppose the functions front(Q) and rear(Q) return the front and rear lists in this setup,
and Queue(F, R) creates a paired-list queue from two lists F and R. The EnQueue (push)
and DeQueue (pop) operations can then be easily realized based on this setup.
push(Q, x) = Queue(f ront(Q), {x} ∪ rear(Q)) (11.1)
pop(Q) = Queue(tail(f ront(Q)), rear(Q)) (11.2)
where, for a list X = {x1, x2, ..., xn}, the function tail(X) = {x2, x3, ..., xn} returns the rest
of the list without the first element.
However, we must next solve the problem that after several pop operations, the front
list becomes empty while there are still elements in the rear list. One method is to rebuild
the queue by reversing the rear list, and use the result to replace the front list.
Hence a balance operation will be executed after popping. Let's denote the front and
rear lists of a queue Q as F = front(Q) and R = rear(Q).
balance(F, R) =                                               (11.3)
    Queue(reverse(R), ϕ) : F = ϕ
    Q : otherwise
Thus if front list isn’t empty, we do nothing, while when the front list becomes empty,
we use the reversed rear list as the new front list, and the new rear list is empty.
The new enqueue and dequeue algorithms are updated as below.
push(Q, x) = balance(F, {x} ∪ R) (11.4)
pop(Q) = balance(tail(F ), R) (11.5)
Sum up the above algorithms and translate them to Haskell yields the following pro-
gram.
balance :: Queue a → Queue a
balance ([], r) = (reverse r, [])
balance q = q

push :: Queue a → a → Queue a


push (f, r) x = balance (f, x:r)

pop :: Queue a → Queue a


pop ([], _) = error "Empty"
pop (_:f, r) = balance (f, r)
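The front and isEmpty operations are not listed above; a minimal sketch of them, under the same paired-list representation, could be:

-- Sketch only: accessors that complement push/pop above.
isEmpty :: Queue a → Bool
isEmpty ([], []) = True      -- after balancing, an empty front list means an empty queue
isEmpty _        = False

front :: Queue a → a
front ((x:_), _) = x         -- the next element in FIFO order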

Although we only touch the heads of the front and rear lists, the overall performance
can't always be kept at O(1). Actually, the performance of this algorithm is amortized
O(1). This is because the reverse operation takes time proportional to the length of the rear
list; it is bound to O(n) time, where n = |R|. We leave the proof of the amortized performance
as an exercise to the reader.

11.3.2 Paired-array queue - a symmetric implementation


There is an interesting implementation which is symmetric to the paired-list queue. Some
old programming languages, such as legacy versions of BASIC, support arrays, but provide
neither pointers nor records to represent a linked-list. Although we can
use another array to store indexes and thus represent a linked-list implicitly,
there is another option to realize an amortized O(1) queue.
Compare the performance of array and linked-list. The table below reveals some facts
(suppose both contain n elements).

operation        Array   Linked-list
insert on head   O(n)    O(1)
insert on tail   O(1)    O(n)
remove on head   O(n)    O(1)
remove on tail   O(1)    O(n)

Note that the linked-list performs in constant time on the head, but in linear time on the tail,
while the array performs in constant time on the tail (supposing there is enough memory space,
and omitting memory reallocation for simplicity), but in linear time on the head. This
is because we need to do shifting when preparing or eliminating an empty cell in an array (see
the chapter 'the evolution of insertion sort' for detail).
The above table shows an interesting characteristic that we can exploit to provide
a solution mimicking the paired-list queue: we concatenate two arrays, head-to-head, to
make a horseshoe shaped queue as in figure 11.7.

[Figure 11.7: A queue with front and rear arrays, shaped like a horseshoe magnet.
(a) A horseshoe magnet.
(b) Concatenate two arrays head-to-head.]

We can define such a paired-array queue with the following Python code 3 .
class Queue:
    def __init__(self):
        self.front = []
        self.rear = []

def is_empty(q):
    return q.front == [] and q.rear == []

The relative Push() and Pop() algorithms only manipulate the tails of the arrays.
function Push(Q, x)
Append(Rear(Q), x)
Here we assume that the Append() algorithm appends element x to the end of the
array, and handles the necessary memory allocation etc. Actually, there are multiple
memory handling approaches. For example, besides the dynamic re-allocation, we can
initialize the array with enough space, and just report an error if it's full.
function Pop(Q)
if Front(Q) = ϕ then
Front(Q) ← Reverse(Rear(Q))
Rear(Q) ← ϕ
n ← Length(Front(Q))
x ← Front(Q)[n]
Length(Front(Q)) ← n − 1
3 Legacy BASIC code is not presented here. We actually use a list, not an array, in Python to illustrate
the idea. ANSI C and ISO C++ programs are provided along with this chapter; they work in a
purely array manner.

return x
For simplification and pure illustration purposes, the array isn't shrunk explicitly after
elements are removed, so testing if the front array is empty (ϕ) can be realized by checking
if the length of the array is zero. We omit all these details here.
The enqueue and dequeue algorithms can be translated to Python programs straight-
forwardly.
def push(q, x):
    q.rear.append(x)

def pop(q):
    if q.front == []:
        q.rear.reverse()
        (q.front, q.rear) = (q.rear, [])
    return q.front.pop()
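A front (peek) accessor is not given in the text; a sketch under the same convention (the next element in FIFO order sits at the end of the front array, or at the start of the rear array when the front is empty) could be:

def front(q):
    # Sketch only: peek at the next element without removing it.
    if q.front != []:
        return q.front[-1]
    return q.rear[0]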

Similar to the paired-list queue, the performance is amortized O(1) because the reverse
procedure takes linear time.

Exercise 11.3

• Prove that the amortized performance of paired-list queue is O(1).

• Prove that the amortized performance of paired-array queue is O(1).

11.4 A small improvement, Balanced Queue


Although the paired-list queue is amortized O(1) for popping and pushing, the solution we
proposed in the previous section performs poorly in the worst case. For example, suppose there is one
element in the front list, and we push n elements continuously to the queue, where n is a
big number. Executing a pop operation after that triggers the worst case.
According to the strategy we have used so far, all the n elements are added to the rear list.
The front list becomes empty after the pop operation, so the algorithm starts to reverse
the rear list. This reversing procedure is bound to O(n) time, which is proportional to the
length of the rear list. Sometimes this can't be accepted for a very big n.
The reason this worst case happens is that the front and rear lists are ex-
tremely unbalanced. We can improve our paired-list queue design by making them more
balanced. One option is to add a balancing constraint.

|R| ≤ |F | (11.6)

Where R = Rear(Q), F = Front(Q), and |L| is the length of list L. This constraint
ensures the length of the rear list is not greater than the length of the front list, so the
reverse procedure will be executed once the rear list grows longer than the front list.
Here we need to access the length information of a list frequently. However, calculating the
length takes linear time for a singly linked-list. We can record the length in a variable and
update it as elements are added and removed. This approach enables us to get the length
information in constant time.
The example below shows the modified paired-list queue definition, which is augmented
with length fields.
data BalanceQueue a = BQ [a] Int [a] Int

As we keep the invariant specified in (11.6), we can easily tell if a queue is empty
by testing the length of the front list.

F = ϕ ⇔ |F | = 0 (11.7)

In the rest of this section, we suppose the length of a list L can be retrieved as
|L| in constant time.
Push and pop are almost the same as before, except that we check the balance invariant
by passing the length information and perform reversing accordingly.

push(Q, x) = balance(F, |F |, {x} ∪ R, |R| + 1) (11.8)

pop(Q) = balance(tail(F ), |F | − 1, R, |R|) (11.9)

Where the function balance() is defined as the following.

balance(F, |F|, R, |R|) =                                     (11.10)
    Queue(F, |F|, R, |R|) : |R| ≤ |F|
    Queue(F ∪ reverse(R), |F| + |R|, ϕ, 0) : otherwise

Note that the function Queue() takes four parameters: the front list along with its
(recorded) length, and the rear list along with its length, and forms a paired-list queue
augmented with length fields.
We can easily translate the equations to Haskell program. And we can enforce the
abstract queue interface by making the implementation an instance of the Queue type
class.
instance Queue BalanceQueue where
empty = BQ [] 0 [] 0

isEmpty (BQ _ lenf _ _) = lenf == 0

−− Amortized O(1) time push


push (BQ f lenf r lenr) x = balance f lenf (x:r) (lenr + 1)

−− Amortized O(1) time pop


pop (BQ (_:f) lenf r lenr) = balance f (lenf - 1) r lenr

front (BQ (x:_) _ _ _) = x

balance f lenf r lenr


| lenr ≤ lenf = BQ f lenf r lenr
| otherwise = BQ (f ++ (reverse r)) (lenf + lenr) [] 0
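A short usage sketch (fromList and toList are illustrative helpers, not part of the text): elements pushed in order come back out in FIFO order through front and pop.

-- Sketch only, relying on the Queue instance above.
fromList :: [a] → BalanceQueue a
fromList = foldl push empty

toList :: BalanceQueue a → [a]
toList q | isEmpty q = []
         | otherwise = front q : toList (pop q)

-- toList (fromList [1..5])  evaluates to  [1,2,3,4,5]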

Exercise 11.4
Write the symmetric balance improvement solution for paired-array queue in your
favorite imperative programming language.

11.5 One more step improvement, Real-time Queue


Although the extreme worst case can be avoided by improving the balancing as
presented in the previous section, the performance of reversing the rear list is still bound
to O(n), where n = |R|. So if the rear list is very long, the instant performance is still
unacceptably poor even though the amortized time is O(1). It is particularly important in some
real-time systems to ensure the worst case performance.

As we have analyzed, the bottleneck is the computation of F ∪ reverse(R). This
happens when |R| > |F|. Considering that |F| and |R| are both integers, this computation
happens when

|R| = |F| + 1                                                 (11.11)

Both F and the result of reverse(R) are singly linked-lists. It takes O(|F|) time to
concatenate them together, and it takes extra O(|R|) time to reverse the rear list, so the
total computation is bound to O(n), where n = |F| + |R|, which is proportional to the
total number of elements in the queue.
In order to realize a real-time queue, we can't compute F ∪ reverse(R) monolithically.
Our strategy is to distribute this expensive computation over every pop and push operation.
Thus although each pop and push gets a bit slower, we avoid the extremely slow worst-case
pop or push.

Incremental reverse
Let's examine how the functional reverse algorithm is typically implemented.

reverse(X) =                                                  (11.12)
    ϕ : X = ϕ
    reverse(X′) ∪ {x1} : otherwise

Where X′ = tail(X) = {x2, x3, ...}.
This is a typical recursive algorithm: if the list to be reversed is empty, the result
is just an empty list. This is the edge case; otherwise, we take the first element x1 from
the list, reverse the rest {x2, x3, ..., xn} to {xn, xn−1, ..., x3, x2}, and append x1 after it.
However, this algorithm performs poorly, as appending an element to the end of a list is
proportional to the length of the list. So it's O(n²), not a linear time reverse algorithm.
There exists another implementation which utilizes an accumulator A, like below.

reverse(X) = reverse′(X, ϕ)                                   (11.13)

Where

reverse′(X, A) =                                              (11.14)
    A : X = ϕ
    reverse′(X′, {x1} ∪ A) : otherwise

We call A the accumulator because it accumulates the intermediate reverse result at any
time. Every time we call reverse′(X, A), the list X contains the rest of the elements waiting to
be reversed, and A holds all the elements reversed so far. For instance, when we call
reverse′() the i-th time, X and A contain the following elements:

X = {xi, xi+1, ..., xn}    A = {xi−1, xi−2, ..., x1}

In every non-trivial case, we take the first element from X in O(1) time, then put it
in front of the accumulator A, which is again O(1) constant time. We repeat it n times,
so this is a linear time (O(n)) algorithm.
The latter version of reverse is obviously a tail-recursive algorithm; see [5] and [62]
for detail. This characteristic makes it easy to turn the monolithic algorithm into an
incremental one.
The solution is state transfer. We can use a state machine with two types of
state: a reversing state Sr to indicate that the reverse is still on-going (not finished), and
a finish state Sf to indicate the reverse has been done (finished). In Haskell,
it can be defined as a type.

data State a = Reverse [a] [a]
             | Done [a]

And we can schedule (slow down) the above reverse′(X, A) function with these two
types of state.

step(S, X, A) =                                               (11.15)
    (Sf, A) : S = Sr ∧ X = ϕ
    (Sr, X′, {x1} ∪ A) : S = Sr ∧ X ≠ ϕ

At each step, we examine the state type first. If the current state is Sr (on-going) and
the rest of the elements to be reversed in X is empty, we turn the algorithm to the finish state
Sf; otherwise, we take the first element from X and put it in front of A just as above,
but we do NOT recurse; instead, we just finish this step. We can store the
current state as well as the resulting X and A; the reverse can be continued at any time
in the future by calling the 'next' step function with the stored state, X and A passed in.
Here is an example of this step-by-step reverse algorithm.

step(Sr , “hello”, ϕ) = (Sr , “ello”, “h”)


step(Sr , “ello”, “h”) = (Sr , “llo”, “eh”)
...
step(Sr , “o”, “lleh”) = (Sr , ϕ, “olleh”)
step(Sr , ϕ, “olleh”) = (Sf , “olleh”)

In Haskell, the example looks like the following.
step $ Reverse "hello" [] = Reverse "ello" "h"
step $ Reverse "ello" "h" = Reverse "llo" "eh"
...
step $ Reverse "o" "lleh" = Reverse [] "olleh"
step $ Reverse [] "olleh" = Done "olleh"
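The step function itself is not listed in the text; a sketch matching equation (11.15) and the two-constructor State type above could be:

-- Sketch only: perform one step of the incremental reverse.
step :: State a → State a
step (Reverse [] acc)     = Done acc            -- nothing left to reverse: finish
step (Reverse (x:xs) acc) = Reverse xs (x:acc)  -- move one element to the accumulator
step s                    = s                   -- a finished state stays finished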

Now we can distribute the reverse into steps in every pop and push operation. How-
ever, the problem is only half solved. We want to break down F ∪ reverse(R); we have
broken reverse(R) into steps, and we next need to schedule (slow down) the list concatenation
part F ∪ ..., which is bound to O(|F|), into an incremental manner so that we can distribute
it to pop and push operations.

Incremental concatenate
It's a bit more challenging to implement incremental list concatenation than list reversing.
However, it's possible to re-use the result we gained from incremental reverse by a small
trick: in order to realize X ∪ Y, we can first reverse X to ←X = reverse(X), then take elements one by
one from ←X and put them in front of Y, just as what we have done in reverse′.

X ∪ Y ≡ reverse(reverse(X)) ∪ Y
      ≡ reverse′(reverse(X), ϕ) ∪ Y
      ≡ reverse′(reverse(X), Y)                               (11.16)
      ≡ reverse′(←X, Y)

This fact indicates that we can use an extra state to instruct the step() function
to continuously concatenate F after R is reversed.
The strategy is to do the total work in two phases:

1. Reverse both F and R in parallel, to get ←F = reverse(F) and ←R = reverse(R)
incrementally;

2. Incrementally take elements from ←F and put them in front of ←R.

So we define three types of state: Sr represents reversing; Sc represents concatenating;


and Sf represents finish.
In Haskell, these types of state are defined as the following.
data State a = Reverse [a] [a] [a] [a]
| Concat [a] [a]
| Done [a]

Because we reverse F and R simultaneously, the reversing state takes two pairs of lists
and accumulators.
The state transfer is defined according to the two-phase strategy described pre-
viously. Denote F = {f1, f2, ...}, F′ = tail(F) = {f2, f3, ...}, R = {r1, r2, ...},
R′ = tail(R) = {r2, r3, ...}. A state S contains its type, which has a value among
Sr, Sc, and Sf. Note that S also contains necessary parameters such as F, ←F, X, A etc.
as intermediate results. These parameters vary according to the different states.
 ←
− ←−

 (Sr , F ′ , {f1 } ∪ F , R′ , {r1 } ∪ R ) : S = Sr ∧ F 6= ϕ ∧ R 6= ϕ
 ←
− ←−
next(S) = (Sc , F , {r1 } ∪ R ) : S = Sr ∧ F = ϕ ∧ R = {r1 }

 (Sf , A) : S = Sc ∧ X = ϕ

(Sc , X ′ , {x1 } ∪ A) : S = Sc ∧ X 6= ϕ
(11.17)
The relative Haskell program is listed below (note that the concatenation state here
carries no counter, so the finished case matches on the empty list).
next (Reverse (x:f) f' (y:r) r') = Reverse f (x:f') r (y:r')
next (Reverse [] f' [y] r') = Concat f' (y:r')
next (Concat [] acc) = Done acc
next (Concat (x:f') acc) = Concat f' (x:acc)

All that is left is to distribute these incremental steps into every pop and push operation
to implement a real-time O(1) purely functional queue.

Sum up
Before we dive into the final real-time queue implementation, let's analyze how many
incremental steps are needed to achieve the result of F ∪ reverse(R). According to the
balance invariant we used previously, |R| = |F| + 1. Let's denote m = |F|.
Once the queue gets unbalanced due to some push or pop operation, we start this
incremental computation of F ∪ reverse(R). It needs m + 1 steps to reverse R, and at the
same time we finish reversing the list F within these steps. After that, we need extra m + 1
steps to execute the concatenation. So there are 2m + 2 steps in total.
It seems natural to distribute one step into each pop or push operation.
However, a critical question must be answered: is it possible that
before we finish these 2m + 2 steps, the queue gets unbalanced again due to a series of pushes
and pops?
There are two facts about this question; one is good news and the other is bad news.
Let's first show the good news: luckily, continuous pushing can't make the queue
unbalanced again before we finish these 2m + 2 steps of F ∪ reverse(R). This is
because once we start re-balancing, we get a new front list F′ = F ∪ reverse(R) after
2m + 2 steps, while the next unbalance is triggered when

|R′| = |F′| + 1
     = |F| + |R| + 1                                          (11.18)
     = 2m + 2

front copy                    on-going computation          new rear
{fi, fi+1, ..., fm}           (Sr, ←F, ..., ←R, ...)        {...}
first i − 1 elements popped   intermediate ←F and ←R        new elements pushed

Table 11.1: Intermediate state of a queue before the first m steps finish.

That is to say, even if we continuously push as many elements as possible after the
last unbalanced point, by the time the queue gets unbalanced again, the 2m + 2 steps have
just finished, which means the new front list F′ has been computed. We
can safely go on to compute F′ ∪ reverse(R′), thanks to the balance invariant
designed in the previous section.
But the bad news is that a pop operation can happen at any time before these 2m + 2
steps finish. The situation is that once we want to extract an element from the front list, the
new front list F′ = F ∪ reverse(R) isn't ready yet; we don't have a valid front
list at hand.
One solution to this problem is to keep a copy of the original front list F during the
time we are calculating reverse(F), as described in phase 1 of our incremental
computing strategy, so that we are still safe even if the user continuously performs the first m
pop operations. The queue looks like table 11.1 at some point after we start the
incremental computation and before phase 1 (reversing F and R simultaneously) ends 4 .
After these m pop operations, the copy of F is exhausted, and we have just started the incre-
mental concatenation phase at that time. What if the user goes on popping?
The fact is that since F is exhausted (it becomes ϕ), we needn't do concatenation at all,
since ←F ∪ ←R = ϕ ∪ ←R = ←R.
This indicates that when doing concatenation, we only need to concatenate those elements
that haven't been popped, which are still left in F. As the user pops elements one by one contin-
uously from the head of the front list F, one method is to use a counter to record how many
elements are still in F. The counter is initialized as 0 when we start computing
F ∪ reverse(R); it's increased by one when we reverse one element in F, which means we
will need to concatenate this element in the future; and it's decreased by one every time
pop is performed, which means we can concatenate one element less; of course we also need to
decrease this counter in every step of the concatenation. If and only if this counter
becomes zero, we needn't do any more concatenation.
We can give the realization of the purely functional real-time queue according to the above
analysis.
We first add an idle state S0 to simplify some state transfers. The Haskell
program below is an example of this modified state definition.
data State a = Empty
| Reverse Int [a] [a] [a] [a] −− n, f’, acc_f’ r, acc_r
| Append Int [a] [a] −− n, rev_f’, acc
| Done [a] −− result: f ++ reverse r

And the data structure is defined with three parts, the front list (augmented with
length); the on-going state of computing F ∪ reverse(R); and the rear list (augmented
with length).
4 One may wonder whether copying a list takes time linear in its length; if so, the whole solution
would make no sense. Actually, this linear time copying won't happen at all. Because of the purely
functional nature, the front list won't be mutated either by popping or by reversing. However, if one tries to
realize a symmetric solution with paired arrays and mutates the array in place, this issue must be addressed:
we can perform a 'lazy' copying, where the real copying work doesn't execute immediately; instead, it
copies one element at every step of the incremental reversing. The detailed implementation is left as an
exercise.

Here is the Haskell definition of real-time queue.


data RealtimeQueue a = RTQ [a] Int (State a) [a] Int

The empty queue is composed of an empty front list and an empty rear list, together with the
idle state S0, as Queue(ϕ, 0, S0, ϕ, 0). We can test if a queue is empty by checking whether
|F| = 0 according to the balance invariant defined before. Push and pop are changed accordingly.

push(Q, x) = balance(F, |F |, S, {x} ∪ R, |R| + 1) (11.19)

pop(Q) = balance(F ′ , |F | − 1, abort(S), R, |R|) (11.20)

The major difference is the abort() function. Based on our analysis above, when a pop
happens, we need to decrease the counter, so that we can concatenate one element less. We
define this as aborting. The details will be given after the balance() function.
The relative Haskell code for push and pop is listed below.
push (RTQ f lenf s r lenr) x = balance f lenf s (x:r) (lenr + 1)
pop (RTQ (_:f) lenf s r lenr) = balance f (lenf - 1) (abort s) r lenr

The balance() function first checks the balance invariant. If it's violated, we start
re-balancing by beginning the incremental computation of F ∪ reverse(R); otherwise we just
execute one step of the unfinished incremental computation.

balance(F, |F|, S, R, |R|) =                                  (11.21)
    step(F, |F|, S, R, |R|) : |R| ≤ |F|
    step(F, |F| + |R|, (Sr, 0, F, ϕ, R, ϕ), ϕ, 0) : otherwise
The relative Haskell code is given like below.
balance f lenf s r lenr
| lenr ≤ lenf = step f lenf s r lenr
| otherwise = step f (lenf + lenr) (Reverse 0 f [] r []) [] 0

The step() function typically transfers the state machine one state ahead, and it will
turn the state to idle (S0) when the incremental computation finishes.

step(F, |F|, S, R, |R|) =                                     (11.22)
    Queue(F′, |F|, S0, R, |R|) : S′ = Sf
    Queue(F, |F|, S′, R, |R|) : otherwise

Where S′ = next(S) is the next state transferred, and F′ = F ∪ reverse(R) is the
final new front list resulting from the incremental computing. The real state transfer is
implemented in the next() function as follows. It differs from the previous version by
adding the counter field n to record how many elements are left to concatenate.
 ←− ←−

 (Sr , n + 1, F ′ , {f1 } ∪ F , R′ , {r1 } ∪ R ) : S = Sr ∧ F 6= ϕ

 ←
− ←−
 (Sc , n, F , {r1 } ∪ R ) : S = Sr ∧ F = ϕ
next(S) = (Sf , A) : S = Sc ∧ n = 0 (11.23)



 (S , n − 1, X ′
, {x } ∪ A) : S = Sc ∧ n 6= 0
 c 1
S : otherwise

And the corresponding Haskell code is like this.


next (Reverse n (x:f) f' (y:r) r') = Reverse (n+1) f (x:f') r (y:r')
next (Reverse n [] f' [y] r') = Concat n f' (y:r')
next (Concat 0 _ acc) = Done acc
next (Concat n (x:f') acc) = Concat (n-1) f' (x:acc)
next s = s

Function abort() is used to tell the state machine that we can concatenate one element
less, since it has been popped.

abort(S) =                                                    (11.24)
    (Sf, A′) : S = Sc ∧ n = 0
    (Sc, n − 1, X, A) : S = Sc ∧ n ≠ 0
    (Sr, n − 1, F, ←F, R, ←R) : S = Sr
    S : otherwise

Note that when n = 0 we actually roll back one concatenated element by returning A′ as
the result, not A. (Why? This is left as an exercise.)
The Haskell code for the abort function is like the following.
The Haskell code for abort function is like the following.
abort (Concat 0 _ (_:acc)) = Done acc −− Note! we rollback 1 elem
abort (Concat n f' acc) = Concat (n-1) f' acc
abort (Reverse n f f' r r') = Reverse (n-1) f f' r r'
abort s = s

It seems that we are done; however, there is still one tricky issue hidden behind us. If
we push an element x to an empty queue, the resulting queue will be:

Queue(ϕ, 1, (Sc, 0, ϕ, {x}), ϕ, 0)

If we perform pop immediately, we'll get an error! We find that the front list is
empty although the previous computation of F ∪ reverse(R) has finished. This is
because it takes one more extra step to transfer from the state (Sc, 0, ϕ, A) to (Sf, A).
It's necessary to refine the S′ in the step() function a bit.
It’s necessary to refine the S ′ in step() function a bit.
S′ =                                                          (11.25)
    next(next(S)) : F = ϕ
    next(S) : otherwise

The modification is reflected in the Haskell code below:


step f lenf s r lenr =
case s' of
Done f' → RTQ f' lenf Empty r lenr
s' → RTQ f lenf s' r lenr
where s' = if null f then next $ next s else next s
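The remaining pieces are trivial and not listed in the text; a sketch, assuming the RTQ definition above, could be:

-- Sketch only: the empty queue starts in the idle state; emptiness follows
-- from the balance invariant (|F| = 0); front reads the head of the front list.
empty :: RealtimeQueue a
empty = RTQ [] 0 Empty [] 0

isEmpty :: RealtimeQueue a → Bool
isEmpty (RTQ _ lenf _ _ _) = lenf == 0

front :: RealtimeQueue a → a
front (RTQ (x:_) _ _ _ _) = x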

Note that this algorithm differs from the one given by Chris Okasaki in [3]. Okasaki's
algorithm executes two steps per pop and push, while the one presented in this chapter
executes only one per pop and push, which leads to more evenly distributed performance.

Exercise 11.5

• Why need we rollback one element when n = 0 in abort() function?

• Realize the real-time queue with symmetric paired-array queue solution in your
favorite imperative programming language.

• In the footnote, we mentioned that when we start incremental reversing with in-
place paired-array solution, copying the array can’t be done monolithic or it will
lead to linear time operation. Implement the lazy copying so that we copy one
element per step along with the reversing.

11.6 Lazy real-time queue


The key to realizing a real-time queue is to break down the expensive F ∪ reverse(R) to
avoid monolithic computation. Lazy evaluation is particularly helpful in such a case. In
this section, we'll explore whether there is a more elegant solution by exploiting laziness.
Suppose that there exists a function rotate() which can compute F ∪ reverse(R)
incrementally; that's to say, with some accumulator A, the following two functions are
equivalent.

rotate(X, Y, A) ≡ X ∪ reverse(Y ) ∪ A (11.26)

Where we initialize X as the front list F, Y as the rear list R, and the accumulator
A as empty ϕ.
The trigger of rotation is still the same as before, when |F| + 1 = |R|. Let's keep this
constraint as an invariant during the whole rotation process, so that |X| + 1 = |Y| always
holds.
The trivial case is easy to deduce:

rotate(ϕ, {y1 }, A) = {y1 } ∪ A (11.27)

Denote X = {x1, x2, ...}, Y = {y1, y2, ...}, and let X′ = {x2, x3, ...}, Y′ = {y2, y3, ...} be
the rest of the lists without the first elements of X and Y respectively. The recursive
case is derived as follows.

rotate(X, Y, A) ≡ X ∪ reverse(Y) ∪ A                          (definition of (11.26))
              ≡ {x1} ∪ (X′ ∪ reverse(Y) ∪ A)                  (associativity of ∪)
              ≡ {x1} ∪ (X′ ∪ reverse(Y′) ∪ ({y1} ∪ A))        (nature of reverse and associativity of ∪)
              ≡ {x1} ∪ rotate(X′, Y′, {y1} ∪ A)               (definition of (11.26))
                                                              (11.28)
Summarizing the above two cases yields the final incremental rotate algorithm.

rotate(X, Y, A) =                                             (11.29)
    {y1} ∪ A : X = ϕ
    {x1} ∪ rotate(X′, Y′, {y1} ∪ A) : otherwise

If we execute ∪ lazily instead of strictly, that is, execute one ∪ each time a pop or push
operation is performed, the computation of rotate can be distributed to push and pop naturally.
Based on this idea, we modify the paired-list queue definition to change the front list
to a lazy list, and augment it with a computation stream [63]. When some pop/push makes the
queue trigger the re-balance constraint |F| + 1 = |R|, the algorithm creates a lazy
rotation computation, then uses this lazy rotation as the new front list F′; the new rear
list becomes ϕ, and a copy of F′ is maintained as a stream.
After that, every time we perform a push or pop, we consume the stream by forcing
a ∪ operation. This advances us one step along the stream, {x} ∪ F′′, where
F′′ = tail(F′). We can discard x, and replace the stream F′ with F′′.
Once all of the stream is exhausted, we can start another rotation.
In order to illustrate this idea clearly, we turn to the Scheme/Lisp programming language
for the example code, because it gives us explicit control of laziness.
In Scheme/Lisp, we have the following three tools to deal with lazy streams.
In Scheme/Lisp, we have the following three tools to deal with lazy stream.
(define-syntax cons-stream              ; must be a special form (macro), so that b
  (syntax-rules ()                      ; is not evaluated before delay takes effect
    ((_ a b) (cons a (delay b)))))

(define stream-car car)

(define (stream-cdr s) (force (cdr s)))



The form cons-stream constructs a 'lazy' list from an element a and an expression b
without really evaluating b; the evaluation is actually delayed to
stream-cdr, where the computation is forced. Delaying can be realized by lambda
calculus as described in [63].
The lazy paired-list queue is defined as the following.
The lazy paired-list queue is defined as the following.
(define (make-queue f r s)
(list f r s))

; ; Auxiliary functions
(define (front-lst q) (car q))

(define (rear-lst q) (cadr q))

(define (rots q) (caddr q))

A queue consists of three parts: a front list, a rear list, and a stream which represents
the computation of F ∪ reverse(R). Creating an empty queue is trivial: all these
three parts are null.
(define empty (make-queue '() '() '()))

Note that the front list is actually also a lazy stream, so we need to use stream related
functions to manipulate it. For example, the following function tests if the queue is empty
by checking the front lazy list stream.
(define (empty? q) (stream-null? (front-lst q)))

The push function is almost the same as the one given in the previous section: we
put the new element in front of the rear list, and then examine the balance invariant and
do the necessary balancing work.
push(Q, x) = balance(F, {x} ∪ R, Rs)                          (11.30)
Where F represents the lazy stream of the front list, and Rs is the stream of the rotation
computation. The relative Scheme/Lisp code is given below.
(define (push q x)
(balance (front-lst q) (cons x (rear q)) (rots q)))

Pop is a bit different: because the front list is actually a lazy stream, we need to
force an evaluation. All the others are the same as before.
pop(Q) = balance(F′, R, Rs)                                   (11.31)
Here F′ forces one evaluation of F. The Scheme/Lisp code for this equation
is as follows.
(define (pop q)
(balance (stream-cdr (front-lst q)) (rear q) (rots q)))

For illustration purpose, we skip the error handling (such as popping from an empty queue)
here.
One can access the top element in the queue by extracting it from the front list stream.
(define (front q) (stream-car (front-lst q)))

The balance function first checks if the computation stream is completely exhausted,
and starts a new rotation accordingly; otherwise, it just consumes one evaluation by forcing
the lazy stream.

balance(Q) =                                                  (11.32)
    Queue(F′, ϕ, F′) : Rs = ϕ
    Queue(F, R, R′s) : otherwise
11.7. NOTES AND SHORT SUMMARY 263

Here F ′ is defined to start a new rotation.


F ′ = rotate(F, R, ϕ) (11.33)
The relative Scheme/Lisp program is listed accordingly.
(define (balance f r s)
(if (stream-null? s)
(let ((newf (rotate f r '())))
(make-queue newf '() newf))
(make-queue f r (stream-cdr s))))

The implementation of the incremental rotate function is just the same as what we analyzed
above.
(define (rotate xs ys acc)
(if (stream-null? xs)
(cons-stream (car ys) acc)
(cons-stream (stream-car xs)
(rotate (stream-cdr xs) (cdr ys)
(cons-stream (car ys) acc)))))

We used explicit lazy evaluation in Scheme/Lisp. Actually, this program can be very
short in a lazy programming language, for example Haskell.
data LazyRTQueue a = LQ [a] [a] [a] −− front, rear, f ++ reverse r
instance Queue LazyRTQueue where
empty = LQ [] [] []

isEmpty (LQ f _ _) = null f

−− O(1) time push


push (LQ f r rot) x = balance f (x:r) rot

−− O(1) time pop


pop (LQ (_:f) r rot) = balance f r rot

front (LQ (x:_) _ _) = x

balance f r [] = let f' = rotate f r [] in LQ f' [] f'


balance f r (_:rot) = LQ f r rot

rotate [] [y] acc = y:acc


rotate (x:xs) (y:ys) acc = x : rotate xs ys (y:acc)
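A short usage sketch (assuming the Queue type class from earlier in this chapter): push a few elements, then read them back through front and pop in FIFO order.

-- Sketch only, relying on the LazyRTQueue instance above.
example :: [Int]
example = drain (foldl push (empty :: LazyRTQueue Int) [1..5])
  where drain q | isEmpty q = []
                | otherwise = front q : drain (pop q)

-- example  evaluates to  [1,2,3,4,5]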

11.7 Notes and short summary


Just as mentioned at the beginning of this book, in the first chapter, the queue isn't as simple
as it was thought. We've tried to explain algorithms and data structures both in imperative
and in functional approaches; sometimes it gives the impression that the functional way is simpler
and more expressive most of the time. However, there are still plenty of areas where more
study and work are needed to give equivalent functional solutions. Queue is such an
important topic that it links to many fundamental purely functional data structures.
That's why Chris Okasaki studied it intensively and discussed it at great length in [3]. With the
purely functional queue solved, we can easily implement the deque (double-ended queue) with
the similar approach revealed in this chapter. As we can handle elements effectively at
both head and tail, we can advance one step further to realize sequence data structures,
which support fast concatenation, and finally we can realize random access data structures
to mimic the array in imperative settings. The details will be explained in later chapters.

Note that, although we haven't mentioned the priority queue, it's quite possible to realize
it with heaps. We have covered the topic of heaps in several previous chapters.

Exercise 11.6

• Realize the deque (double-ended queue), which supports adding and removing elements on
both sides in constant O(1) time, in a purely functional way.

• Realize the deque in a symmetric solution only with arrays in your favorite imperative
programming language.
Bibliography

[1] Maged M. Michael and Michael L. Scott. “Simple, Fast, and Prac-
tical Non-Blocking and Blocking Concurrent Queue Algorithms”.
[Link]
[2] Herb Sutter. “Writing a Generalized Concurrent Queue”. Dr. Dobb’s Oct 29, 2008.
[Link]

[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[4] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[5] Wikipedia. “Tail-call”. [Link]

[6] Wikipedia. “Recursion (computer science)”. [Link]


recursive_functions
[7] Harold Abelson, Gerald Jay Sussman, Julie Sussman. “Structure and Interpretation
of Computer Programs, 2nd Edition”. MIT Press, 1996, ISBN 0-262-51087-1

Chapter 12

Sequences, The last brick

12.1 Introduction
In the first chapter of this book, which introduced the binary search tree as the 'hello world'
data structure, we mentioned that neither queue nor array is simple to realize, not only
in the imperative way, but also in the functional approach. In the previous chapter, we explained
the functional queue, which achieves performance similar to its imperative counterpart.
In this chapter, we'll dive into the topic of array-like data structures.
We have introduced several data structures in this book so far, and it seems that func-
tional approaches typically bring more expressive and elegant solutions. However, there
are some areas where people haven't found competitive purely functional solutions which can
match the imperative ones; for instance, the Ukkonen linear time suffix tree construction
algorithm. Another example is the hash table. Array is also among them.
Array is trivial in imperative settings; it enables randomly accessing any element by
index in constant O(1) time. However, this performance target can't be achieved directly
in purely functional settings, as only the list is available.
In this chapter, we are going to abstract the concept of array to sequences, which
support the following features:

• Element can be inserted to or removed from the head of the sequence quickly in
O(1) time;

• Element can be inserted to or removed from the tail of the sequence quickly in O(1)
time;

• Support concatenate two sequences quickly (faster than linear time);

• Support randomly access and update any element quickly;

• Support split at any position quickly;

We call these features the abstract sequence properties, and it is easy to see that
even the array (here meaning the plain array) in imperative settings can't meet them all at the
same time.
We'll provide three solutions in this chapter. First, we'll introduce a solution based
on a binary tree forest and numeric representation; second, we'll show a concatenate-able
list solution; finally, we'll give the finger tree solution.
Most of the results are based on Chris Okasaki's work in [3].


12.2 Binary random access list


12.2.1 Review of plain-array and list
Let’s review the performance of the plain array and the singly linked list, so that we know
how they perform in different cases.

    operation                    Array           Linked-list
    operation on head            O(n)            O(1)
    operation on tail            O(1)            O(n)
    access at random position    O(1)            average O(n)
    remove at given position     average O(n)    O(1)
    concatenate                  O(n2)           O(n1)

Here n1 and n2 denote the lengths of the first and the second sequence being concatenated.
Because we hold the head of the linked list, operations on the head, such as insert and
remove, perform in constant time; while we need to traverse to the end to remove from or
append to the tail. Given a position i, it takes traversing i elements to access it. Once
we are at that position, removing the element there takes only constant time, by
modifying some pointers. In order to concatenate two linked lists, we need to traverse to
the end of the first one and link it to the second one, which is bound to the length of the
first linked list.
On the other hand, for array, we must prepare a free cell for inserting a new element at
the head, and we need to release the first cell after the first element is removed. Both
operations are achieved by shifting all the rest of the elements forward or backward,
which costs linear time. The operations on the tail of the array are trivially constant
time. Array also supports accessing a random position i by nature; however, removing the
element at that position causes shifting all elements after it one position ahead. In order
to concatenate two arrays, we need to copy all elements from the second one to the end of
the first one (ignoring the memory re-allocation details), which is proportional to the length
of the second array.
In the chapter about binomial heaps, we explained the idea of using a forest, which
is a list of trees. It brings us the merit that, for any given number n, by representing it in
binary, we know how many binomial trees are needed to hold n elements: each bit of value 1
represents a binomial tree whose rank is the position of that bit. We can go one step further:
if we have an n-node binomial heap, for any given index 1 ≤ i ≤ n, we can quickly know which
binomial tree in the heap holds the i-th node.

12.2.2 Represent sequence by trees


One solution to realize a random-access sequence is to manage the sequence with a forest
of complete binary trees. Figure 12.1 shows how we attach such trees to a sequence of
numbers.
Here two trees t1 and t2 are used to represent sequence {x1 , x2 , x3 , x4 , x5 , x6 }. The
size of binary tree t1 is 2. The first two elements {x1 , x2 } are leaves of t1 ; the size of
binary tree t2 is 4. The next four elements {x3 , x4 , x5 , x6 } are leaves of t2 .
For a complete binary tree, we define the depth as 1 if the tree has only a leaf. The
tree is denoted as ti if its depth is i + 1. It’s obvious that there are 2^i leaves in ti.
Any sequence containing n elements can be turned into a forest of complete binary
trees in this manner. First we represent n in binary as below.

    n = e0 · 2^0 + e1 · 2^1 + ... + em · 2^m    (12.1)

Where each ei is either 1 or 0, so n = (em em−1 ... e1 e0)₂. If ei ≠ 0, we need a complete
binary tree of size 2^i. For example in figure 12.1, the length of the sequence is 6, which
is (110)₂ in binary. The lowest bit is 0, so we needn’t a tree of size 1; the second bit is 1,
so we need a tree of size 2, which has depth of 2; the highest bit is also 1, thus we need a
tree of size 4, which has depth of 3.
Figure 12.1: A sequence of 6 elements can be represented in a forest.
This method represents the sequence {x1, x2, ..., xn} as a list of trees {t0, t1, ..., tm},
where ti is either empty if ei = 0 or a complete binary tree if ei = 1. We call this
representation Binary Random Access List [3].
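As a small illustration of this decomposition, the following Haskell sketch (our own addition,
not from the original text; the name forestSizes is ours) computes, for a given length n, the
sizes of the complete binary trees in the forest.

forestSizes :: Int -> [Int]
forestSizes n = [sz | (sz, b) <- zip sizes (bits n), b == 1]
  where
    sizes = iterate (* 2) 1                 -- 1, 2, 4, 8, ...
    bits 0 = []
    bits m = m `mod` 2 : bits (m `div` 2)   -- binary digits, LSB first

For example, forestSizes 6 yields [2, 4], matching the two trees in figure 12.1.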
We can reuse the definition of the binary tree. For example, the following Haskell pro-
gram defines the tree and the binary random access list.
data Tree a = Leaf a
| Node Int (Tree a) (Tree a) -- size, left, right
type BRAList a = [Tree a]

The only difference from the typical binary tree is that we augment the tree with the size
information. This enables us to get the size without calculating it every time. For instance:
size (Leaf _) = 1
size (Node sz _ _) = sz

12.2.3 Insertion to the head of the sequence


The new forest representation of the sequence enables many operations to be performed
effectively. For example, the operation of inserting a new element y in front of the sequence
can be realized as follows.

1. Create a tree t′, with y as its only leaf;

2. Examine the first tree in the forest and compare its size with that of t′. If its size is
greater than that of t′, we just let t′ be the new head of the forest. Since the forest is a
linked list of trees, inserting t′ at its head is a trivial operation, bound to constant O(1) time;

3. Otherwise, if the size of the first tree in the forest equals that of t′, let’s denote this
tree in the forest as ti. We can construct a new binary tree t′i+1 by linking ti and t′ as its
left and right children. After that, we recursively try to insert t′i+1 into the forest.

Figure 12.2: Steps of inserting elements to an empty list, 1. (a) A singleton leaf of x1; (b) Insert x2: it causes linking, resulting in a tree of height 1; (c) Insert x3: the result is two trees, t1 and t2; (d) Insert x4: it first links two leaves into a binary tree, then performs linking again, which results in a final tree of height 2.

Figure 12.3: Steps of inserting elements to an empty list, 2. (a) Insert x5: the forest is a leaf (t0) and t2; (b) Insert x6: it links two leaves to form t1.



Figures 12.2 and 12.3 illustrate the steps of inserting elements x1, x2, ..., x6 into an empty
forest.
As there are at most m trees in the forest, and m is bound to O(lg n), the insertion
to head algorithm is ensured to perform in O(lg n) time even in the worst case. We’ll prove
that the amortized performance is O(1) later.
Let’s formalize the algorithm. We define the function of inserting an element in front
of a sequence as insert(S, x).

    insert(S, x) = insertTree(S, leaf(x))    (12.2)

This function just wraps element x into a singleton tree with a leaf, and calls insertTree
to insert this tree into the forest. Suppose the forest F = {t1, t2, ...} if it’s not empty, and
F′ = {t2, t3, ...} is the rest of the trees without the first one.

    insertTree(F, t) = {t}                             : F = ϕ
                       {t} ∪ F                         : size(t) < size(t1)
                       insertTree(F′, link(t, t1))     : otherwise    (12.3)

Where function link(t1, t2) creates a new tree from two smaller trees of the same size.
Suppose function tree(s, t1, t2) creates a tree, sets its size as s, and makes t1 the left child
and t2 the right child; linking can be realized as below.

    link(t1, t2) = tree(size(t1) + size(t2), t1, t2)    (12.4)
The corresponding Haskell programs can be given by translating these definitions.
cons :: a → BRAList a → BRAList a
cons x ts = insertTree ts (Leaf x)

insertTree :: BRAList a → Tree a → BRAList a


insertTree [] t = [t]
insertTree (t':ts) t = if size t < size t' then t:t':ts
else insertTree ts (link t t')

-- Precondition: rank t1 = rank t2


link :: Tree a → Tree a → Tree a
link t1 t2 = Node (size t1 + size t2) t1 t2

Here we follow the Lisp tradition of naming the function that inserts an element before a
list ‘cons’.
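To see cons in action, here is a small usage sketch (our own addition, not from the original
text): folding cons over a plain list builds the forest, for example for the 6 elements of
figure 12.1.

fromList :: [a] -> BRAList a
fromList = foldr cons []

-- fromList [x1, x2, x3, x4, x5, x6] yields a forest of two trees:
-- one of size 2 holding the first two elements, and one of size 4.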

Remove the element from the head of the sequence


It’s not complex to realize the inverse operation of ‘cons’, which removes an element from
the head of the sequence.
• If the first tree in the forest is a singleton leaf, remove this tree from the forest;
• otherwise, halve the first tree by unlinking its two children, so the first tree
in the forest becomes two trees; recursively halve the first tree until it turns into
a leaf.
Figure 12.4 illustrates the steps of removing elements from the head of the sequence.
If we assume the sequence isn’t empty, so that we can skip the error handling such
as trying to remove an element from an empty sequence, the operation can be expressed with
the following definition. We denote the forest F = {t1, t2, ...} and the trees without the first
one as F′ = {t2, t3, ...}.

    extractTree(F) = (t1, F′)                       : t1 is a leaf
                     extractTree({tl, tr} ∪ F′)     : otherwise    (12.5)

Figure 12.4: Steps of removing elements from the head. (a) A sequence of 5 elements; (b) result of removing x5: the leaf is removed; (c) result of removing x4: as there is no leaf tree, the first tree is divided into two sub-trees of size 2, the first of which is divided again into two leaves; after that, the first leaf, which contains x4, is removed. What is left in the forest is a leaf tree of x3, and a tree of size 2 with elements x2, x1.

Where {tl, tr} = unlink(t1) are the two children of t1.
It can be translated to the Haskell program below.
extractTree (t@(Leaf x):ts) = (t, ts)
extractTree (t@(Node _ t1 t2):ts) = extractTree (t1:t2:ts)

With this function defined, it’s convenient to give the head and tail functions: the former
returns the first element in the sequence, the latter returns the rest.

    head(S) = key(first(extractTree(S)))    (12.6)

    tail(S) = second(extractTree(S))    (12.7)

Where function first returns the first element in a pair (also known as a tuple), and
second returns the second element respectively. Function key is used to access the element
inside a leaf. Below are the Haskell programs corresponding to these two functions.
head' ts = x where (Leaf x, _) = extractTree ts
tail' = snd ◦ extractTree

Note that as head and tail functions have already been defined in the Haskell standard
library, we give ours apostrophes to make them distinct. (Another option is to hide the
standard ones when importing; we skip the details as they are language specific.)
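For testing it is handy to convert a binary random access list back to a plain list; the
following small sketch (our own addition) does this by repeatedly extracting from the head,
assuming the fromList sketch given earlier.

toList :: BRAList a -> [a]
toList [] = []
toList ts = head' ts : toList (tail' ts)

-- toList (fromList [1..6]) recovers [1..6].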

Random access the element in binary random access list


As the trees in the forest help manage the elements in blocks, given an arbitrary index,
it’s easy to locate which tree stores the element; after that, performing a search in the
tree yields the result. As all trees are binary (more accurately, complete binary trees), the
search is essentially a binary search, which is bound to the logarithm of the tree size. This
brings us faster random access than linear search in the linked-list setting.

Given an index i and a sequence S, which is actually a forest of trees, the algorithm
is executed as follows¹.

1. Compare i with the size of the first tree T1 in the forest, if i is less than or equal to
the size, the element exists in T1 , perform looking up in T1 ;

2. Otherwise, decrease i by the size of T1 , and repeat the previous step in the rest of
the trees in the forest.

This algorithm can be represented as the equation below.

    get(S, i) = lookupTree(T1, i)      : i ≤ |T1|
                get(S′, i − |T1|)      : otherwise    (12.8)

Where |T | = size(T ), and S ′ = {T2 , T3 , ...} is the rest of trees without the first one in
the forest. Note that we don’t handle out of bound error case, this is left as an exercise
to the reader.
Function lookupTree is a binary search algorithm. If the index i is 1, we just return
the root of the tree; otherwise, we halve the tree by unlinking: if i is less than or equal to
the size of the halved tree, we recursively look up the left sub-tree, otherwise we look up the
right sub-tree.

    lookupTree(T, i) = root(T)                            : i = 1
                       lookupTree(left(T), i)             : i ≤ ⌊|T|/2⌋    (12.9)
                       lookupTree(right(T), i − ⌊|T|/2⌋)  : otherwise

Where function left returns the left sub-tree Tl of T, while right returns Tr.
The corresponding Haskell program is given as below.
getAt (t:ts) i = if i < size t then lookupTree t i
else getAt ts (i - size t)

lookupTree (Leaf x) 0 = x
lookupTree (Node sz t1 t2) i = if i < sz `div` 2 then lookupTree t1 i
else lookupTree t2 (i - sz `div` 2)

Figure 12.5 illustrates the steps of looking up the 4-th element in a sequence of size
6. It first examines the first tree; since its size is 2, which is smaller than 4, it goes on
looking up in the rest of the forest with the updated index i′ = 4 − 2, i.e. the 2nd element
of the remaining trees. As the size of the next tree is 4, which is greater than 2, the
element to be searched must be located in this tree. It then examines the left sub-tree,
since the new index 2 is not greater than half the size, 4/2 = 2; the process next visits the
right grand-child, and the final result is returned.
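As a quick check (our own sketch, using the 0-based getAt program above and a fold-based
construction of the forest):

-- Index 3 (0-based) of the sequence "abcdef" is 'd'.
example :: Char
example = getAt (foldr cons [] "abcdef") 3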
Using a similar idea, we can update the element at any arbitrary position i. We first
compare the size of the first tree T1 in the forest with i. If i is beyond the first tree, the
element to be updated doesn’t exist in it; we recursively examine the next
tree in the forest, comparing against i − |T1|, where |T1| represents the size of the first
tree. Otherwise the element is in the first tree, and we halve the tree recursively until we
reach a leaf; at that stage, we can replace the element of this leaf with the new one.

    set(S, i, x) = {setTree(T1, i, x)} ∪ S′      : i < |T1|
                   {T1} ∪ set(S′, i − |T1|, x)   : otherwise    (12.10)
¹ We follow the tradition that the index i starts from 1 in algorithm descriptions, while it starts from 0 in most programming languages.



Figure 12.5: Steps of locating the 4-th element in a sequence. (a) getAt(S, 4): 4 > size(t1) = 2; (b) getAt(S′, 4 − 2) ⇒ lookupTree(t2, 2); (c) 2 ≤ ⌊size(t2)/2⌋ ⇒ lookupTree(left(t2), 2); (d) lookupTree(right(left(t2)), 1): x3 is returned.



Where S′ = {T2, T3, ...} is the rest of the trees in the forest without the first one.
Function setTree(T, i, x) performs a tree search and replaces the i-th element with the
given value x.

    setTree(T, i, x) = leaf(x)                                     : i = 0 ∧ |T| = 1
                       tree(|T|, setTree(Tl, i, x), Tr)            : i < ⌊|T|/2⌋
                       tree(|T|, Tl, setTree(Tr, i − ⌊|T|/2⌋, x))  : otherwise
    (12.11)
Where Tl and Tr are the left and right sub-trees of T respectively. The following Haskell
program (where the function is named updateTree) translates the equation accordingly.
setAt :: BRAList a → Int → a → BRAList a
setAt (t:ts) i x = if i < size t then (updateTree t i x):ts
else t:setAt ts (i-size t) x

updateTree :: Tree a → Int → a → Tree a


updateTree (Leaf _) 0 x = Leaf x
updateTree (Node sz t1 t2) i x =
if i < sz `div` 2 then Node sz (updateTree t1 i x) t2
else Node sz t1 (updateTree t2 (i - sz `div` 2) x)

Due to the nature of complete binary trees, for a sequence with n elements represented
by a binary random access list, the number of trees in the forest is bound to O(lg n).
Thus it takes O(lg n) time to locate the tree for an arbitrary index i in the worst case.
The tree search that follows is bound to the height of the tree,
which is O(lg n) in the worst case as well. So the total performance of random access is
O(lg n).

Exercise 12.1

1. The random access algorithm given in this section doesn’t handle errors such as an
out-of-bound index at all. Modify the algorithm to handle this case, and implement
it in your favorite programming language.

2. It’s quite possible to realize the binary random access list in imperative settings,
which benefits from fast operations on the head of the sequence. The random
access can be realized in two steps: first locate the tree, then use the constant-time
random access capability of the array. Write a program to implement it in your favorite
imperative programming language.

12.3 Numeric representation for binary random access list

In the previous section, we mentioned that for any sequence with n elements, we can represent
n in binary format so that n = e0 · 2^0 + e1 · 2^1 + ... + em · 2^m, where ei is the i-th bit,
which can be either 0 or 1. If ei ≠ 0, there is a complete binary tree of size 2^i.
This fact indicates that there is an explicit relationship between the binary form
of n and the forest. Inserting a new element at the head can be simulated by increasing
the binary number by one, while removing an element from the head mimics decreasing
the corresponding binary number by one. This is known as numeric representation
[3].

In order to represent the binary random access list with a binary number, we can define
two states for a bit: Zero means there is no tree of the size corresponding to that bit,
while One means such a tree exists in the forest, and we attach the tree to the One state.
The following Haskell program for instance defines such states.
data Digit a = Zero
| One (Tree a)

type RAList a = [Digit a]

Here we reuse the definition of the complete binary tree and attach it to the One state.
Note that we cache the size information in the tree as well.
With digits defined, the forest can be treated as a list of digits. Let’s see how inserting
a new element can be realized as binary number increment. Suppose function one(t)
creates a One state and attaches tree t to it, and function getTree(s) gets the tree
attached to the One state s. The sequence S is a list of digit states, S = {s1, s2, ...},
and S′ is the rest of the digits with the first one removed.

    insertTree(S, t) = {one(t)}                                       : S = ϕ
                       {one(t)} ∪ S′                                  : s1 = Zero
                       {Zero} ∪ insertTree(S′, link(t, getTree(s1)))  : otherwise
    (12.12)
When we insert a new tree t into a forest S of binary digits: if the forest is empty, we
just create a One state, attach the tree to it, and make this state the only digit of the
binary number. This is just like 0 + 1 = 1.
Otherwise, we need to examine the first digit of the binary number. If the first digit is
Zero, we just create a One state, attach the tree, and replace the Zero state with the newly
created One state. This is just like (...digits...0)₂ + 1 = (...digits...1)₂. For example
6 + 1 = (110)₂ + 1 = (111)₂ = 7.
The last case is that the first digit is One. Here we make the assumption that the tree t
to be inserted has the same size as the tree attached to this One state at this stage.
This can be ensured by calling this function from inserting a leaf, so that the size of the
tree to be inserted grows in the series 1, 2, 4, ..., 2^i, .... In such a case, we need to link
these two trees (one is t, the other is the tree attached to the One state), and recursively
insert the linked result into the rest of the digits. Note that the previous One state has to
be replaced with a Zero state. This is just like (...digits...1)₂ + 1 = (...digits′...0)₂, where
(...digits′...)₂ = (...digits...)₂ + 1. For example 7 + 1 = (111)₂ + 1 = (1000)₂ = 8.
Translating this algorithm to Haskell yields the following program.
insertTree :: RAList a → Tree a → RAList a
insertTree [] t = [One t]
insertTree (Zero:ts) t = One t : ts
insertTree (One t' :ts) t = Zero : insertTree ts (link t t')

All the other functions, including link(), cons() etc., are the same as before.
Next let’s see how removing an element from a sequence can be represented as binary
number decrement. If the sequence is a singleton One state attached with a leaf, it becomes
empty after removal. This is just like 1 − 1 = 0.
Otherwise, we examine the first digit. If it is a One state, it is replaced with a Zero
state to indicate that this tree no longer exists in the forest, as it is being removed. This
is just like (...digits...1)₂ − 1 = (...digits...0)₂. For example 7 − 1 = (111)₂ − 1 = (110)₂ = 6.
If the first digit in the sequence is a Zero state, we have to borrow from the further
digits for the removal. We recursively extract a tree from the rest of the digits, and halve
the extracted tree into its two children. Then the Zero state is replaced with a One state
attached with the right child, while the left child is passed back as the extracted tree. This
is something like (...digits...0)₂ − 1 = (...digits′...1)₂, where (...digits′...)₂ = (...digits)₂ − 1.
For example 4 − 1 = (100)₂ − 1 = (11)₂ = 3. The following equation illustrates this algorithm.

    extractTree(S) = (t, ϕ)                 : S = {one(t)}
                     (t, {Zero} ∪ S′)       : s1 = one(t)    (12.13)
                     (tl, {one(tr)} ∪ S′′)  : otherwise

Where (t′, S′′) = extractTree(S′), and tl and tr are the left and right sub-trees of t′. All
the other functions, including head and tail, are the same as before.
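The original text doesn’t list the Haskell program for this version of removal; below is a
hedged sketch of ours translating equation (12.13) to the digit representation defined above
(error handling for the empty list is omitted, as before).

extractTree :: RAList a -> (Tree a, RAList a)
extractTree [One t]       = (t, [])
extractTree (One t : ts)  = (t, Zero : ts)
extractTree (Zero : ts)   = (t1, One t2 : ts')
  where (Node _ t1 t2, ts') = extractTree ts  -- borrow from the rest, then split

With this, the head′ and tail′ functions given earlier work unchanged.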
The numeric representation doesn’t change the performance of the binary random access list;
readers can refer to [64] for detailed discussion. As an example, let’s analyze the average
(amortized) performance of the insertion-on-head algorithm by using aggregation analysis.
Consider the process of inserting n = 2^m elements into an empty binary random
access list. The numeric representation of the forest evolves as follows.

    i          forest (MSB ... LSB)
    0          0, 0, ..., 0, 0
    1          0, 0, ..., 0, 1
    2          0, 0, ..., 1, 0
    3          0, 0, ..., 1, 1
    ...        ...
    2^m − 1    1, 1, ..., 1, 1
    2^m        1, 0, 0, ..., 0, 0

    bits changed (MSB ... LSB): 1, 1, 2, ..., 2^(m-1), 2^m
The LSB of the forest changes every time a new element is inserted, costing 2^m units of
computation; the next bit changes every two insertions due to a linking operation, so it
costs 2^(m-1) units; the bit next to the MSB changes only once, which links all the previous
trees into one big tree. This happens at the half-way point of the insertion process, and
after the last element is inserted, the MSB flips to 1.
Summing these costs up yields the total cost T = 1 + 1 + 2 + 4 + ... + 2^(m-1) + 2^m = 2^(m+1).
So the average cost for one insertion is

    O(T/n) = O(2^(m+1) / 2^m) = O(1)    (12.14)

This proves that the insertion algorithm performs in amortized O(1) constant time.
The proof for deletion is left as an exercise to the reader.

12.3.1 Imperative binary random access list


It’s trivial to implement the imperative binary random access list using binary trees,
and the recursion can be eliminated by updating the focused tree in a loop. This is
left as an exercise to the reader. In this section, we’ll show a different imperative
implementation that uses the properties of the numeric representation.
Recall the chapter about binary heaps: a binary heap can be represented by an implicit
array. We can use a similar approach here: use an array of 1 element to represent a leaf,
an array of 2 elements to represent a binary tree of height 1, and an array of 2^m elements
to represent a complete binary tree of height m.
This brings us the capability of accessing any element by index directly instead of
performing a divide-and-conquer tree search. However, the tree linking operation has to be
implemented as array copying as the expense.
The following ANSI C code defines such a forest.

#define M sizeof(int) * 8
typedef int Key;

struct List {
    int n;
    Key* tree[M];
};

Where n is the number of elements stored in this forest. Of course we can avoid
limiting the maximum number of trees by using dynamic arrays, for example as in the following
ISO C++ code.
template<typename Key>
struct List {
    int n;
    vector<vector<Key> > tree;
};

For illustration purposes only, we use ANSI C here².


Let’s review the insertion process. If the first tree is empty (a Zero digit), we simply set
the first tree as a leaf of the new element to be inserted; otherwise, the insertion causes
tree linking, and such linking may repeat until it reaches a position (digit) where the
corresponding tree is empty. The numeric representation reveals an important fact: if the
first, second, ..., (i − 1)-th trees all exist, and the i-th tree is empty, the result is a
new tree of size 2^i, and all the existing elements together with the new element are stored
in this newly created tree. What’s more, all trees after position i are kept the same as before.
Is there any good method to locate this position i? As we use a binary number to represent
the forest of n elements, after a new element is inserted, n increases to n + 1. Comparing the
binary forms of n and n + 1, we find that all bits before i change from 1 to 0, the i-th bit
flips from 0 to 1, and all the bits after i stay unchanged. So we can use bit-wise exclusive
or (⊕) to detect this bit. Here is the algorithm.
function Number-Of-Bits(n)
    i ← 0
    while ⌊n/2⌋ ≠ 0 do
        n ← ⌊n/2⌋
        i ← i + 1
    return i

i ← Number-Of-Bits(n ⊕ (n + 1))
The Number-Of-Bits process can easily be implemented with bit shifting, for example
the below ANSI C code.
int nbits(int n) {
    int i = 0;
    while (n >>= 1)
        ++i;
    return i;
}

So the imperative insertion algorithm can be realized by first locating the bit which
flips from 0 to 1, then creating a new array of size 2^i to represent a complete binary tree,
and moving the contents of all trees before this bit to this array, as well as the new element
to be inserted.

function Insert(L, x)
    i ← Number-Of-Bits(n ⊕ (n + 1))
    Tree(L)[i + 1] ← Create-Array(2^i)
    l ← 1
    Tree(L)[i + 1][l] ← x
    for j ∈ [1, i] do
        for k ∈ [1, 2^(j-1)] do
            l ← l + 1
            Tree(L)[i + 1][l] ← Tree(L)[j][k]
        Tree(L)[j] ← NIL
    Size(L) ← Size(L) + 1
    return L

² The complete ISO C++ example program is available with this book.
The corresponding ANSI C program is given as follows.

struct List insert(struct List a, Key x) {
    int i, j, sz;
    Key* xs;
    i = nbits((a.n + 1) ^ a.n);
    xs = a.tree[i] = (Key*)malloc(sizeof(Key) * (1 << i));
    for (j = 0, *xs++ = x, sz = 1; j < i; ++j, sz <<= 1) {
        memcpy((void*)xs, (void*)a.tree[j], sizeof(Key) * sz);
        xs += sz;
        free(a.tree[j]);
        a.tree[j] = NULL;
    }
    ++a.n;
    return a;
}

However, the performance in theory isn’t as good as before. This is because the linking
operation downgrades from O(1) constant time to linear-time array copying.
We can again calculate the average (amortized) performance by using aggregation
analysis. When inserting n = 2^m elements into an empty list represented by implicit
binary trees in arrays, the numeric representation of the forest of arrays is the same as
before; only the cost of flipping a bit differs.
    i          forest (MSB ... LSB)
    0          0, 0, ..., 0, 0
    1          0, 0, ..., 0, 1
    2          0, 0, ..., 1, 0
    3          0, 0, ..., 1, 1
    ...        ...
    2^m − 1    1, 1, ..., 1, 1
    2^m        1, 0, 0, ..., 0, 0

    bit change cost (MSB ... LSB): 1 × 2^m, 1 × 2^(m-1), 2 × 2^(m-2), ..., 2^(m-2) × 2, 2^(m-1) × 1
The LSB of the forest changes every time a new element is inserted; however, it creates a
leaf tree and performs copying only when it changes from 0 to 1, so its cost is half of n
units, which is 2^(m-1). The next bit flips half as often as the LSB; each time that bit flips
to 1, it copies the first tree as well as the new element into the second tree, so the cost
of flipping that bit to 1 is 2 units, not 1. For the MSB, it only flips to 1 at the very last
insertion, but the cost of flipping this bit is copying all the previous trees to fill the
array of size 2^m.
Summing all the costs and distributing them over the n insertions yields the amortized
performance below.

    O(T/n) = O((1 × 2^m + 1 × 2^(m-1) + 2 × 2^(m-2) + ... + 2^(m-1) × 1) / 2^m)
           = O(1 + m/2)    (12.15)
           = O(m)

As m = O(lg n), the amortized performance downgrades from constant time to logarithmic
time, although it is still faster than normal array insertion, which is O(n) on average.
Random access gets a bit faster because we can use array indexing instead of a tree
search.
function Get(L, i)
    for each t ∈ Trees(L) do
        if t ≠ NIL then
            if i ≤ Size(t) then
                return t[i]
            else
                i ← i − Size(t)

Here we skip the error handling such as out-of-bound indexing. The ANSI C program
for this algorithm is like the following.
Key get(struct List a, int i) {
    int j, sz;
    for (j = 0, sz = 1; j < M; ++j, sz <<= 1)
        if (a.tree[j]) {
            if (i < sz)
                break;
            i -= sz;
        }
    return a.tree[j][i];
}

The imperative removal and random mutating algorithms are left as exercises to the
reader.

Exercise 12.2

1. Implement the random access algorithms, including looking up and updating,
for the binary random access list with numeric representation in your favorite program-
ming language.
2. Prove that the amortized performance of deletion is O(1) constant time by using
aggregation analysis.
3. Design and implement the binary random access list with implicit arrays in your fa-
vorite imperative programming language.

12.4 Imperative paired-array list


12.4.1 Definition
In the previous chapter about queues, a symmetric solution named the paired-array queue
was presented. It is capable of operating on both ends of the queue. Because arrays support
fast random access by nature, the same idea can also be used to realize a fast random-access
sequence in imperative settings.

    x[n] ... x[2] x[1] y[1] y[2] ... y[m]

Figure 12.6: A paired-array list, which consists of 2 arrays linked head-to-head.
Figure 12.6 shows the design of the paired-array list. Two arrays are linked head-to-head.
To insert a new element at the head of the sequence, the element is appended at
the end of the front array; to append a new element at the tail of the sequence, the element
is appended at the end of the rear array.
Here is an ISO C++ code snippet defining this data structure.
template<typename Key>
struct List {
int n, m;
vector<Key> front;
vector<Key> rear;

List() : n(0), m(0) {}


int size() { return n + m; }
};

Here we use the vector provided in the standard library to handle the dynamic memory
management issues, so that we can concentrate on the algorithm design.

12.4.2 Insertion and appending


Suppose function Front(L) returns the front array, while Rear(L) returns the rear
array. For illustration purposes, we assume the arrays are dynamically allocated. Inserting
and appending can be realized as follows.
function Insert(L, x)
    F ← Front(L)
    Size(F) ← Size(F) + 1
    F[Size(F)] ← x

function Append(L, x)
    R ← Rear(L)
    Size(R) ← Size(R) + 1
    R[Size(R)] ← x
As all the above operations manipulate the tails of the front and rear arrays, they are all
constant O(1) time. The following are the corresponding ISO C++ programs.
template<typename Key>
void insert(List<Key>& xs, Key x) {
    ++xs.n;
    xs.front.push_back(x);
}

template<typename Key>
void append(List<Key>& xs, Key x) {
    ++xs.m;
    xs.rear.push_back(x);
}

12.4.3 Random access


As the inner data structure is array (dynamic array as vector), which supports random
access by nature, it’s trivial to implement constant time indexing algorithm.
function Get(L, i)
    F ← Front(L)
    n ← Size(F)
    if i ≤ n then
        return F[n − i + 1]
    else
        return Rear(L)[i − n]
Here the index i ∈ [1, |L|]. If it is not greater than the size of the front array, the element
is stored in the front array. However, as the front and rear arrays are connected head-to-head,
the elements in the front array are in reverse order, so we locate the element by subtracting
i from the size of the front array. If the index i is greater than the size of the front array,
the element is stored in the rear array. Since elements are stored in normal order in the rear
array, we just subtract an offset, the size of the front array, from the index i.
Here is the ISO C++ program implementing this algorithm.
template<typename Key>
Key get(List<Key>& xs, int i) {
    if (i < xs.n)
        return xs.front[xs.n - i - 1];
    else
        return xs.rear[i - xs.n];
}

The random mutating algorithm is left as an exercise to the reader.

12.4.4 Removing and balancing


Removing isn’t as simple as insertion and appending. This is because we must handle the
condition that one array (either front or rear) becomes empty due to removal while the
other still contains elements. In the extreme case, the list becomes quite unbalanced, so
we must fix it to restore the balance.
One idea is to trigger this fixing when either the front or the rear array becomes empty:
we just cut the other array in half, and reverse the first half to form the new pair. The
algorithm is described as follows.
function Balance(L)
    F ← Front(L), R ← Rear(L)
    n ← Size(F), m ← Size(R)
    if F = ϕ then
        F ← Reverse(R[1 ... ⌊m/2⌋])
        R ← R[⌊m/2⌋ + 1 ... m]
    else if R = ϕ then
        R ← Reverse(F[1 ... ⌊n/2⌋])
        F ← F[⌊n/2⌋ + 1 ... n]
Actually, the operations are symmetric for the case that the front is empty and the case
that the rear is empty. Another approach is to swap the front and rear for one of the symmetric
cases, recursively restore the balance, and then swap the front and rear back. For example,
the ISO C++ program below uses this method.
template<typename Key>
void balance(List<Key>& xs) {
    if (xs.n == 0) {
        back_insert_iterator<vector<Key> > i(xs.front);
        reverse_copy(xs.rear.begin(), xs.rear.begin() + xs.m / 2, i);
        xs.rear.erase(xs.rear.begin(), xs.rear.begin() + xs.m / 2);
        xs.n = xs.m / 2;
        xs.m -= xs.n;
    } else if (xs.m == 0) {
        swap(xs.front, xs.rear);
        swap(xs.n, xs.m);
        balance(xs);
        swap(xs.front, xs.rear);
        swap(xs.n, xs.m);
    }
}

With the Balance algorithm defined, it’s trivial to implement the removal algorithms for both
the head and the tail.
function Remove-Head(L)
    Balance(L)
    F ← Front(L)
    if F = ϕ then
        Remove-Tail(L)
    else
        Size(F) ← Size(F) − 1

function Remove-Tail(L)
    Balance(L)
    R ← Rear(L)
    if R = ϕ then
        Remove-Head(L)
    else
        Size(R) ← Size(R) − 1
There is an edge case for each: even after balancing, the array targeted for removal may
still be empty. This happens only when there is exactly one element stored in the paired-array
list. The solution is to remove this single remaining element from the other array, after
which the overall list becomes empty. Below is the ISO C++ program implementing this algorithm.
template<typename Key>
void remove_head(List<Key>& xs) {
    balance(xs);
    if (xs.front.empty())
        remove_tail(xs); // remove the singleton element in rear
    else {
        xs.front.pop_back();
        --xs.n;
    }
}

template<typename Key>
void remove_tail(List<Key>& xs) {
    balance(xs);
    if (xs.rear.empty())
        remove_head(xs); // remove the singleton element in front
    else {
        xs.rear.pop_back();
        --xs.m;
    }
}

It’s obvious that the worst case performance is O(n), where n is the number of elements
stored in the paired-array list. This happens when balancing is triggered, and both reversing
and shifting are linear operations. However, the amortized performance of removal is still
O(1); the proof is left as an exercise to the reader.

Exercise 12.3

1. Implement the random mutating algorithm in your favorite imperative programming
language.

2. We utilized the vector provided in the standard library to manage memory dynamically;
try to realize a version using a plain array, managing the memory allocation manually.
Compare this version with the vector-based one and consider how this affects the performance.

3. Prove that the amortized performance of removal is O(1) for paired-array list.

12.5 Concatenate-able list


By using the binary random access list, we realized a sequence data structure which supports
O(lg n) time insertion and removal on the head, as well as random access of an element by a
given index.
However, it’s not so easy to concatenate two lists. As both lists are forests of complete
binary trees, we can’t merely merge them (since forests are essentially lists of trees, and
for any size there is at most one tree of that size, even concatenating the forests directly
is not fast). One solution is to push the elements of the first sequence one by one onto a
stack, then pop them and insert them at the head of the second sequence with the ‘cons’
function. Of course the stack can be used implicitly via recursion, for instance:
    concat(s1, s2) = s2                                     : s1 = ϕ
                     cons(head(s1), concat(tail(s1), s2))   : otherwise    (12.16)

Where functions cons, head and tail are defined in the previous section.
If the lengths of the two sequences are n and m, this method takes O(n lg n) time to
repeatedly push all elements of the first sequence onto the stack, and then takes
O(n lg(n + m)) time to insert them in front of the second sequence. (See [4] for the
precise definitions of the asymptotic notations.)
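As a concrete (if slow) reference point, equation (12.16) can be transcribed directly for
the binary random access list; this sketch is our own addition and reuses cons, head' and
tail' from the previous section.

concatNaive :: BRAList a -> BRAList a -> BRAList a
concatNaive s1 s2
    | null s1   = s2
    | otherwise = cons (head' s1) (concatNaive (tail' s1) s2)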
We have already implemented the real-time queue in a previous chapter. It supports
O(1) time pop and push. If we can turn sequence concatenation into a kind of queue pushing
operation, the performance will be improved to O(1) as well. Okasaki gave such a
realization in [3], which can concatenate lists in constant time.
To represent a concatenate-able list, the data structure designed by Okasaki is essentially
a K-ary tree. The root of the tree stores the first element in the list, so that we can
access it in constant O(1) time. The sub-trees, or children, are all smaller concatenate-able
lists, which are managed by real-time queues. Concatenating another list to the end is
just adding it as the last child, which is in turn a queue push operation. Appending a
new element can be realized by first wrapping the element into a singleton tree, which
is a leaf with no children, and then concatenating this singleton to finish the append.
Figure 12.7 illustrates this data structure.
Such recursively designed data structure can be defined in the following Haskell code.
data CList a = Empty | CList a (Queue (CList a))

Figure 12.7: Data structure for the concatenate-able list. (a) The data structure for list {x1, x2, ..., xn}: the root stores x1, and the children c[1], c[2], ..., c[n] are smaller concatenate-able lists holding x[2]...x[n]. (b) The result after concatenating with list {y1, y2, ..., ym}: the second list becomes the last child c[n+1], holding y[1]...y[m].

This means that a concatenate-able list is either empty or a K-ary tree, which consists of a
root element and a queue of concatenate-able sub-lists. Here we reuse the realization of the
real-time queue mentioned in the previous chapter.
Suppose function clist(x, Q) constructs a concatenate-able list from an element x and
a queue of sub-lists Q, while function root(s) returns the root element of such a
K-ary-tree-implemented list, and function queue(s) returns the queue of sub-lists respectively.
We can implement the algorithm to concatenate two lists like this.

    concat(s1, s2) = s1                      : s2 = ϕ
                     s2                      : s1 = ϕ    (12.17)
                     clist(x, push(Q, s2))   : otherwise

Where x = root(s1) and Q = queue(s1). The idea of concatenation is that if either
one of the lists to be concatenated is empty, the result is just the other list; otherwise, we
push the second list as the last child onto the queue of the first list.
Since the push operation is O(1) constant time for a well-realized real-time queue, the
performance of concatenation is bound to O(1).
The concat function can be translated to the below Haskell program.
concat x Empty = x
concat Empty y = y
concat (CList x q) y = CList x (push q y)

Besides the good performance of concatenation, this design also brings satisfying features
for adding an element both at the head and at the tail.

    cons(x, s) = concat(clist(x, ϕ), s)    (12.18)

    append(s, x) = concat(s, clist(x, ϕ))    (12.19)

Getting the first element is just returning the root of the K-ary tree.

    head(s) = root(s)    (12.20)
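The original text gives only the equations for cons and append; a minimal Haskell sketch of
ours is below, assuming emptyQ denotes the empty real-time queue from the previous chapter
(the name emptyQ is an assumption, not from the text).

cons :: a -> CList a -> CList a
cons x xs = concat (CList x emptyQ) xs      -- wrap x as a singleton, then concatenate

append :: CList a -> a -> CList a
append xs x = concat xs (CList x emptyQ)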

It’s a bit more complex to realize the algorithm that removes the first element from a
concatenate-able list. This is because after the root, which is the first element in the
sequence, is removed, we have to re-construct what is left, a queue of sub-lists, into a
K-ary tree.
After the root is removed, all the children of the K-ary tree are left, and note that all of
them are also concatenate-able lists. So one natural solution is to concatenate them all
together into one big list.

    linkAll(Q) = ϕ                                     : Q = ϕ
                 concat(front(Q), linkAll(pop(Q)))     : otherwise    (12.21)

Where function front just returns the first element of a queue without removing it,
while pop does the removal.
If the queue is empty, it means that there are no children at all, so the result is an
empty list; otherwise, we take the first child, which is a concatenate-able list, recursively
concatenate all the remaining children into one list, and finally concatenate that list behind
the first child.
With linkAll defined, we can then implement the algorithm of removing the first
element from a list as below.

    tail(s) = linkAll(queue(s))    (12.22)

The corresponding Haskell program is given as follows.

head (CList x _) = x
tail (CList _ q) = linkAll q

linkAll q | isEmptyQ q = Empty
          | otherwise = concat (front q) (linkAll (pop q))

Function isEmptyQ is used to test whether a queue is empty; it is trivial and we omit its
definition. Readers can refer to the source code along with this book.
The linkAll algorithm actually traverses the queue data structure and reduces it to a final
result. This reminds us of the folding mentioned in the chapter of binary search tree;
readers can refer to the appendix of this book for a detailed description of folding. It’s
quite possible to define a folding algorithm for the queue instead of the list3 [10].
    foldQ(f, e, Q) = e                                   : Q = ϕ
                     f(front(Q), foldQ(f, e, pop(Q)))    : otherwise    (12.23)

Function foldQ takes three parameters: a function f, which is used for reducing, an
initial value e, and the queue Q to be traversed.
Here are some examples to illustrate folding on a queue. Suppose a queue Q contains
elements {1, 2, 3, 4, 5} from head to tail.

    foldQ(+, 0, Q) = 1 + (2 + (3 + (4 + (5 + 0)))) = 15
    foldQ(×, 1, Q) = 1 × (2 × (3 × (4 × (5 × 1)))) = 120
    foldQ(×, 0, Q) = 1 × (2 × (3 × (4 × (5 × 0)))) = 0

Function linkAll can be rewritten with foldQ accordingly.

    linkAll(Q) = foldQ(concat, ϕ, Q)    (12.24)

The Haskell program can be modified as well.

3 Some functional programming languages, such as Haskell, define type classes (for instance the concept of monoid), which make it easy to support folding over a customized data structure.



linkAll = foldQ concat Empty

foldQ :: (a → b → b) → b → Queue a → b
foldQ f z q | isEmptyQ q = z
| otherwise = (front q) `f` foldQ f z (pop q)

However, the performance of removal can’t be ensured in all cases. The worst case
is that a user keeps appending n elements to an empty list, and then immediately performs
removal. At this point, the K-ary tree has the first element stored in the root, and there
are n − 1 children, all of which are leaves. So the linkAll() algorithm downgrades to O(n),
which is linear time.
Considering that add, append, concatenate and removal operations are performed randomly,
the average case is amortized O(1). The proof is left as an exercise to the reader.
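To make the worst case above concrete, here is a small sketch (our own addition, using the
append and tail defined in this section): building a list purely by appending and then taking
its tail forces linkAll to fold all n − 1 leaf children.

worstCase :: Int -> CList Int
worstCase n = tail (foldl append Empty [1 .. n])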

Exercise 12.4

1. Can you figure out a solution to append an element to the end of a binary random
access list?

2. Prove that the amortized performance of the removal operation for the concatenate-able
list is O(1). Hint: use the banker’s method.

3. Implement the concatenate-able list in your favorite imperative language.

12.6 Finger tree


We haven’t yet been able to meet all the performance targets listed at the beginning of this
chapter.
The binary random access list enables fast insertion and removal at the head of the sequence,
and fast random access of elements. However, it performs poorly when concatenating lists,
and there is no good way to append an element at the end of a binary random access list.
The concatenate-able list is capable of concatenating multiple lists on the fly, and it
performs well for adding new elements both at the head and at the tail. However, it doesn’t
support random access of an element by a given index.
These two examples bring us some ideas:

• In order to support fast manipulation both at the head and the tail of the sequence, there
must be some way to easily access the head and tail positions;

• A tree-like data structure helps to turn random access into a divide and conquer
search; if the tree is well balanced, the search can be ensured to take logarithmic time.

12.6.1 Definition
The finger tree [66], first invented in 1977, helps to realize an efficient sequence,
and it is also well implemented in purely functional settings [65].
As we mentioned, the balance of the tree is critical to ensure the performance of
search. One option is to use a balanced tree as the underlying data structure for the finger
tree, for example the 2-3 tree, which is a special B-tree (readers can refer to the chapter
of B-tree of this book).
A 2-3 tree contains either 2 or 3 children. It can be defined as below in Haskell.

data Node a = Br2 a a | Br3 a a a

In imperative settings, a node can be defined with a list of sub-nodes, which contains
at most 3 children. For instance, the following ANSI C code defines the node.

union Node {
    Key* keys;
    union Node* children;
};

Note that in this definition, a node can either contain 2 ∼ 3 keys, or 2 ∼ 3 sub-nodes,
where Key is the type of the elements stored in a leaf node.
We mark the left-most non-leaf node as the front finger (or left finger) and the right-
most non-leaf node as the rear finger (or right finger). Since both fingers are essentially
2-3 trees with all leaves as children, they can be directly represented as lists of 2 or 3
leaves. Of course a finger tree can also be empty or contain only one element as a leaf.
So the definition of a finger tree is specified like this.

• A finger tree is either empty;

• or a singleton leaf;

• or contains three parts: a left finger, which is a list containing at most 3 elements; a
sub finger tree; and a right finger, which is also a list containing at most 3 elements.

Note that this definition is recursive, so it translates naturally to functional
settings. The following Haskell definition summarizes these cases.
data Tree a = Empty
| Lf a
| Tr [a] (Tree (Node a)) [a]

In imperative settings, we can define the finger tree in a similar manner. What’s more,
we can add a parent field, so that it’s possible to back-track to the root from any tree node.
The ANSI C code below defines the finger tree accordingly.

struct Tree {
    union Node* front;
    union Node* rear;
    struct Tree* mid;
    struct Tree* parent;
};

We can use a NIL pointer to represent an empty tree; a leaf tree contains only one
element in its front finger, and both its rear finger and middle part are empty.
Figures 12.8 and 12.9 show some examples of finger trees.
The first example is an empty finger tree; the second one shows the result of inserting
one element into the empty tree: it becomes a leaf of one node; the third example shows a
finger tree containing 2 elements, one in the front finger, the other in the rear.
If we continuously insert new elements into the tree, those elements will be put in the
front finger one by one, until it exceeds the limit of the 2-3 tree. The 4-th example shows
such a condition: there are 4 elements in the front finger, which isn’t balanced any more.
The last example shows that the finger tree gets fixed so that it resumes balancing.
There are two elements in the front finger. Note that the middle part is no longer empty:
it is a leaf of a 2-3 tree (why it’s a leaf is explained later). The content of the leaf
is a tree with 3 branches, each containing an element.
We can express these 5 examples as the following Haskell expressions.

Figure 12.8: Examples of finger tree, 1. (a) An empty tree; (b) a singleton leaf; (c) front finger and rear finger contain one element each, the middle part is empty.

Figure 12.9: Examples of finger tree, 2. (a) After inserting 3 extra elements into the front finger, it exceeds the 2-3 tree constraint and isn’t balanced any more; (b) the tree resumes balancing: there are 2 elements in the front finger, and the middle part is a leaf which contains a 3-branch 2-3 tree.



Empty
Lf a
Tr [b] Empty [a]
Tr [e, d, c, b] Empty [a]
Tr [f, e] (Lf (Br3 d c b)) [a]

In the last example, why is the middle part inner tree a leaf? As we mentioned, the
definition of the finger tree is recursive. The middle part, besides the front and rear
fingers, is a deeper finger tree, which is defined as Tree (Node a). Every time we go deeper,
the Node is nested one more level: if the element type of the first level tree is a, the
element type of the second level tree is Node a, the third level is Node (Node a), ...,
and the n-th level is Node (Node (Node (... a ...))) = Node^n a, where the superscript n
indicates that Node is applied n times.

12.6.2 Insert element to the head of sequence


The examples listed above actually reveal the typical process of inserting elements
one by one into a finger tree. It’s possible to summarize these examples into cases for the
insertion-on-head algorithm.
When we insert an element x into a finger tree T:
• If the tree is empty, the result is a leaf which contains the singleton element x;
• If the tree is a singleton leaf of element y, the result is a new finger tree: the front
finger contains the new element x, the rear finger contains the previous element y,
and the middle part is an empty finger tree;
• If the number of elements stored in the front finger isn’t bigger than the upper limit of
the 2-3 tree, which is 3, the new element is just inserted at the head of the front finger;
• Otherwise, the number of elements stored in the front finger exceeds the
upper limit of the 2-3 tree. The last 3 elements in the front finger are wrapped into a 2-3
tree and recursively inserted into the middle part; the new element x is inserted in front
of the remaining element of the front finger.
Suppose that function leaf(x) creates a leaf of element x, and function tree(F, T′, R)
creates a finger tree from three parts: F is the front finger, which is a list containing
several elements; similarly, R is the rear finger, which is also a list; T′ is the middle part,
which is a deeper finger tree. Function tr3(a, b, c) creates a 2-3 tree from 3 elements a, b, c,
while tr2(a, b) creates a 2-3 tree from 2 elements a and b.

    insertT(x, T) = leaf(x)                                          : T = ϕ
                    tree({x}, ϕ, {y})                                : T = leaf(y)
                    tree({x, x1}, insertT(tr3(x2, x3, x4), T′), R)   : T = tree({x1, x2, x3, x4}, T′, R)
                    tree({x} ∪ F, T′, R)                             : otherwise
    (12.25)
The performance of this algorithm is dominated by the recursive case. All the other
cases take constant O(1) time. The recursion depth is proportional to the height of the tree,
so the algorithm is bound to O(h) time, where h is the height. As we use a 2-3 tree to
ensure that the tree is well balanced, h = O(lg n), where n is the number of elements
stored in the finger tree.
Further analysis reveals that the amortized performance of insertT is O(1), because we
can amortize the expensive recursive case over the other trivial cases. Please refer to [3] and
[65] for the detailed proof.
Translating the algorithm yields the Haskell program below.

cons :: a → Tree a → Tree a


cons a Empty = Lf a
cons a (Lf b) = Tr [a] Empty [b]
cons a (Tr [b, c, d, e] m r) = Tr [a, b] (cons (Br3 c d e) m) r
cons a (Tr f m r) = Tr (a:f) m r

Here we use the LISP naming convention: the function that inserts a new element at the head
of a list is called cons.
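As a small check (our own addition, not from the original text): inserting a, b, ..., f one
by one at the head with the cons program above reproduces the last of the five example trees.

demo :: Tree Char
demo = foldr cons Empty "fedcba"
-- demo equals Tr "fe" (Lf (Br3 'd' 'c' 'b')) "a"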
The insertion algorithm can also be implemented in the imperative approach. Suppose
function Tree() creates an empty tree, in which all fields, including the front and rear
fingers, the middle part inner tree, and the parent, are empty. Function Node() creates an
empty node.

function Prepend-Node(n, T)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Front(T)) do
        F ← Front(T)                 ▷ F = {n1, n2, n3, ...}
        Front(T) ← {n, F[1]}         ▷ F[1] = n1
        n ← Node()
        Children(n) ← F[2..]         ▷ F[2..] = {n2, n3, ...}
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Front(T)| = 1 ∧ Rear(T) = ϕ then
        Rear(T) ← Front(T)
        Front(T) ← {n}
    else
        Front(T) ← {n} ∪ Front(T)
    Connect-Mid(p, T)
    return Flat(r)
Where the notation L[i..] means a sub-list of L with the first i − 1 elements removed;
that is, if L = {a1, a2, ..., an}, then L[i..] = {ai, ai+1, ..., an}.
Functions Front, Rear, Mid, and Parent are used to access the front finger, the
rear finger, the middle part inner tree and the parent tree respectively; function Children
accesses the children of a node.
Function Connect-Mid(T1, T2) connects T2 as the inner middle part tree of T1, and
sets the parent of T2 as T1 if T2 isn’t empty.
In this algorithm, we perform a one-pass top-down traversal along the middle part
inner trees as long as the front finger is full and can’t store any more elements. The
criterion for fullness of a 2-3 tree finger is that it already contains 3 elements. In such a
case, we take all the elements except the first one off the finger, wrap them into a new node
(a one-level-deeper node), and continue inserting this new node into the middle inner tree.
The first element is left in the front finger, and the element to be inserted is put in front
of it, so that it becomes the new first element of the front finger.
After this traversal, the algorithm reaches either an empty tree, or a tree which still has
room for more elements in its front finger. We create a new leaf for the former case,
and perform a trivial list insertion to the front finger for the latter.
During the traversal, we use p to record the parent of the current tree we are processing,
so that any newly created tree is connected as the middle part inner tree of p.

Finally, we return the root of the tree, r. The last trick of this algorithm is the Flat
function. In order to simplify the logic, we create an empty ‘ground’ tree and set it as
the parent of the root. We need to eliminate this extra ‘ground’ level before returning the
root. This flattening algorithm is realized as follows.

function Flat(T)
    while T ≠ NIL ∧ T is empty do
        T ← Mid(T)
    if T ≠ NIL then
        Parent(T) ← NIL
    return T

The while loop tests whether T is trivially empty: it is not NIL (= ϕ), but both its front
and rear fingers are empty.
The Python code below implements the insertion algorithm for the finger tree.

def insert(x, t):
    return prepend_node(wrap(x), t)

def prepend_node(n, t):
    root = prev = Tree()
    prev.set_mid(t)
    while frontFull(t):
        f = t.front
        t.front = [n] + f[:1]
        n = wraps(f[1:])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.front) == 1 and t.rear == []:
        t = Tree([n], None, t.front)
    else:
        t = Tree([n] + t.front, t.mid, t.rear)
    prev.set_mid(t)
    return flat(root)

def flat(t):
    while t is not None and t.empty():
        t = t.mid
    if t is not None:
        t.parent = None
    return t

The implementations of functions set_mid, frontFull, wrap, wraps, empty, and the tree
constructor are trivial enough that we skip their details here. Readers can take these as
exercises.

12.6.3 Remove element from the head of sequence

It’s easy to implement the reverse operation, which removes the first element from the list,
by reversing the insertT() algorithm line by line.
Let’s denote F = {f1, f2, ...} as the front finger list, M as the middle part inner finger
tree, and R = {r1, r2, ...} as the rear finger list of a finger tree; R′ = {r2, r3, ...} is the
rest of the elements of R with the first one removed.




    extractT(T) = (x, ϕ)                           : T = leaf(x)
                  (x, leaf(y))                     : T = tree({x}, ϕ, {y})
                  (x, tree({r1}, ϕ, R′))           : T = tree({x}, ϕ, R)    (12.26)
                  (x, tree(toList(F′), M′, R))     : T = tree({x}, M, R), (F′, M′) = extractT(M)
                  (f1, tree({f2, f3, ...}, M, R))  : otherwise
Where function toList(T) converts a 2-3 tree to a plain list as follows.

    toList(T) = {x, y}      : T = tr2(x, y)
                {x, y, z}   : T = tr3(x, y, z)    (12.27)

Here we skip the error handling, such as trying to remove an element from an empty tree.
If the finger tree is a leaf, the result after removal is an empty tree. If the finger tree
contains two elements, one in the front finger and the other in the rear, we return the
element stored in the front finger as the first element, and the resulting tree after removal
is a leaf. If there is only one element in the front finger, the middle part inner tree is
empty, and the rear finger isn’t empty, we return the only element in the front finger, and
borrow one element from the rear finger into the front. If there is only one element in the
front finger but the middle part inner tree isn’t empty, we recursively remove a node from
the inner tree, flatten that node into a plain list to replace the front finger, and remove
the original single element of the front finger. The last case says that if the front finger
contains more than one element, we can just remove the first element from the front finger
and keep all the other parts unchanged.
Figure 12.10 shows the steps of removing two elements from the head of a sequence.
There are 10 elements stored in the finger tree. When the first element is removed, there
is still one element left in the front finger. However, when the next element is removed,
the front finger becomes empty. So we ‘borrow’ one tree node from the middle part inner tree.
This node is a 2-3 tree; it is converted to a list of 3 elements, and this list is used as the
new front finger. The middle part inner tree changes from three parts to a singleton leaf,
which contains only one 2-3 tree node, with three elements stored in it.
Below is the corresponding Haskell program for ‘uncons’.
uncons :: Tree a → (a, Tree a)
uncons (Lf a) = (a, Empty)
uncons (Tr [a] Empty [b]) = (a, Lf b)
uncons (Tr [a] Empty (r:rs)) = (a, Tr [r] Empty rs)
uncons (Tr [a] m r) = (a, Tr (nodeToList f) m' r) where (f, m') = uncons m
uncons (Tr f m r) = (head f, Tr (tail f) m r)

And the function nodeToList is defined like this.


nodeToList :: Node a → [a]
nodeToList (Br2 a b) = [a, b]
nodeToList (Br3 a b c) = [a, b, c]

Similarly, we can define head and tail functions from uncons.
head = fst ◦ uncons
tail = snd ◦ uncons

12.6.4 Handling the ill-formed finger tree when removing


The strategy used so far to remove an element from a finger tree is a kind of 'remove and
borrow': if the front finger becomes empty after removal, we borrow more nodes from the
middle part inner tree.

Figure 12.10: Example of removing 2 elements from the head of a sequence. (a) A sequence
of 10 elements represented as a finger tree; (b) the first element is removed, and one
element is left in the front finger; (c) another element is removed from the head: one node
is borrowed from the middle part inner tree, changed from a 2-3 tree to a list, and used as
the new front finger; the middle part inner tree becomes a leaf of one 2-3 tree node.

However, there exist cases where the tree is ill-formed, for example, when both the front
finger of the tree and that of its middle part inner tree are empty. Such an ill-formed tree
can result from imperative splitting, which we'll introduce later.

Figure 12.11: Example of an ill-formed tree. The front fingers are empty until the i-th
level sub tree, whose front finger isn't empty.

Here we develop an imperative algorithm which can remove the first element from a finger
tree even if it is ill-formed. The idea is to first perform a top-down traverse to find a sub
tree which either has a non-empty front finger, or has both its front finger and middle part
inner tree empty. In the former case, we can safely extract the first element, which is a
node, from the front finger; in the latter case, since only the rear finger isn't empty, we can
swap it with the empty front finger and reduce it to the former case.
After that, we need to examine whether the node we extracted from the front finger is a
leaf node (How to do that? This is left as an exercise to the reader). If not, we go on
extracting the first sub node from the children of this node, and leave the rest of the
children as the new front finger of the parent of the current tree. We repeatedly go up
along the parent field till the node we extracted is a leaf. At that point, we arrive at the
root of the tree. Figure 12.12 illustrates this process.
Based on this idea, the following algorithm realizes the removal operation on head.
The algorithm assumes that the tree passed in isn’t empty.
function Extract-Head(T )
r ← Tree()
Connect-Mid(r, T )
while Front(T) = ϕ ∧ Mid(T) ≠ NIL do
T ← Mid(T)
if Front(T) = ϕ ∧ Rear(T) ≠ ϕ then
Exchange Front(T ) ↔ Rear(T )
n ← Node()
Children(n) ← Front(T )
repeat
L ← Children(n) ▷ L = {n1, n2, n3, ...}
n ← L[1] ▷ n ← n1
Front(T) ← L[2..] ▷ L[2..] = {n2, n3, ...}
T ← Parent(T)
if Mid(T) becomes empty then
Mid(T) ← NIL
until n is a leaf
return (Elem(n), Flat(r))

Figure 12.12: Traverse bottom-up till a leaf is extracted. (a) Extract the first element
n[i][1] and put its children to the front finger of the upper level tree; (b) repeat this
process i times, and finally x[1] is extracted.
Note that the function Elem(n) returns the only element stored inside the leaf node n.
Similar to the imperative insertion algorithm, a stub 'ground' tree is used as the parent of
the root, which simplifies the logic a bit. That's why we need to flatten the tree at the end.
The Python program below translates the algorithm.
def extract_head(t):
    root = Tree()
    root.set_mid(t)
    while t.front == [] and t.mid is not None:
        t = t.mid
    if t.front == [] and t.rear != []:
        (t.front, t.rear) = (t.rear, t.front)
    n = wraps(t.front)
    while True:  # a repeat-until loop
        ns = n.children
        n = ns[0]
        t.front = ns[1:]
        t = t.parent
        if t.mid.empty():
            t.mid.parent = None
            t.mid = None
        if n.leaf:
            break
    return (elem(n), flat(root))

The member function empty() returns true if both the front finger and the rear finger
are empty. We use a flag leaf to mark whether a node is a leaf or a compound node.
The exercise of this section asks the reader to consider some alternatives.
As ill-formed trees are allowed, the algorithms that access the first and last element of the
finger tree must be modified so that they don't blindly return the first or last child of a
finger, since a finger can be empty if the tree is ill-formed.
The idea is quite similar to Extract-Head: in case a finger is empty while the middle part
inner tree isn't, we traverse along the inner tree till a point where either the finger becomes
non-empty or all the nodes are stored in the other finger. For instance, the following
algorithm returns the first leaf node even if the tree is ill-formed.
function First-Lf(T )
while Front(T) = ϕ ∧ Mid(T) ≠ NIL do
T ← Mid(T)
if Front(T) = ϕ ∧ Rear(T) ≠ ϕ then
n ← Rear(T )[1]
else
n ← Front(T )[1]
while n is NOT leaf do
n ← Children(n)[1]
return n
Note the second loop in this algorithm: it keeps traversing to the first sub-node while the
current node isn't a leaf. So we always end up at a leaf node, and it's trivial to get the
element inside it.
function First(T )
return Elem(First-Lf(T ))
The following Python code translates the algorithm to a real program.

def first(t):
    return elem(first_leaf(t))

def first_leaf(t):
    while t.front == [] and t.mid is not None:
        t = t.mid
    if t.front == [] and t.rear != []:
        n = t.rear[0]
    else:
        n = t.front[0]
    while not n.leaf:
        n = n.children[0]
    return n

Accessing the last element is quite similar, and we leave it as an exercise to the reader.
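As a hint, a symmetric sketch (an assumption that mirrors first_leaf above, not the book's reference answer) could look like this:

def last(t):
    return elem(last_leaf(t))

def last_leaf(t):
    while t.rear == [] and t.mid is not None:
        t = t.mid
    if t.rear == [] and t.front != []:
        n = t.front[-1]
    else:
        n = t.rear[-1]
    while not n.leaf:
        n = n.children[-1]
    return n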

12.6.5 Append element to the tail of the sequence


Because the finger tree is symmetric, we can give the realization of appending an element
to the tail by referring to the insertT algorithm.


\[
appendT(T, x) = \begin{cases}
leaf(x) & : T = \phi \\
tree(\{y\}, \phi, \{x\}) & : T = leaf(y) \\
tree(F, appendT(M, tr3(x_1, x_2, x_3)), \{x_4, x\}) & : T = tree(F, M, \{x_1, x_2, x_3, x_4\}) \\
tree(F, M, R \cup \{x\}) & : otherwise
\end{cases} \tag{12.28}
\]
Generally speaking, if the rear finger is not yet full (it holds fewer than 4 elements), the
new element is directly appended to the rear finger. Otherwise, we break the rear finger:
the first 3 elements of the rear finger are used to create a new 2-3 tree, which is recursively
appended to the middle part inner tree, and the new rear finger holds the remaining
element together with the new one. If the finger tree is empty or a singleton leaf, it is
handled by the first two cases.
Translating the equation to Haskell yields the below program.

snoc :: Tree a → a → Tree a


snoc Empty a = Lf a
snoc (Lf a) b = Tr [a] Empty [b]
snoc (Tr f m [a, b, c, d]) e = Tr f (snoc m (Br3 a b c)) [d, e]
snoc (Tr f m r) a = Tr f m (r++[a])

The function name snoc is the mirror of cons, which indicates the symmetric relationship.
Appending a new element to the end imperatively is quite similar. The following
algorithm realizes appending.
function Append-Node(T, n)
r ← Tree()
p←r
Connect-Mid(p, T )
while Full?(Rear(T )) do
R ← Rear(T ) ▷ R = {n1, n2, ..., nm−1, nm}
Rear(T ) ← {Last(R), n} ▷ Last(R) is the last element nm
n ← Node()
Children(n) ← R[1...m − 1] ▷ {n1, n2, ..., nm−1 }
p←T
T ← Mid(T )
if T = NIL then
T ← Tree()
Front(T ) ← {n}
else if | Rear(T ) | = 1 ∧ Front(T ) = ϕ then
Front(T ) ← Rear(T )
Rear(T ) ← {n}
else
Rear(T ) ← Rear(T ) ∪{n}
Connect-Mid(p, T )
return Flat(r)
And the corresponding Python program is given as below.

def append_node(t, n):
    root = prev = Tree()
    prev.set_mid(t)
    while rearFull(t):
        r = t.rear
        t.rear = r[-1:] + [n]
        n = wraps(r[:-1])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.rear) == 1 and t.front == []:
        t = Tree(t.rear, None, [n])
    else:
        t = Tree(t.front, t.mid, t.rear + [n])
    prev.set_mid(t)
    return flat(root)

12.6.6 Remove element from the tail of the sequence


Similar to appendT, we can realize the algorithm which removes the last element from
the finger tree in a manner symmetric to extractT.
We denote the non-empty, non-leaf finger tree as tree(F, M, R), where F is the front
finger, M is the middle part inner tree, and R is the rear finger.


\[
removeT(T) = \begin{cases}
(\phi, x) & : T = leaf(x) \\
(leaf(y), x) & : T = tree(\{y\}, \phi, \{x\}) \\
(tree(init(F), \phi, \{last(F)\}), x) & : T = tree(F, \phi, \{x\}) \land F \neq \phi \\
(tree(F, M', toList(R')), x) & : T = tree(F, M, \{x\}), (M', R') = removeT(M) \\
(tree(F, M, init(R)), last(R)) & : otherwise
\end{cases} \tag{12.29}
\]
The function toList(T), defined previously, is used to flatten a 2-3 tree to a plain list.
The function init(L) returns all the elements except for the last one in list L: if
L = {a1, a2, ..., an−1, an}, then init(L) = {a1, a2, ..., an−1}. The function last(L) returns
the last element, so that last(L) = an. Please refer to the appendix of this book for their
implementation.
The removeT() algorithm can be translated to the following Haskell program; we name it
unsnoc to indicate that it's the reverse of snoc.
unsnoc :: Tree a → (Tree a, a)
unsnoc (Lf a) = (Empty, a)
unsnoc (Tr [a] Empty [b]) = (Lf a, b)
unsnoc (Tr f@(_:_) Empty [a]) = (Tr (init f) Empty [last f], a)
unsnoc (Tr f m [a]) = (Tr f m' (nodeToList r), a) where (m', r) = unsnoc m
unsnoc (Tr f m r) = (Tr f m (init r), last r)

And we can define special last and init functions for finger trees, similar to their
counterparts for lists.
last = snd ◦ unsnoc
init = fst ◦ unsnoc

Imperatively removing the element from the end is almost the same as removing from
the head. There seems to be a special case though: as we always store a lone element (or
sub node) in the front finger while the rear finger and middle part inner tree are empty
(e.g. tree({n}, NIL, ϕ)), we might get nothing if we always try to fetch the last element
from the rear finger.
This can be solved by swapping the front and the rear finger if the rear is empty, as in
the following algorithm.
function Extract-Tail(T )
r ← Tree()
Connect-Mid(r, T )
while Rear(T) = ϕ ∧ Mid(T) ≠ NIL do
T ← Mid(T)
if Rear(T) = ϕ ∧ Front(T) ≠ ϕ then
Exchange Front(T ) ↔ Rear(T )
n ← Node()
Children(n) ← Rear(T )
repeat
L ← Children(n) ▷ L = {n1 , n2 , ..., nm−1 , nm }
n ← Last(L) ▷ n ← nm
Rear(T ) ← L[1...m − 1] ▷ {n1 , n2 , ..., nm−1 }
T ← Parent(T )
if Mid(T ) becomes empty then
Mid(T ) ← NIL
until n is a leaf
return (Elem(n), Flat(r))
How to access the last element, as well as how to turn this algorithm into a working
program, are left as exercises.
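As an illustration only (not the reference answer to the exercise), a Python sketch mirroring extract_head line by line might look as follows:

def extract_tail(t):
    root = Tree()
    root.set_mid(t)
    while t.rear == [] and t.mid is not None:
        t = t.mid
    if t.rear == [] and t.front != []:
        (t.front, t.rear) = (t.rear, t.front)
    n = wraps(t.rear)
    while True:  # a repeat-until loop
        ns = n.children
        n = ns[-1]
        t.rear = ns[:-1]
        t = t.parent
        if t.mid.empty():
            t.mid.parent = None
            t.mid = None
        if n.leaf:
            break
    return (elem(n), flat(root))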

12.6.7 Concatenate
Consider the non-trivial case of concatenating two finger trees T1 = tree(F1, M1, R1)
and T2 = tree(F2, M2, R2). One natural idea is to use F1 as the new front finger of the
concatenated result, and keep R2 as the new rear finger. The rest of the work is to merge
M1, R1, F2 and M2 into a new middle part inner tree.
Note that both R1 and F2 are plain lists of nodes, so the sub-problem is to realize an
algorithm like this.

merge(M1, R1 ∪ F2, M2) = ?

Further observation reveals that both M1 and M2 are also finger trees, except that they
are one level deeper than T1 and T2 in terms of Node(a), where a is the type of element
stored in the tree. We can recursively use the same strategy: keep the front finger of M1
and the rear finger of M2, then merge the middle part inner trees of M1 and M2, together
with the rear finger of M1 and the front finger of M2.
If we denote by front(T) the front finger, by rear(T) the rear finger, and by mid(T) the
middle part inner tree, the above merge algorithm can be expressed for the non-trivial
case as the following.
\[
merge(M_1, R_1 \cup F_2, M_2) = tree(front(M_1), S, rear(M_2)) \tag{12.30}
\]
where S = merge(mid(M_1), rear(M_1) \cup R_1 \cup F_2 \cup front(M_2), mid(M_2)).

If we look back at the original concatenation solution, it can be expressed as below.
\[
concat(T_1, T_2) = tree(F_1, merge(M_1, R_1 \cup F_2, M_2), R_2) \tag{12.31}
\]
Comparing it with equation (12.30), it's easy to see that concatenating is essentially
merging. So we have the final algorithm like this.
\[
concat(T_1, T_2) = merge(T_1, \phi, T_2) \tag{12.32}
\]
By adding edge cases, the merge() algorithm can be completed as below.


\[
merge(T_1, S, T_2) = \begin{cases}
foldR(insertT, T_2, S) & : T_1 = \phi \\
foldL(appendT, T_1, S) & : T_2 = \phi \\
merge(\phi, \{x\} \cup S, T_2) & : T_1 = leaf(x) \\
merge(T_1, S \cup \{x\}, \phi) & : T_2 = leaf(x) \\
tree(F_1, merge(M_1, nodes(R_1 \cup S \cup F_2), M_2), R_2) & : otherwise
\end{cases} \tag{12.33}
\]
Most of these cases are straightforward. If either T1 or T2 is empty, the algorithm
repeatedly inserts/appends all the elements in S to the other tree. The functions foldL
and foldR are kinds of for-each processes in imperative settings; the difference is that
foldL processes the list S from left to right while foldR processes it from right to left.
Here are their definitions. Suppose list L = {a1, a2, ..., an−1, an}, and L′ = {a2, a3, ..., an}
is the rest of the elements with the first one removed.
\[
foldL(f, e, L) = \begin{cases}
e & : L = \phi \\
foldL(f, f(e, a_1), L') & : otherwise
\end{cases} \tag{12.34}
\]
\[
foldR(f, e, L) = \begin{cases}
e & : L = \phi \\
f(a_1, foldR(f, e, L')) & : otherwise
\end{cases} \tag{12.35}
\]

They are explained in detail in the appendix of this book.


If either of the trees is a leaf, we can insert or append the element of this leaf to S,
so that it reduces to the trivial case of concatenating one empty tree with another.
The function nodes is used to wrap a list of elements into a list of 2-3 trees. This is
because the contents of the middle part inner tree, compared to the contents of a finger,
are one level deeper in terms of Node. Consider the moment when the recursion reaches
an edge case. Suppose M1 is empty at that time; we then need to repeatedly insert all the
elements from R1 ∪ S ∪ F2 into M2. However, we can't do the insertion directly. If the
element type is a, we can only insert Node(a), which is a 2-3 tree, into M2. This is just
like what we did in the insertT algorithm: take out elements, wrap them in a 2-3 tree,
and recursively perform insertT. Here is the definition of nodes.


\[
nodes(L) = \begin{cases}
\{tr2(x_1, x_2)\} & : L = \{x_1, x_2\} \\
\{tr3(x_1, x_2, x_3)\} & : L = \{x_1, x_2, x_3\} \\
\{tr2(x_1, x_2), tr2(x_3, x_4)\} & : L = \{x_1, x_2, x_3, x_4\} \\
\{tr3(x_1, x_2, x_3)\} \cup nodes(\{x_4, x_5, ...\}) & : otherwise
\end{cases} \tag{12.36}
\]

The function nodes follows the constraint of 2-3 trees: if there are only 2 or 3 elements in
the list, it just wraps them in a singleton list containing one 2-3 tree; if there are 4
elements, it splits them into two trees, each consisting of 2 branches; otherwise, if there
are more than 4 elements, it wraps the first three into one tree with 3 branches and
recursively calls nodes to process the rest.
The performance of concatenation is determined by merging. Analyzing the recursive
case of merging reveals that the depth of recursion is proportional to the smaller height of
the two trees. As the trees are kept balanced by using 2-3 trees, their height is bound to
O(lg n′), where n′ is the number of elements. The edge case of merging performs the same
as insertion (it calls insertT at most 8 times), which is amortized O(1) time and O(lg m)
in the worst case, where m is the difference in height of the two trees. So the overall
performance is bound to O(lg n), where n is the total number of elements contained in
the two finger trees.
The following Haskell program implements the concatenation algorithm.
concat :: Tree a → Tree a → Tree a
concat t1 t2 = merge t1 [] t2

Note that a concat function is already defined in the standard Prelude, so we need to
distinguish them, either by a hiding import or by choosing a different name.
merge :: Tree a → [a] → Tree a → Tree a
merge Empty ts t2 = foldr cons t2 ts
merge t1 ts Empty = foldl snoc t1 ts
merge (Lf a) ts t2 = merge Empty (a:ts) t2
merge t1 ts (Lf a) = merge t1 (ts++[a]) Empty
merge (Tr f1 m1 r1) ts (Tr f2 m2 r2) = Tr f1 (merge m1 (nodes (r1 ++ ts ++ f2)) m2) r2

And the implementation of nodes is as below.


nodes :: [a] → [Node a]
nodes [a, b] = [Br2 a b]
nodes [a, b, c] = [Br3 a b c]
nodes [a, b, c, d] = [Br2 a b, Br2 c d]
nodes (a:b:c:xs) = Br3 a b c : nodes xs

To concatenate two finger trees T1 and T2 in the imperative approach, we can traverse
the two trees along their middle part inner trees till either tree becomes empty. In
every iteration, we create a new tree T, choose the front finger of T1 as the front finger
of T, and choose the rear finger of T2 as the rear finger of T. The other two fingers (the
rear finger of T1 and the front finger of T2) are put together as a list, and this list is then
grouped in a balanced way into several 2-3 tree nodes, N. Note that N grows along with
the traversal not only in terms of length: the depth of its elements also increases by one
in each iteration. We attach this new tree as the middle part inner tree of the upper level
result tree to end this iteration.
Once either tree becomes empty, we stop traversing, repeatedly insert the 2-3 tree nodes
in N into the other, non-empty tree, and set that as the new middle part inner tree of
the upper level result.
The algorithm below describes this process in detail.
function Concat(T1 , T2 )
return Merge(T1 , ϕ, T2 )

function Merge(T1 , N, T2 )
r ← Tree()
p←r
while T1 ≠ NIL ∧ T2 ≠ NIL do
T ← Tree()
Front(T ) ← Front(T1 )
Rear(T ) ← Rear(T2 )
Connect-Mid(p, T )
p←T
N ← Nodes(Rear(T1 ) ∪ N ∪ Front(T2 ))
T1 ← Mid(T1 )
T2 ← Mid(T2 )
if T1 = NIL then
T ← T2
for each n ∈ Reverse(N ) do
T ← Prepend-Node(n, T )
else if T2 = NIL then
T ← T1
for each n ∈ N do
T ← Append-Node(T, n)
Connect-Mid(p, T )
return Flat(r)
Note that the for-each loops in the algorithm can also be replaced by folding from the left
and from the right respectively. Translating this algorithm to a Python program yields
the code below.
def concat(t1, t2):
return merge(t1, [], t2)

from functools import reduce  # reduce is Python's left fold

def merge(t1, ns, t2):
    root = prev = Tree()  # sentinel dummy tree
    while t1 is not None and t2 is not None:
        t = Tree(t1.size + t2.size + sizeNs(ns), t1.front, None, t2.rear)
        prev.set_mid(t)
        prev = t
        ns = nodes(t1.rear + ns + t2.front)
        t1 = t1.mid
        t2 = t2.mid
    if t1 is None:
        prev.set_mid(foldR(prepend_node, ns, t2))
    elif t2 is None:
        prev.set_mid(reduce(append_node, ns, t1))
    return flat(root)

Because Python only provides a folding function from the left, reduce(), a folding
function from the right is given in the following code: it repeatedly applies the function
in reverse order of the list.

def foldR(f, xs, z):
    for x in reversed(xs):
        z = f(x, z)
    return z

The only function left in question is how to group nodes into bigger 2-3 trees in a
balanced way. As a 2-3 tree can hold at most 3 sub trees, we can first take 3 nodes and
wrap them into a ternary tree if there are more than 4 nodes in the list, and continue
dealing with the rest. If there are just 4 nodes, they can be wrapped into two binary
trees. For the other cases (3 nodes, 2 nodes, or 1 node), we simply wrap them all into
one tree.
Denote the node list L = {n1, n2, ...}. The following algorithm realizes this process.
function Nodes(L)
N =ϕ
while |L| > 4 do
n ← Node()
Children(n) ← L[1..3] ▷ {n1 , n2 , n3 }
N ← N ∪ {n}
L ← L[4...] ▷ {n4 , n5 , ...}
if |L| = 4 then
x ← Node()
Children(x) ← {L[1], L[2]}
y ← Node()
Children(y) ← {L[3], L[4]}
N ← N ∪ {x, y}
else if L 6= ϕ then
n ← Node()
Children(n) ← L
N ← N ∪ {n}
return N
It's straightforward to translate the algorithm to the Python program below, where the
function wraps() helps to create an empty node and then set a list as the children of this
node.
def nodes(xs):
    res = []
    while len(xs) > 4:
        res.append(wraps(xs[:3]))
        xs = xs[3:]
    if len(xs) == 4:
        res.append(wraps(xs[:2]))
        res.append(wraps(xs[2:]))
    elif xs != []:
        res.append(wraps(xs))
    return res

Exercise 12.5

1. Implement the complete finger tree insertion program in your favorite imperative
programming language. Don’t check the example programs along with this chapter
before having a try.

2. How do we determine whether a node is a leaf? Does it contain only a raw element,
or is it a compound node, which contains sub nodes as children? Note that we can't
distinguish them by testing the size, as there are cases where a node contains a singleton
leaf, such as node(1, {node(1, {x})}). Try to solve this problem in both a dynamically
typed language (e.g. Python, Lisp) and a statically typed language (e.g. C++).

3. Implement the Extract-Tail algorithm in your favorite imperative programming


language.

4. Realize the algorithm that returns the last element of a finger tree in both the
functional and the imperative approach. The latter should be able to handle ill-formed trees.

5. Try to implement concatenation algorithm without using folding. You can either
use recursive methods, or use imperative for-each method.

12.6.8 Random access of finger tree


Size augmentation
The strategy to provide fast random access is to turn the lookup into a tree search. In
order to avoid calculating the size of the tree many times, we augment the tree and node
with an extra field. The definition should be modified accordingly; for example, the
following Haskell definition adds a size field to its constructor.
data Tree a = Empty
            | Lf a
            | Tr Int [a] (Tree (Node a)) [a]

And the previous ANSI C structure is augmented with size as well.

struct Tree {
    union Node* front;
    union Node* rear;
    struct Tree* mid;
    struct Tree* parent;
    int size;
};

Suppose the function tree(s, F, M, R) creates a finger tree from size s, front finger F,
rear finger R, and middle part inner tree M. When the size of the tree is needed, we can
call a size(T) function. It will be something like this.
\[
size(T) = \begin{cases}
0 & : T = \phi \\
? & : T = leaf(x) \\
s & : T = tree(s, F, M, R)
\end{cases}
\]

If the tree is empty, the size is definitely zero; and if it can be expressed as tree(s, F, M, R),
the size is s. However, what if the tree is a singleton leaf? Is it 1? No — it can be 1 only
if T = leaf(a) and a isn't a tree node but a raw element stored in the finger tree. In most
cases the size is not 1, because a can again be a tree node. That's why we put a '?' in
the above equation.

The correct way is to call some size function on the tree node, as the following.
\[
size(T) = \begin{cases}
0 & : T = \phi \\
size'(x) & : T = leaf(x) \\
s & : T = tree(s, F, M, R)
\end{cases} \tag{12.37}
\]

Note that this isn't a recursive definition, since size ≠ size′; the argument to size′ is
either a tree node, which is a 2-3 tree, or a plain element stored in the finger tree. To
unify these two cases, we can simply wrap a single plain element into a tree node of only
one element, so that every situation is expressed as a tree node augmented with a size
field. The following Haskell program modifies the definition of the tree node.
data Node a = Br Int [a]

The ANSI C node definition is modified accordingly.


struct Node {
    Key key;
    struct Node* children;
    int size;
};

We change it from a union to a structure, although there is an overhead 'key' field when
the node isn't a leaf.
Suppose the function tr(s, L) creates such a node (either one wrapped element or a 2-3
tree) from the size information s and a list L. Here are some examples.

tr(1, {x}) a tree contains only one element


tr(2, {x, y}) a 2-3 tree contains two elements
tr(3, {x, y, z}) a 2-3 tree contains three elements

So the function size′ can be implemented as returning the size information of a tree
node. We have size′ (tr(s, L)) = s.
Wrapping an element x is just calling tr(1, {x}). We can define auxiliary functions
wrap and unwrap, for instance.

\[
\begin{aligned}
wrap(x) &= tr(1, \{x\}) \\
unwrap(n) &= x, \ \text{where} \ n = tr(1, \{x\})
\end{aligned} \tag{12.38}
\]

As both the front finger and the rear finger are lists of tree nodes, in order to calculate
the total size of a finger, we provide a size′′(L) function, which sums up the sizes of all
the nodes stored in the list. Denote L = {a1, a2, ...} and L′ = {a2, a3, ...}.
\[
size''(L) = \begin{cases}
0 & : L = \phi \\
size'(a_1) + size''(L') & : otherwise
\end{cases} \tag{12.39}
\]

It's also fine to define size′′(L) by using higher order functions. For example:
\[
size''(L) = sum(map(size', L)) \tag{12.40}
\]

And we can turn a list of tree nodes into one deeper 2-3 tree and vice-versa.

\[
\begin{aligned}
wraps(L) &= tr(size''(L), L) \\
unwraps(n) &= L, \ \text{where} \ n = tr(s, L)
\end{aligned} \tag{12.41}
\]

These helper functions are translated to the following Haskell code.



size (Br s _) = s

sizeL = sum .(map size)

sizeT Empty = 0
sizeT (Lf a) = size a
sizeT (Tr s _ _ _) = s

Here are the wrap and unwrap auxiliary functions.


wrap x = Br 1 [x]
unwrap (Br 1 [x]) = x
wraps xs = Br (sizeL xs) xs
unwraps (Br _ xs) = xs

We omitted their type definitions for illustration purposes.
In imperative settings, the size information for a node and a tree can be accessed
through the size field, and the size of a list of nodes can be summed up over this field as
in the algorithm below.
function Size-Nodes(L)
s←0
for ∀n ∈ L do
s ← s+ Size(n)
return s
The following Python code, for example, translates this algorithm by using standard
sum() and map() functions provided in library.
def sizeNs(xs):
    return sum(map(lambda x: x.size, xs))

As NIL is typically used to represent an empty tree in imperative settings, it's convenient
to provide an auxiliary size function which calculates the size of a tree uniformly, no
matter whether it is NIL.
function Size-Tr(T )
if T = NIL then
return 0
else
return Size(T )
The algorithm is trivial and we skip its example implementation.
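For completeness, a one-line Python sketch of this convention (assuming the size field used above) would be:

def sizeT(t):
    return 0 if t is None else t.size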

Modification due to the augmented size


The algorithms presented so far need to be modified to work with the augmented size.
For example, the insertT function now inserts a tree node instead of a plain element.

insertT (x, T ) = insertT ′ (wrap(x), T ) (12.42)

The corresponding Haskell program is changed as below.


cons a t = cons' (wrap a) t

After being wrapped, x is augmented with size information of 1. In the implementation
of the previous insertion algorithm, the function tree(F, M, R) is used to create a finger
tree from a front finger, a middle part inner tree and a rear finger. This function should
also be modified to add the size information of these three arguments.


\[
tree'(F, M, R) = \begin{cases}
fromL(F) & : M = \phi \land R = \phi \\
fromL(R) & : M = \phi \land F = \phi \\
tree'(unwraps(F'), M', R) & : F = \phi, (F', M') = extractT'(M) \\
tree'(F, M', unwraps(R')) & : R = \phi, (M', R') = removeT'(M) \\
tree(size''(F) + size(M) + size''(R), F, M, R) & : otherwise
\end{cases} \tag{12.43}
\]
Where fromL() helps to turn a list of nodes into a finger tree by repeatedly inserting all
the elements, one by one, into an empty tree.

fromL(L) = foldR(insertT', \phi, L)

Of course, it can also be implemented in a purely recursive manner without using folding.
The last case is the most straightforward one: if none of F, M, and R is empty, it adds
up the sizes of these three parts and constructs the tree along with this size information
by calling the tree(s, F, M, R) function. If both the middle part inner tree and one of
the fingers are empty, the algorithm repeatedly inserts all the elements stored in the other
finger into an empty tree, so that the result is constructed from a list of tree nodes. If the
middle part inner tree isn't empty and one of the fingers is empty, the algorithm 'borrows'
one tree node from the middle part, either by extracting from the head if the front finger
is empty, or by removing from the tail if the rear finger is empty. Then the algorithm
unwraps the 'borrowed' tree node to a list, and recursively calls the tree′() function to
construct the result.
This algorithm can be translated to the following Haskell code, for example.
tree f Empty [] = foldr cons' Empty f
tree [] Empty r = foldr cons' Empty r
tree [] m r = let (f, m') = uncons' m in tree (unwraps f) m' r
tree f m [] = let (m', r) = unsnoc' m in tree f m' (unwraps r)
tree f m r = Tr (sizeL f + sizeT m + sizeL r) f m r

The function tree′() helps to minimize the modification. insertT′() can be realized by
using it as the following.


\[
insertT'(x, T) = \begin{cases}
leaf(x) & : T = \phi \\
tree'(\{x\}, \phi, \{y\}) & : T = leaf(y) \\
tree'(\{x, x_1\}, insertT'(wraps(\{x_2, x_3, x_4\}), M), R) & : T = tree(s, \{x_1, x_2, x_3, x_4\}, M, R) \\
tree'(\{x\} \cup F, M, R) & : otherwise
\end{cases} \tag{12.44}
\]
And its corresponding Haskell code is a line by line translation.
cons' a Empty = Lf a
cons' a (Lf b) = tree [a] Empty [b]
cons' a (Tr _ [b, c, d, e] m r) = tree [a, b] (cons' (wraps [c, d, e]) m) r
cons' a (Tr _ f m r) = tree (a:f) m r

A similar modification for the augmented size should also be applied to the imperative
algorithms. For example, when a new node is prepended to the head of the finger tree,
we should update the size as we traverse the tree.
function Prepend-Node(n, T )
r ← Tree()
p←r
Connect-Mid(p, T )
while Full?(Front(T )) do
F ← Front(T )
Front(T ) ← {n, F [1]}
Size(T ) ← Size(T ) + Size(n) ▷ update size
n ← Node()
Children(n) ← F [2..]
p←T
T ← Mid(T )
if T = NIL then
T ← Tree()
Front(T )← {n}
else if | Front(T ) | = 1 ∧ Rear(T ) = ϕ then
Rear(T ) ← Front(T )
Front(T ) ← {n}
else
Front(T ) ← {n}∪ Front(T )
Size(T ) ← Size(T ) + Size(n) ▷ update size
Connect-Mid(p, T )
return Flat(r)
The corresponding Python code is modified accordingly as below.

def prepend_node(n, t):
    root = prev = Tree()
    prev.set_mid(t)
    while frontFull(t):
        f = t.front
        t.front = [n] + f[:1]
        t.size = t.size + n.size
        n = wraps(f[1:])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.front) == 1 and t.rear == []:
        t = Tree(n.size + t.size, [n], None, t.front)
    else:
        t = Tree(n.size + t.size, [n] + t.front, t.mid, t.rear)
    prev.set_mid(t)
    return flat(root)

Note that the tree constructor is also modified to take a size argument as the first
parameter, and the leaf helper function not only constructs the tree from a node, but
also sets the size of the tree to the size of the node inside it.
For simplicity, we skip the detailed description of what is modified in the extractT,
appendT, removeT, and concat algorithms. They are left as exercises to the reader.
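A sketch of such a constructor and leaf helper, consistent with the code above but not necessarily the book's exact implementation, could be:

class Tree:
    def __init__(self, size=0, front=None, mid=None, rear=None):
        self.size = size                     # total number of raw elements
        self.front = front if front is not None else []
        self.rear = rear if rear is not None else []
        self.mid = None
        self.parent = None
        if mid is not None:
            self.set_mid(mid)
    # set_mid(), empty(), ... stay the same as in the earlier sketch

def leaf(n):                                 # a tree of one node, same size as the node
    return Tree(n.size, [n], None, [])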

Split a finger tree at a given position

With the size information augmented, it's easy to locate a node at a given position by
performing a tree search. What's more, as the finger tree is constructed from the three
parts F, M, and R, and is naturally recursive, it's also possible to split it into three sub
parts at a given position i: the left part, the node at i, and the right part.
The idea is straightforward. Since we have the size information for F, M, and R, denote
these three sizes as Sf, Sm, and Sr. If the given position i ≤ Sf, the node must be stored
in F, and we go on seeking the node inside F; if Sf < i ≤ Sf + Sm, the node must be
stored in M, and we recursively perform the search in M; otherwise, the node should be
in R, and we search inside R.
If we skip the error handling of trying to split an empty tree, there is only one edge case,
as below.
\[
splitAt(i, T) = \begin{cases}
(\phi, x, \phi) & : T = leaf(x) \\
... & : otherwise
\end{cases}
\]

Splitting a leaf results in both the left and right parts being empty; the node stored in
the leaf is the resulting node.
The recursive case handles the three sub cases by comparing i with the sizes. Suppose
the function splitAtL(i, L) splits a list of nodes at a given position i into three parts:
(A, x, B) = splitAtL(i, L), where x is the i-th node in L, A is a sub list containing all
the nodes before position i, and B is a sub list containing all the rest of the nodes after i.


\[
splitAt(i, T) = \begin{cases}
(\phi, x, \phi) & : T = leaf(x) \\
(fromL(A), x, tree'(B, M, R)) & : i \leq S_f, (A, x, B) = splitAtL(i, F) \\
(tree'(F, M_l, A), x, tree'(B, M_r, R)) & : S_f < i \leq S_f + S_m \\
(tree'(F, M, A), x, fromL(B)) & : otherwise, (A, x, B) = splitAtL(i - S_f - S_m, R)
\end{cases} \tag{12.45}
\]
Where Ml , x, Mr , A, B in the third case are calculated as the following.

(Ml , t, Mr ) = splitAt(i − Sf , M )
(A, x, B) = splitAtL(i − Sf − size(Ml ), unwraps(t))

And the function splitAtL is just a linear traversal. Since the length of the list is limited
by the constraint of the 2-3 trees, the performance is still ensured to be constant O(1)
time. Denote L = {x1, x2, ...} and L′ = {x2, x3, ...}.

\[
splitAtL(i, L) = \begin{cases}
(\phi, x_1, \phi) & : i = 0 \land L = \{x_1\} \\
(\phi, x_1, L') & : i < size'(x_1) \\
(\{x_1\} \cup A, x, B) & : otherwise
\end{cases} \tag{12.46}
\]
Where

(A, x, B) = splitAtL(i - size'(x_1), L')

The splitting solution is a typical divide and conquer strategy. The performance of this
algorithm is determined by the recursive case, searching in the middle part inner tree;
the other cases are all constant time, as we've analyzed. The depth of recursion is
proportional to the height of the tree h, so the algorithm is bound to O(h). Because the
tree is well balanced (by using 2-3 trees, and all the insertion/removal algorithms keep
the tree balanced), h = O(lg n), where n is the number of elements stored in the finger
tree. The overall performance of splitting is therefore O(lg n).
Let's first give the Haskell program for the splitAtL function.
splitNodesAt 0 [x] = ([], x, [])
splitNodesAt i (x:xs) | i < size x = ([], x, xs)
                      | otherwise = let (xs', y, ys) = splitNodesAt (i-size x) xs
                                    in (x:xs', y, ys)

Then the program for splitAt. As there is already a function with this name defined in
the standard library, we slightly change the name by adding an apostrophe.
splitAt' _ (Lf x) = (Empty, x, Empty)
splitAt' i (Tr _ f m r)
    | i < szf = let (xs, y, ys) = splitNodesAt i f
                in ((foldr cons' Empty xs), y, tree ys m r)
    | i < szf + szm = let (m1, t, m2) = splitAt' (i-szf) m
                          (xs, y, ys) = splitNodesAt (i-szf - sizeT m1) (unwraps t)
                      in (tree f m1 xs, y, tree ys m2 r)
    | otherwise = let (xs, y, ys) = splitNodesAt (i-szf-szm) r
                  in (tree f m xs, y, foldr cons' Empty ys)
    where
      szf = sizeL f
      szm = sizeT m

Random access
With the help of splitting at an arbitrary position, it's trivial to realize random access
in O(lg n) time. Denote by mid(x) the function that returns the 2nd element of a tuple,
and by left(x) and right(x) the functions that return the first and the 3rd element of the
tuple respectively.

getAt(S, i) = unwrap(mid(splitAt(i, S))) (12.47)

It first splits the sequence at position i, then unwraps the node to get the element stored
inside it. To mutate the i-th element of a sequence S represented by a finger tree, we
first split it at i, then replace the middle with what we want, and reconstruct them into
one finger tree by using concatenation.

setAt(S, i, x) = concat(L, insertT (x, R)) (12.48)

where
(L, y, R) = splitAt(i, S)
What’s more, we can also realize a removeAt(S, i) function, which can remove the i-th
element from sequence S. The idea is first to split at i, unwrap and return the element
of the i-th node; then concatenate the left and right to a new finger tree.

removeAt(S, i) = (unwrap(y), concat(L, R)) (12.49)

These handy algorithms can be translated to the following Haskell program.


getAt t i = unwrap x where (_, x, _) = splitAt' i t
setAt t i x = let (l, _, r) = splitAt' i t in concat' l (cons x r)
removeAt t i = let (l, x, r) = splitAt' i t in (unwrap x, concat' l r)

Imperative random access


As we can directly mutate the tree in imperative settings, it’s possible to realize Get-
At(T, i) and Set-At(T, i, x) without using splitting. The idea is firstly implement a
algorithm which can apply some operation to a given position. The following algorithm
takes three arguments, a finger tree T , a position index at i which ranges from zero to
the number of elements stored in the tree, and a function f , which will be applied to the
element at i.
function Apply-At(T, i, f )
while Size(T ) > 1 do
Sf ← Size-Nodes(Front(T ))
Sm ← Size-Tr(Mid(T ))
if i < Sf then
return Lookup-Nodes(Front(T ), i, f )
else if i < Sf + Sm then
T ← Mid(T )
i ← i − Sf
else
return Lookup-Nodes(Rear(T ), i − Sf − Sm , f )
n ← First-Lf(T )
x ← Elem(n)
Elem(n) ← f (x)
return x
This algorithm is essentially a divide and conquer tree search. It repeatedly examines
the current tree till it reaches a tree with a size of 1 (can it be determined as a leaf?
please consider the ill-formed case and refer to the exercise later). Each time, it checks
the position to be located against the size information of the front finger and the middle
part inner tree.
If the index i is less than the size of the front finger, the location is at some node in it;
the algorithm calls a sub procedure to look it up among the front finger. If the index is
between the size of the front finger and the total size up to the middle part inner tree, the
location is at some node inside the middle; the algorithm goes on traversing along the
middle part inner tree with the index reduced by the size of the front finger. Otherwise
the location is at some node in the rear finger, and the similar look-up procedure is called
accordingly.
After this loop, we've got a node (possibly a compound node), with what we are looking
for at the first leaf inside this node. We can extract the element, apply the function f on
it, and store the new value back.
The algorithm returns the previous element, before applying f, as the final result.
What hasn't been factored out yet is the algorithm Lookup-Nodes(L, i, f). It takes a list
of nodes, a position index, and a function to be applied. This algorithm can be
implemented by checking every node in the list. If the node is a leaf and the index is zero,
we are at the right position: the function can be applied on the element stored in this
leaf, and the previous value is returned. Otherwise, we compare the size of this node with
the index to determine whether the position is inside this node, and search inside the
children of the node if necessary.
function Lookup-Nodes(L, i, f )
loop
for ∀n ∈ L do
if n is leaf ∧i = 0 then
x ← Elem(n)
Elem(n) ← f (x)
return x
if i < Size(n) then
L ← Children(n)
break
i ← i− Size(n)
The following Python code implements these algorithms.

def applyAt(t, i, f):
    while t.size > 1:
        szf = sizeNs(t.front)
        szm = sizeT(t.mid)
        if i < szf:
            return lookupNs(t.front, i, f)
        elif i < szf + szm:
            t = t.mid
            i = i - szf
        else:
            return lookupNs(t.rear, i - szf - szm, f)
    n = first_leaf(t)
    x = elem(n)
    n.children[0] = f(x)
    return x

def lookupNs(ns, i, f):
    while True:
        for n in ns:
            if n.leaf and i == 0:
                x = elem(n)
                n.children[0] = f(x)
                return x
            if i < n.size:
                ns = n.children
                break
            i = i - n.size

With this auxiliary algorithm that applies a function at a given position, it's trivial to
implement Get-At and Set-At by passing special functions to it.
function Get-At(T, i)
return Apply-At(T, i, λx .x)

function Set-At(T, i, x)
return Apply-At(T, i, λy .x)
That is, we pass the identity function to implement getting the element at a position,
which doesn't change anything at all; and we pass a constant function to implement
setting, which sets the element to the new value while ignoring its previous value.
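In Python, these are one-liners on top of applyAt (a direct sketch of the two definitions above):

def getAt(t, i):
    return applyAt(t, i, lambda x: x)   # identity: read without modification

def setAt(t, i, x):
    return applyAt(t, i, lambda y: x)   # constant: overwrite, return the old value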

Imperative splitting
Realizing the Apply-At algorithm alone is not enough in imperative settings, because
removing an element at an arbitrary position is also a typical use case.
Almost all the imperative finger tree algorithms so far are of a one-pass, top-down
manner, although we sometimes need to book-keep the root. It means that we could even
realize all of them without using the parent field.
The splitting operation, however, can be easily implemented by using the parent field.
We first perform a top-down traverse along the middle part inner trees as long as the
splitting position is not located in the front or rear finger. After that, we need a
bottom-up traverse along the parent fields of the two split trees to fill out the missing
fields.
function Split-At(T, i)
T1 ← Tree()
T2 ← Tree()
while Sf ≤ i < Sf + Sm do ▷ Top-down pass
T1′ ← Tree()
T2′ ← Tree()
Front(T1′ ) ← Front(T )
Rear(T2′ ) ← Rear(T )
Connect-Mid(T1 , T1′ )
Connect-Mid(T2 , T2′ )
T1 ← T1′
T2 ← T2′
i ← i − Sf
T ← Mid(T )
if i < Sf then
(X, n, Y ) ← Split-Nodes(Front(T ), i)
T1′ ← From-Nodes(X)
T2′ ← T
Size(T2′ ) ← Size(T ) - Size-Nodes(X) - Size(n)
Front(T2′ ) ← Y
else if Sf + Sm ≤ i then
(X, n, Y ) ← Split-Nodes(Rear(T ), i − Sf − Sm )
T2′ ← From-Nodes(Y )
T1′ ← T
Size(T1′ ) ← Size(T ) - Size-Nodes(Y ) - Size(n)
Rear(T1′ ) ← X
Connect-Mid(T1 , T1′ )
Connect-Mid(T2 , T2′ )
i ← i− Size-Tr(T1′ )
while n is NOT leaf do ▷ Bottom-up pass
(X, n, Y ) ← Split-Nodes(Children(n), i)
i ← i− Size-Nodes(X)
Rear(T1 ) ← X
Front(T2 ) ← Y
Size(T1 ) ← Sum-Sizes(T1 )
Size(T2 ) ← Sum-Sizes(T2 )
T1 ← Parent(T1 )
T2 ← Parent(T2 )
return (Flat(T1 ), Elem(n), Flat(T2 ))
The algorithm first creates two trees, T1 and T2, to hold the split results. Note that they
are created as 'ground' trees which are parents of the roots. The first pass is a top-down
pass. Suppose Sf and Sm retrieve the size of the front finger and the size of the middle
part inner tree respectively. If the position at which the tree is to be split is located in
the middle part inner tree, we reuse the front finger of T for the newly created T1′, and
reuse the rear finger of T for T2′. At this point we can't fill the other fields of T1′ and T2′;
they are left empty, and we'll finish filling them later. After that, we connect T1 and T1′
so that the latter becomes the middle part inner tree of the former. A similar connection
is made for T2 and T2′ as well. Finally, we update the position by decreasing it by the
size of the front finger, and go on traversing along the middle part inner tree.
When the first pass finishes, we are at a position where the splitting should be performed
either in the front finger or in the rear finger. Splitting the nodes in a finger results in a
tuple: the first and the third parts are the lists before and after the splitting point, while
the second part is the node containing the element at the position to be split. As a finger
holds only a small constant number of nodes, the node splitting algorithm can be
performed by a linear search.
function Split-Nodes(L, i)
for j ∈ [1, Length(L) ] do
if i < Size(L[j]) then
return (L[1...j − 1], L[j], L[j + 1... Length(L) ])
i ← i− Size(L[j])
We next create two new result trees, T1′ and T2′, from this tuple, and connect them as
the final middle part inner trees of T1 and T2.
Next, we perform a bottom-up traverse along the result trees to fill out all the empty
information we skipped in the first pass.
We loop on the second part of the tuple, the node, till it becomes a leaf. In each
iteration, we repeatedly split the children of the node with an updated position i. The
first list of nodes returned from splitting is used to fill the rear finger of T1, and the other
list of nodes is used to fill the front finger of T2. After that, since all three parts of a
finger tree – the front and rear fingers and the middle part inner tree – are filled, we can
calculate the size of the tree by summing these three parts up.
function Sum-Sizes(T )
return Size-Nodes(Front(T )) + Size-Tr(Mid(T )) + Size-Nodes(Rear(T ))
Next, the iteration goes on along the parent fields of T1 and T2. The last 'black-box'
algorithm is From-Nodes(L), which creates a finger tree from a list of nodes. It can be
realized by repeatedly performing insertion on an empty tree. The implementation is left
as an exercise to the reader.
The example Python code for splitting is given below.

def splitAt(t, i):
    (t1, t2) = (Tree(), Tree())
    while szf(t) <= i and i < szf(t) + szm(t):
        fst = Tree(0, t.front, None, [])
        snd = Tree(0, [], None, t.rear)
        t1.set_mid(fst)
        t2.set_mid(snd)
        (t1, t2) = (fst, snd)
        i = i - szf(t)
        t = t.mid

    if i < szf(t):
        (xs, n, ys) = splitNs(t.front, i)
        sz = t.size - sizeNs(xs) - n.size
        (fst, snd) = (fromNodes(xs), Tree(sz, ys, t.mid, t.rear))
    elif szf(t) + szm(t) <= i:
        (xs, n, ys) = splitNs(t.rear, i - szf(t) - szm(t))
        sz = t.size - sizeNs(ys) - n.size
        (fst, snd) = (Tree(sz, t.front, t.mid, xs), fromNodes(ys))
    t1.set_mid(fst)
    t2.set_mid(snd)

    i = i - sizeT(fst)
    while not n.leaf:
        (xs, n, ys) = splitNs(n.children, i)
        i = i - sizeNs(xs)
        (t1.rear, t2.front) = (xs, ys)
        t1.size = sizeNs(t1.front) + sizeT(t1.mid) + sizeNs(t1.rear)
        t2.size = sizeNs(t2.front) + sizeT(t2.mid) + sizeNs(t2.rear)
        (t1, t2) = (t1.parent, t2.parent)

    return (flat(t1), elem(n), flat(t2))

The program to split a list of nodes at a given position is listed like this.
def splitNs(ns, i):
    for j in range(len(ns)):
        if i < ns[j].size:
            return (ns[:j], ns[j], ns[j+1:])
        i = i - ns[j].size

With splitting defined, removing an element at an arbitrary position can be realized
trivially by first performing a split, then concatenating the two result trees into one, and
returning the element at that position.
function Remove-At(T, i)
(T1 , x, T2 ) ← Split-At(T, i)
return (x, Concat(T1 , T2 ) )
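Translated to Python, the algorithm is a two-liner on top of splitAt and concat (a sketch using the same names as the code above):

def remove_at(t, i):
    (t1, x, t2) = splitAt(t, i)
    return (x, concat(t1, t2))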

Exercise 12.6

1. Another way to realize insertT ′ is to force increasing the size field by one, so that
we needn’t write function tree′ . Try to realize the algorithm by using this idea.

2. Try to handle the augmented size information, as done in the insertT′ algorithm, for
the following algorithms (both functional and imperative): extractT′, appendT′,
removeT′, and concat′. The head, tail, init and last functions should be kept
unchanged. Don't refer to the downloadable programs along with this book before
you take a try.

3. In the imperative Apply-At algorithm, it tests if the size of the current tree is
greater than one. Why don’t we test if the current tree is a leaf? Tell the difference
between these two approaches.
4. Implement the From-Nodes(L) in your favorite imperative programming language.
You can either use looping or create a folding-from-right sub algorithm.

12.7 Notes and short summary


Although we haven't been able to give a purely functional realization that matches the
O(1) constant time random access of arrays in imperative settings, the resulting finger
tree data structure achieves an overall well performing sequence. It operates in amortized
O(1) time both on the head and on the tail, it can concatenate two sequences in
logarithmic time, and it can break one sequence into two sub sequences at any position.
Neither arrays in imperative settings nor linked lists in functional settings satisfy all
these goals. Some functional programming languages adopt this sequence realization in
their standard library [67].
Just as the title of this chapter suggests, we've presented the last corner stone of
elementary data structures in both functional and imperative settings. We needn't worry
about lacking elementary data structures when solving problems with typical algorithms.
For example, when writing a MTF (move-to-front) encoding algorithm [68], with the help
of the sequence data structure explained in this chapter, we can implement it quite
straightforwardly.

mtf (S, i) = {x} ∪ S ′

where (x, S ′ ) = removeAt(S, i).


In the next following chapters, we’ll first explains some typical divide and conquer
sorting methods, including quick sort, merge sort and their variants; then some elementary
searching algorithms, and string matching algorithms will be covered.
Bibliography

[1] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[2] Chris Okasaki. “Purely Functional Random-Access Lists”. Functional Programming
Languages and Computer Architecture, June 1995, pages 86-95.
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[4] Miran Lipovaca. “Learn You a Haskell for Great Good! A Beginner’s Guide”. No
Starch Press; 1 edition April 2011, 400 pp. ISBN: 978-1-59327-283-8
[5] Ralf Hinze and Ross Paterson. “Finger Trees: A Simple General-purpose Data
Structure.” in Journal of Functional Programming16:2 (2006), pages 197-217.
[Link] ross/papers/[Link]
[6] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), "A new representation for linear lists". Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pp. 49-60.
[7] Generic finger-tree structure. [Link]
[Link]
[8] Wikipedia. Move-to-front transform. [Link]
front_transform

Chapter 13

Divide and conquer, Quick sort vs. Merge sort

13.1 Introduction
It's proved that the performance of comparison based sorting is bound to O(n lg n) at
best [51]. In this chapter, two divide and conquer sorting algorithms are introduced, both
of which perform in O(n lg n) time. One is quick sort, the most popular sorting
algorithm. Quick sort has been well studied, and many programming libraries provide
sorting tools based on it.
In this chapter, we'll first introduce the idea of quick sort, which demonstrates the power
of the divide and conquer strategy well. Several variants will be explained, and we'll see
that quick sort performs poorly in some special cases, namely when the algorithm is not
able to partition the sequence in balance.
In order to solve the unbalanced partition problem, we'll next introduce merge sort,
which ensures the sequence to be well partitioned in all cases. Some variants of merge
sort, including natural merge sort and bottom-up merge sort, are shown as well.
As in other chapters, all the algorithms will be realized in both imperative and
functional approaches.

13.2 Quick sort


Consider a teacher who arranges a group of kindergarten kids to stand in a line for a
game. The kids need to stand in order of their heights, with the shortest one standing on
the left most and the tallest on the right most. How can the teacher instruct these kids
so that they can line up by themselves?
There are many strategies, and the quick sort approach can be applied here:
1. The first kid raises his/her hand. The kids who are shorter than him/her stand to
the left of this child; the kids who are taller than him/her stand to the right of
this child;
2. All the kids who moved to the left, if there are any, repeat the above step; all the
kids who moved to the right repeat the same step as well.
Suppose the heights of the kids are {102, 100, 98, 95, 96, 99, 101, 97}, with cm as the
unit. The following table illustrates how they line up in order of height by following this
method.


Figure 13.1: Instruct kids to stand in a line

102 100 98 95 96 99 101 97


100 98 95 96 99 101 97 102
98 95 96 99 97 100 101 102
95 96 97 98 99 100 101 102
95 96 97 98 99 100 101 102
95 96 97 98 99 100 101 102
95 96 97 98 99 100 101 102
At the beginning, the first child, with height 102 cm, raises his/her hand. We call this
kid the pivot and mark this height in bold. It happens that this is the tallest kid, so all
the others stand to the left side, which is represented in the second row of the above
table. Note that the child with height 102 cm is now in the final ordered position, thus
we mark it italic. Next, the kid with height 100 cm raises a hand, so the children of
heights 98, 95, 96 and 99 cm stand to his/her left, and there is only 1 child, of height 101
cm, who is taller than this pivot kid, so he stands to the right. The 3rd row in the table
shows this stage accordingly. After that, the child 98 cm high is selected as the pivot on
the left, while the child 101 cm high is selected as the pivot on the right. Since there are
no other kids in the unsorted group with the 101 cm pivot, this small group is already
ordered and the kid of height 101 cm is in the final proper position. The same method is
applied to the groups of kids which are not yet in the correct order, until all of them
stand in their final positions.

13.2.1 Basic version


Summarizing the above instructions leads to the recursive description of quick sort. In
order to sort a sequence of elements L:

• If L is empty, the result is obviously empty; This is the trivial edge case;

• Otherwise, select an arbitrary element in L as a pivot, recursively sort all elements


not greater than the pivot, put the result on the left hand of the pivot, and recur-
sively sort all elements which are greater than the pivot, put the result on the right
hand of the pivot.

Note the emphasized word and; we don't use 'then' here, which indicates that it's quite
OK to do the recursive sorts on the left and on the right in parallel. We'll return to this
parallelism topic soon.
Quick sort was first developed by C. A. R. Hoare in 1960 [51] [78]. What we describe
here is a basic version. Note that it doesn't state how to select the pivot. We'll see soon
that pivot selection affects the performance of quick sort dramatically.

The simplest method to select the pivot is to always choose the first element, so that
quick sort can be formalized as the following.
\[
sort(L) = \begin{cases}
\phi & : L = \phi \\
sort(\{x \mid x \in L', x \leq l_1\}) \cup \{l_1\} \cup sort(\{x \mid x \in L', l_1 < x\}) & : otherwise
\end{cases} \tag{13.1}
\]
Where l1 is the first element of the non-empty list L, and L′ contains the rest of the
elements {l2, l3, ...}. Note that we use the Zermelo-Fraenkel expression (ZF expression
for short)1, which is also known as list comprehension. A ZF expression
{a | a ∈ S, p1(a), p2(a), ...} means taking all the elements in set S which satisfy all the
predicates p1, p2, .... The ZF expression was originally used for representing sets; we
extend it to express lists for the sake of brevity. There can be duplicated elements, and
different permutations represent different lists. Please refer to the appendix about lists
in this book for details.
It’s quite straightforward to translate this equation to real code if list comprehension
is supported. The following Haskell code is given for example:
sort [] = []
sort (x:xs) = sort [y | y←xs, y ≤ x] ++ [x] ++ sort [y | y←xs, x < y]

This might be the shortest quick sort program in the world at the time this book is
written. Even a more verbose version is still very expressive:
sort [] = []
sort (x:xs) = as ++ [x] ++ bs where
    as = sort [ a | a ← xs, a ≤ x]
    bs = sort [ b | b ← xs, x < b]

There are some variants of this basic quick sort program, such as using explicit filter-
ing instead of list comprehension. The following Python program demonstrates this for
example:
def sort(xs):
    if xs == []:
        return []
    pivot = xs[0]
    # 'as' is a reserved word in Python, so we use different names
    littles = sort(list(filter(lambda x: x <= pivot, xs[1:])))
    bigs = sort(list(filter(lambda x: pivot < x, xs[1:])))
    return littles + [pivot] + bigs

13.2.2 Strict weak ordering


So far, we have assumed the elements are sorted in monotonic non-decreasing order. It's
quite possible to customize the algorithm so that it sorts the elements by other ordering
criteria. This is necessary in practice because users may sort numbers, strings, or other
complex objects (even lists of lists, for example).
The typical generic solution is to abstract the comparison as a parameter, as we
mentioned in the chapters about insertion sort and selection sort. Although it needn't be
a total ordering, the comparison must satisfy strict weak ordering at least [79] [52].
For the sake of brevity, we only consider sorting the elements by using less than or equal
(equivalent to not greater than) in the rest of the chapter.
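As an illustration of abstracting the comparison, a hedged Python sketch might take the "not greater than" test as a parameter (the name sort_by is hypothetical, not from the book):

def sort_by(le, xs):
    # le(a, b) returns True when a is "not greater than" b under the chosen ordering
    if xs == []:
        return []
    pivot, rest = xs[0], xs[1:]
    return (sort_by(le, [x for x in rest if le(x, pivot)])
            + [pivot]
            + sort_by(le, [x for x in rest if not le(x, pivot)]))

# Example: sort strings by length.
# sort_by(lambda a, b: len(a) <= len(b), ["finger", "tree", "is", "fun"])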

13.2.3 Partition
Observe that the basic version actually takes two passes: one to find all the elements
which are greater than the pivot, and another to find those which are not. Such
1 Named after the two mathematicians who founded modern set theory.

partition can be accomplished by only one pass. We explicitly define the partition as
below.

\[
partition(p, L) = \begin{cases}
(\phi, \phi) & : L = \phi \\
(\{l_1\} \cup A, B) & : p(l_1), (A, B) = partition(p, L') \\
(A, \{l_1\} \cup B) & : \neg p(l_1)
\end{cases} \tag{13.2}
\]

Note that the operation {x} ∪ L is just a ‘cons’ operation, which only takes constant
time. The quick sort can be modified accordingly.
\[
sort(L) = \begin{cases}
\phi & : L = \phi \\
sort(A) \cup \{l_1\} \cup sort(B) & : otherwise, (A, B) = partition(\lambda x \cdot x \leq l_1, L')
\end{cases} \tag{13.3}
\]
Translating this new algorithm into Haskell yields the below code.
sort [] = []
sort (x:xs) = sort as ++ [x] ++ sort bs where
    (as, bs) = partition ( ≤ x) xs

partition _ [] = ([], [])
partition p (x:xs) = let (as, bs) = partition p xs in
                     if p x then (x:as, bs) else (as, x:bs)

The concept of partition is very critical to quick sort. Partition is also very important
to many other sort algorithms. We’ll explain how it generally affects the sorting method-
ology by the end of this chapter. Before further discussion about fine tuning of quick sort
specific partition, let’s see how to realize it in-place imperatively.
There are many partition methods. The one given by Nico Lomuto [4] will be used
here as it's easy to understand. We'll show other partition algorithms soon and see how
partitioning affects the performance.

Figure 13.2: Partition a range of the array using the left most element as the pivot. (a) The
partition invariant: the pivot at x[l], then the elements not greater than the pivot (ending at
the 'left' mark), then the elements greater than the pivot (ending at the 'right' mark), then the
unprocessed elements up to x[u]. (b) Start: 'left' points to the pivot, 'right' to x[l+1]. (c) Finish:
the pivot is swapped into its final position between the two segments.

Figure 13.2 shows the idea of this one-pass partition method. The array is processed
from left to right. At any time, the array consists of the following parts as shown in figure
13.2 (a):

• The left most cell contains the pivot; By the end of the partition process, the pivot
will be moved to the final proper position;

• A segment contains all elements which are not greater than the pivot. The right
boundary of this segment is marked as ‘left’;

• A segment contains all elements which are greater than the pivot. The right bound-
ary of this segment is marked as ‘right’; It means that elements between ‘left’ and
‘right’ marks are greater than the pivot;

• The rest of the elements after the 'right' mark haven't been processed yet. They may be
greater than the pivot or not.

At the beginning of the partition, the 'left' mark points to the pivot and the 'right' mark
points to the element next to the pivot, as in Figure 13.2 (b); then
the algorithm repeatedly advances the 'right' mark one element at a time till it passes
the end of the array.
In every iteration, the element pointed by the 'right' mark is compared with the pivot.
If it is greater than the pivot, it should stay in the segment between the 'left' and 'right'
marks, so the algorithm simply advances the 'right' mark and examines the next
element; otherwise, since the element pointed by the 'right' mark is less than or equal to the
pivot (not greater than), it should be put before the 'left' mark. In order to achieve this,
the 'left' mark needs to be advanced by one, and then the elements pointed by the 'left'
and 'right' marks are exchanged.
Once the 'right' mark passes the last element, all the elements have
been processed. The elements which are greater than the pivot have been moved to the
right hand of the 'left' mark while the others are to the left hand of this mark. Note that the
pivot should sit between the two segments. An extra exchange between the pivot
and the element pointed by the 'left' mark moves the pivot to its correct location. This
is shown by the swap bi-directional arrow in figure 13.2 (c).
The 'left' mark (which finally points to the pivot) partitions the whole array into two
parts and is returned as the result. We typically increase the 'left' mark by one before returning, so that it
points to the first element greater than the pivot, for convenience. Note that the array is
modified in-place.
The partition algorithm can be described as the following. It takes three arguments:
the array A, and the lower and upper bounds of the range to be partitioned 2 .
1: function Partition(A, l, u)
2: p ← A[l] ▷ the pivot
3: L←l ▷ the left mark
4: for R ∈ [l + 1, u] do ▷ iterate on the right mark
5: if ¬(p < A[R]) then ▷ negate of < is enough for strict weak order
6: L←L+1
7: Exchange A[L] ↔ A[R]
8: Exchange A[l] ↔ A[L] ▷ Move the pivot to its final position
9: return L + 1 ▷ The partition position
2 The partition algorithm used here is slightly different from the one in [4]. The latter uses the last element in the slice as the pivot.



The table below shows the steps of partitioning the array {3, 2, 5, 4, 0, 1, 6, 7}.
(l) 3 (r) 2 5 4 0 1 6 7 initialize, pivot = 3, l = 1, r = 2
3 (l)(r) 2 5 4 0 1 6 7 2 < 3, advance l, (r = l)
3 (l) 2 (r) 5 4 0 1 6 7 5 > 3, move on
3 (l) 2 5 (r) 4 0 1 6 7 4 > 3, move on
3 (l) 2 5 4 (r) 0 1 6 7 0<3
3 2 (l) 0 4 (r) 5 1 6 7 Advance l, then swap with r
3 2 (l) 0 4 5 (r) 1 6 7 1<3
3 2 0 (l) 1 5 (r) 4 6 7 Advance l, then swap with r
3 2 0 (l) 1 5 4 (r) 6 7 6 > 3, move on
3 2 0 (l) 1 5 4 6 (r) 7 7 > 3, move on
1 2 0 3 (l+1) 5 4 6 7 r passes the end, swap the pivot with A[l], return l + 1
This version of partition algorithm can be implemented in ANSI C as the following.
int partition(Key∗ xs, int l, int u) {
int pivot, r;
for (pivot = l, r = l + 1; r < u; ++r)
if (!(xs[pivot] < xs[r])) {
++l;
swap(xs[l], xs[r]);
}
swap(xs[pivot], xs[l]);
return l + 1;
}

Where swap(a, b) can either be defined as a function or a macro. In ISO C++,
swap(a, b) is provided as a function template. The type of the elements can be defined
elsewhere or abstracted as a template parameter in ISO C++. We omit these language
specific details here.
With this partition method realized, the imperative in-place quick sort can be accom-
plished as the following.
1: procedure Quick-Sort(A, l, u)
2: if l < u then
3: m ← Partition(A, l, u)
4: Quick-Sort(A, l, m − 1)
5: Quick-Sort(A, m, u)
To sort an array, this procedure is called by passing the whole range as the lower
and upper bounds: Quick-Sort(A, 1, |A|). Note that when l ≥ u the array
slice is either empty or contains only one element; both can be treated as already ordered,
so the algorithm does nothing in such cases.
Below ANSI C example program completes the basic in-place quick sort.
void quicksort(Key∗ xs, int l, int u) {
int m;
if (l < u) {
m = partition(xs, l, u);
quicksort(xs, l, m - 1);
quicksort(xs, m, u);
}
}

13.2.4 Minor improvement in functional partition


Before exploring how to improve the partition for the basic version of quick sort, it's obvious
that the one presented so far can be defined by using folding. Please refer to appendix
A of this book for the definition of folding.

partition(p, L) = f old(f (p), (ϕ, ϕ), L) (13.4)

Where function f compares the element to the pivot with predicate p (which is passed
to f as a parameter, so that f is in curried form, see appendix A for detail; alternatively,
f can be a lexical closure in the scope of partition, so that it can access the
predicate from that scope), and updates the result pair accordingly.
f(p, x, (A, B)) = { ({x} ∪ A, B) : p(x)
                  { (A, {x} ∪ B) : otherwise (¬p(x))    (13.5)

Note that we actually use a pattern-matching style definition. In an environment without
pattern-matching support, the pair (A, B) should be represented by a variable, for exam-
ple P, with access functions to extract its first and second parts.
The example Haskell program needs to be modified accordingly.
The example Haskell program needs to be modified accordingly.
sort [] = []
sort (x:xs) = sort small ++ [x] ++ sort big where
(small, big) = foldr f ([], []) xs
f a (as, bs) = if a ≤ x then (a:as, bs) else (as, a:bs)

Accumulated partition
The partition algorithm realized with folding actually accumulates to the result pair of lists
(A, B): if the element is not greater than the pivot, it's accumulated to A, otherwise
to B. We can express this accumulation explicitly, which saves space and is friendly to tail-recursive
call optimization (refer to appendix A of this book for detail).

partition(p, L, A, B) = { (A, B) : L = ϕ
                        { partition(p, L′, {l1} ∪ A, B) : p(l1)
                        { partition(p, L′, A, {l1} ∪ B) : otherwise    (13.6)

Where l1 is the first element in L if L isn't empty, and L′ contains the remaining
elements, that is L′ = {l2, l3, ...}. The quick sort algorithm then uses this
accumulated partition function by passing λx x ≤ l1 as the partition predicate.
sort(L) = { ϕ : L = ϕ
          { sort(A) ∪ {l1} ∪ sort(B) : otherwise    (13.7)

Where A, B are computed by the accumulated partition function defined above.

(A, B) = partition(λx x ≤ l1 , L′ , ϕ, ϕ)

Accumulated quick sort


Observe the recursive case in the last quick sort definition: the list concatenation op-
erations sort(A) ∪ {l1} ∪ sort(B) take time proportional to the length of the list being
concatenated. Of course we can use some general solutions introduced in appendix A
of this book to improve it. Another way is to change the sort algorithm to an accumulated
manner, something like below:

sort′(L, S) = { S : L = ϕ
              { ... : otherwise
Where S is the accumulator, and we call this version by passing an empty list as the
accumulator to start sorting: sort(L) = sort′(L, ϕ). The key intuition is that after the
partition finishes, the two sub lists need to be recursively sorted. We can first recursively
sort the list containing the elements greater than the pivot, then link the pivot
in front of it, and use the result as the accumulator for sorting the other sub list.
Based on this idea, the '...' part in the above definition can be realized as the following.
sort′(L, S) = { S : L = ϕ
              { sort′(A, {l1} ∪ sort′(B, ?)) : otherwise

The problem is what the accumulator should be when sorting B. There is an important invari-
ant: at any time, the accumulator S holds the elements that have been sorted
so far. So we should sort B by accumulating to S.
sort′(L, S) = { S : L = ϕ
              { sort′(A, {l1} ∪ sort′(B, S)) : otherwise    (13.8)

The following Haskell example program implements the accumulated quick sort algo-
rithm.
asort xs = asort' xs []

asort' [] acc = acc


asort' (x:xs) acc = asort' as (x:asort' bs acc) where
(as, bs) = part xs [] []
part [] as bs = (as, bs)
part (y:ys) as bs | y ≤ x = part ys (y:as) bs
| otherwise = part ys as (y:bs)

Exercise 13.1

• Implement the recursive basic quick sort algorithm in your favorite imperative pro-
gramming language.

• Same as the imperative algorithm, one minor improvement is that besides the empty
case, we needn't sort the singleton list; implement this idea in the functional algo-
rithm as well.

• The accumulated quick sort algorithm developed in this section uses the intermediate
variables A and B. They can be eliminated by defining the partition function to mutually
recursively call the sort function. Implement this idea in your favorite functional
programming language. Please don't refer to the downloadable example program
along with this book before you try it.

13.3 Performance analysis for quick sort


Quick sort performs well in practice; however, it's not easy to give a theoretical analysis.
It needs the tool of probability to prove the average case performance.
Nevertheless, it's intuitive to calculate the best case and worst case performance. It's
obvious that the best case happens when every partition divides the sequence into two
slices of equal size. Thus the recursion depth is O(lg n), as shown in figure 13.3.
There are in total O(lg n) levels of recursion. In the first level, it executes one partition,
which processes n elements; In the second level, it executes the partition two times, each
processing n/2 elements, so the total time in the second level is bound to 2O(n/2) = O(n)
as well. In the third level, it executes the partition four times, each processing n/4 elements.

Figure 13.3: In the best case, quick sort divides the sequence into two slices with the same
length: n splits into two halves of n/2, then four quarters of n/4, and so on, down to n slices
of length 1 after about lg n levels.

The total time in the third level is also bound to O(n); ... In the last level, there are n
small slices each containing a single element, and the time is bound to O(n). Summing the
time of all levels gives the total performance of quick sort in the best case: O(n lg n).
However, in the worst case, the partition process unluckily divides the sequence into
two slices with unbalanced lengths most of the time: one slice with O(1) elements, the
other with O(n). Thus the recursion depth degrades to O(n). If we draw a similar figure,
unlike the best case, which forms a balanced binary tree, the worst case degrades into
a very unbalanced tree in which every node has only one child, while the other is empty. The
binary tree turns into a linked list of length O(n). And in every level, all the elements
are processed, so the total performance in the worst case is O(n²), which is as poor as
insertion sort and selection sort.
Let's consider when the worst case will happen. One special case is that all the
elements (or most of them) are equal. Nico Lomuto's partition method deals with
such sequences poorly. We'll see how to solve this problem by introducing other partition
algorithms in the next section.
The other two obvious worst cases happen when the sequence
is already in ascending or descending order. Partitioning the ascending sequence makes an
empty sub list before the pivot, while the list after the pivot contains all the rest elements.
Partitioning the descending sequence gives the opposite result.
There are other cases which make quick sort perform poorly. There is no completely
satisfactory solution which can avoid the worst case. We'll see some engineering practices in
the next section which make the worst case very rare to meet.

13.3.1 Average case analysis ⋆


In the average case, quick sort performs well. There is a vivid example: even if every partition
divides the list into two parts with a length ratio of 1 to 9, the performance is still bound
to O(n lg n), as shown in [4].
This subsection needs some mathematical background; the reader can safely skip to the next
part.
There are two methods to prove the average case performance. One uses an important
fact that the performance is proportional to the total number of comparison operations during quick
sort [4]. Unlike selection sort, where every pair of elements gets compared,
quick sort avoids many unnecessary comparisons. For example, suppose a partition oper-
ation on list {a1, a2, a3, ..., an}. Select a1 as the pivot; the partition builds two sub lists
A = {x1, x2, ..., xk} and B = {y1, y2, ..., yn−k−1}. In the rest of the quick sort, the
elements in A will never be compared with any elements in B.
Denote the final sorted result as {a1, a2, ..., an}. This indicates that for elements ai < aj,
they will not be compared any more if and only if some element ak where ai < ak < aj
has been selected as a pivot before either ai or aj is selected as the pivot.
That is to say, the only chance that ai and aj are compared is that either ai or aj is chosen as a
pivot before any other element in the ordered range ai+1 < ai+2 <
... < aj−1 is selected.
Let P (i, j) represent the probability that ai and aj being compared. We have:

P(i, j) = 2/(j − i + 1)    (13.9)

The total number of compare operations can then be given as:

C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(i, j)    (13.10)

Note the fact that if we compare ai and aj, we won't compare aj and ai again in the
quick sort algorithm, and we never compare ai with itself. That's why we set the upper
bound of i to n − 1, and the lower bound of j to i + 1.
Substituting the probability, it yields:

C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{i=1}^{n−1} Σ_{k=1}^{n−i} 2/(k + 1)    (13.11)

Using the harmonic series [80]

H_n = 1 + 1/2 + 1/3 + ... = ln n + γ + ϵ_n


C(n) = Σ_{i=1}^{n−1} O(lg n) = O(n lg n)    (13.12)

The other method to prove the average performance uses the recursive fact that
when sorting a list of length n, the partition splits the list into two sub lists of length i
and n − i − 1. The partition process itself takes cn time because it examines every element
against the pivot. So we have the following equation.

T(n) = T(i) + T(n − i − 1) + cn    (13.13)
Where T(n) is the total time of quick sort on a list of length n. Since i is
equally likely to be any of 0, 1, ..., n − 1, taking the mathematical expectation of the equation gives:

T(n) = E(T(i)) + E(T(n − i − 1)) + cn
     = (1/n) Σ_{i=0}^{n−1} T(i) + (1/n) Σ_{i=0}^{n−1} T(n − i − 1) + cn
     = (1/n) Σ_{i=0}^{n−1} T(i) + (1/n) Σ_{j=0}^{n−1} T(j) + cn    (13.14)
     = (2/n) Σ_{i=0}^{n−1} T(i) + cn
Multiplying both sides by n, the equation changes to:

nT(n) = 2 Σ_{i=0}^{n−1} T(i) + cn²    (13.15)

Substituting n with n − 1 gives another equation:

(n − 1)T(n − 1) = 2 Σ_{i=0}^{n−2} T(i) + c(n − 1)²    (13.16)

Subtracting (13.16) from (13.15) eliminates all the T(i) for 0 ≤ i < n − 1:
nT(n) = (n + 1)T(n − 1) + 2cn − c    (13.17)
Since the constant c can be dropped when computing the order of growth, the equation can be
transformed one step further:
T(n)/(n + 1) = T(n − 1)/n + 2c/(n + 1)    (13.18)
Next we substitute n with n − 1, n − 2, ..., which gives us a chain of equations:
T(n − 1)/n = T(n − 2)/(n − 1) + 2c/n
T(n − 2)/(n − 1) = T(n − 3)/(n − 2) + 2c/(n − 1)
...
T(2)/3 = T(1)/2 + 2c/3
Summing them all up and eliminating the common terms on both sides, we deduce an
expression in n:
T(n)/(n + 1) = T(1)/2 + 2c Σ_{k=3}^{n+1} 1/k    (13.19)

Using the harmonic series mentioned above, the final result is:
O(T(n)/(n + 1)) = O(T(1)/2 + 2c(ln n + γ + ϵ_n)) = O(lg n)    (13.20)
Thus
O(T(n)) = O(n lg n)    (13.21)
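As a quick sanity check of this result, the following small Haskell sketch (illustrative only; the function sortCount is ours, not part of the derivation) counts one comparison per element examined against the pivot in the basic algorithm; on randomly shuffled input of length n the count stays close to 2n ln n ≈ 1.39 n lg n, in line with (13.12):
sortCount :: Ord a ⇒ [a] → ([a], Int)
sortCount [] = ([], 0)
sortCount (x:xs) = (as ++ [x] ++ bs, ca + cb + length xs)
  where (as, ca) = sortCount [a | a ← xs, a ≤ x]
        (bs, cb) = sortCount [b | b ← xs, x < b]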

Exercise 13.2
• Why does Lomuto's method perform poorly when there are many duplicated elements?

13.4 Engineering Improvement


Quick sort performs well in most cases, as mentioned in the previous section. However, there
do exist worst cases which downgrade the performance to quadratic. If the data is
randomly prepared, such cases are rare; however, some particular sequences
lead to the worst case, and these kinds of sequences are very common in practice.
In this section, some engineering practices are introduced which either help to avoid
poor performance in handling special input data with improved partition algorithms,
or try to make the bad cases much less likely (for example, by randomization).

13.4.1 Engineering solution to duplicated elements


As presented in the exercise of the above section, N. Lomuto's partition method isn't good
at handling sequences with many duplicated elements. Consider a sequence of n equal
elements like {x, x, ..., x}. There are actually two methods to sort it.

1. The normal basic quick sort: we select an arbitrary element, which is x, as the
pivot, and partition the sequence into two sub sequences; one is {x, x, ..., x}, which contains n − 1
elements, the other is empty. Then we recursively sort the first one; this is obviously
a quadratic O(n²) solution.

2. The other way is to pick only those elements strictly smaller than x and those strictly
greater than x. Such a partition results in two empty sub sequences and n elements
equal to the pivot. Next we recursively sort the sub sequences containing the smaller
and the bigger elements; since both of them are empty, the recursive calls return
immediately. The only thing left is to concatenate the sorted results in front of and
after the list of elements which are equal to the pivot.

The latter performs in O(n) time if all elements are equal. This indicates an
important improvement for partition: instead of a binary partition (split into two sub
lists and a pivot), a ternary partition (split into three sub lists) handles duplicated elements
better.
We can define the ternary quick sort as the following.
sort(L) = { ϕ : L = ϕ
          { sort(S) ∪ E ∪ sort(G) : otherwise    (13.22)

Where S, E, and G are sub lists containing the elements which are less than, equal to, and
greater than the pivot respectively.

S = {x|x ∈ L, x < l1 }
E = {x|x ∈ L, x = l1 }
G = {x|x ∈ L, l1 < x}

The basic ternary quick sort can be implemented in Haskell as the following example
code.
sort [] = []
sort (x:xs) = sort [a | a←xs, a<x] ++
x:[b | b←xs, b==x] ++ sort [c | c←xs, c>x]

Note that the comparison between elements must support both the abstract 'less-than' and
'equal-to' operations. This basic version of ternary sort takes linear O(n) time to con-
catenate the three sub lists. It can be improved by using the standard accumulator
technique.

Suppose function sort′(L, A) is the accumulated ternary quick sort, where
L is the sequence to be sorted, and the accumulator A contains the intermediate sorted
result so far. We initialize the sorting with an empty accumulator: sort(L) = sort′(L, ϕ).
It's easy to give the trivial edge case like below.
sort′(L, A) = { A : L = ϕ
              { ... : otherwise

For the recursive case, the ternary partition splits the list into three sub lists S, E, and G; only S
and G need recursive sorting, as E contains all elements equal to the pivot, which are already in the correct
order and needn't be sorted any more. The idea is to sort G with accumulator A, then
concatenate it behind E, then use this result as the new accumulator, and start to sort
S:
sort′(L, A) = { A : L = ϕ
              { sort′(S, E ∪ sort′(G, A)) : otherwise    (13.23)

The partition can also be realized with accumulators. It is similar to what has been
developed for the basic version of quick sort. Note that we can't pass only one
predicate for the pivot comparison; it actually needs two, one for less-than, the other for
equality testing. For the sake of brevity, we pass the pivot element itself instead.


partition(p, L, S, E, G) = { (S, E, G) : L = ϕ
                           { partition(p, L′, {l1} ∪ S, E, G) : l1 < p
                           { partition(p, L′, S, {l1} ∪ E, G) : l1 = p
                           { partition(p, L′, S, E, {l1} ∪ G) : p < l1    (13.24)

Where l1 is the first element in L if L isn't empty, and L′ contains all the remaining elements
except l1. The below Haskell program implements this algorithm. It starts the recursive
sorting immediately in the edge case of partition.
sort xs = sort' xs []

sort' [] r = r
sort' (x:xs) r = part xs [] [x] [] r where
part [] as bs cs r = sort' as (bs ++ sort' cs r)
part (x':xs') as bs cs r | x' < x = part xs' (x':as) bs cs r
| x' == x = part xs' as (x':bs) cs r
| x' > x = part xs' as bs (x':cs) r

Richard Bird developed another version in [1]: instead of concatenating the re-
cursively sorted results, it builds a list of sorted sub lists and performs the concatenation
at the end.
sort xs = concat $ pass xs []

pass [] xss = xss


pass (x:xs) xss = step xs [] [x] [] xss where
step [] as bs cs xss = pass as (bs:pass cs xss)
step (x':xs') as bs cs xss | x' < x = step xs' (x':as) bs cs xss
| x' == x = step xs' as (x':bs) cs xss
| x' > x = step xs' as bs (x':cs) xss

2-way partition
The cases with many duplicated elements can also be handled imperatively. Robert
Sedgewick presented a partition method [69] [4] which holds two pointers: one moves
from left to right, the other moves from right to left. The two pointers are initialized as
the left and right boundaries of the array.
When the partition starts, the left most element is selected as the pivot. Then the left
pointer i keeps advancing to the right until it meets an element which is not less than the
pivot; on the other hand (we don't say 'then', because the two scans can be performed in
parallel), the right pointer j repeatedly scans to the left until it meets an
element which is not greater than the pivot.
At this time, all elements before the left pointer i are strictly less than the pivot, while
all elements after the right pointer j are greater than the pivot. i points to an element
which is either greater than or equal to the pivot; while j points to an element which is
either less than or equal to the pivot, the situation at this stage is illustrated in figure
13.4 (a).
In order to partition all elements less than or equal to the pivot to the left, and the
others to the right, we can exchange the two elements pointed by i, and j. After that the
scan can be resumed. We repeat this process until either i meets j, or they overlap.
At any time point during the partition, there is an invariant that all elements before i
(including the one pointed by i) are not greater than the pivot, while all elements after j
(including the one pointed by j) are not less than the pivot. The elements between i and
j haven't been examined yet. This invariant is shown in figure 13.4 (b).

Figure 13.4: Partition a range of the array using the left most element as the pivot. (a) When
pointers i and j stop: elements before i are less than the pivot, elements after j are greater
than the pivot, i points to an element not less than the pivot, j to one not greater. (b) The
partition invariant: elements up to i are not greater than the pivot, elements from j on are not
less than the pivot, and the part between i and j hasn't been examined yet.

After the left pointer i meets the right pointer j, or they overlap, we need
one extra exchange to move the pivot, located at the first position, to the correct place
pointed by j. Next, the elements between the lower bound and j, as well as the
sub slice between i and the upper bound of the array, are recursively sorted.
This algorithm can be described as the following.
1: procedure Sort(A, l, u) ▷ sort range [l, u)
2: if u − l > 1 then ▷ More than 1 element for non-trivial case
3: i ← l, j ← u
4: pivot ← A[l]
5: loop
6: repeat
7: i←i+1
8: until A[i] ≥ pivot ▷ Need handle error case that i ≥ u in fact.
9: repeat
10: j ←j−1
11: until A[j] ≤ pivot ▷ Need handle error case that j < l in fact.

12: if j < i then


13: break
14: Exchange A[i] ↔ A[j]
15: Exchange A[l] ↔ A[j] ▷ Move the pivot
16: Sort(A, l, j)
17: Sort(A, i, u)
Consider the extreme case where all elements are equal: this in-place quick sort will
partition the list into two sub lists of equal length, although it takes n/2 unnecessary swaps. As
the partition is balanced, the overall performance is O(n lg n), which avoids degrading
to quadratic. The following ANSI C example program implements this algorithm.
void qsort(Key∗ xs, int l, int u) {
int i, j, pivot;
if (l < u - 1) {
pivot = i = l; j = u;
while (1) {
while (i < u && xs[++i] < xs[pivot]);
while (j ≥ l && xs[pivot] < xs[--j]);
if (j < i) break;
swap(xs[i], xs[j]);
}
swap(xs[pivot], xs[j]);
qsort(xs, l, j);
qsort(xs, i, u);
}
}

Comparing this algorithm with the basic version based on N. Lomuto's partition
method, we can find that it swaps fewer elements, because it skips those which are already on
the proper side of the pivot.

3-way partition
It's obvious that we should avoid those unnecessary swaps for the duplicated ele-
ments. What's more, the algorithm can be developed with the idea of ternary sort (also
known as 3-way partition in some materials): all the elements which are strictly less
than the pivot are put to the left sub slice, while those greater than the pivot are put
to the right. The middle part holds all the elements which are equal to the pivot. With
such a ternary partition, we only need to recursively sort the ones which differ from the pivot.
Thus in the above extreme case, there aren't any elements needing further sorting. So the
overall performance is linear O(n).
The difficulty is how to do the 3-way partition. Jon Bentley and Douglas McIlroy
developed a solution which keeps the elements equal to the pivot at the left most and
right most sides, as shown in figure 13.5 (a) [70] [71].
The major part of the scan process is the same as the one developed by Robert Sedgewick:
i and j keep advancing toward each other until they meet an element which is greater
than or equal to the pivot for i, or less than or equal to the pivot for j respectively. At
this time, if i and j don't meet or overlap, the two elements are not only exchanged, but
also examined to see whether they are identical to the pivot. If so, the necessary
exchange happens between i and p, as well as j and q.
By the end of the partition process, the elements equal to the pivot need to be swapped
to the middle part from the left and right ends. The number of such extra exchange
operations is proportional to the number of duplicated elements. If all elements are unique,
no such exchange happens, so there is no overhead in that case. The final partition result
is shown in figure 13.5 (b). After that we only need to recursively sort the 'less-than' and
'greater-than' sub slices.

Figure 13.5: 3-way partition. (a) Invariant of the 3-way partition: elements equal to the pivot
at both ends (bounded by p and q), the 'less than' part after the left equal part, the 'greater
than' part before the right equal part, and the unexamined elements between i and j. (b) After
swapping the equal parts to the middle: less than, equal, greater than.

This algorithm can be given by modifying the 2-way partition as below.


1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u
4: p ← l, q ← u ▷ points to the boundaries for equal elements
5: pivot ← A[l]
6: loop
7: repeat
8: i←i+1
9: until A[i] ≥ pivot ▷ Skip the error handling for i ≥ u
10: repeat
11: j ←j−1
12: until A[j] ≤ pivot ▷ Skip the error handling for j < l
13: if j ≤ i then
14: break ▷ Note the difference from the above algorithm
15: Exchange A[i] ↔ A[j]
16: if A[i] = pivot then ▷ Handle the equal elements
17: p←p+1
18: Exchange A[p] ↔ A[i]
19: if A[j] = pivot then
20: q ←q−1
21: Exchange A[q] ↔ A[j]
22: if i = j ∧ A[i] = pivot then ▷ A special case
23: j ← j − 1, i ← i + 1
24: for k from l to p do ▷ Swap the equal elements to the middle part
25: Exchange A[k] ↔ A[j]
26: j ←j−1
27: for k from u − 1 down-to q do
28: Exchange A[k] ↔ A[i]
29: i←i+1
30: Sort(A, l, j + 1)
31: Sort(A, i, u)
This algorithm can be translated to the following ANSI C example program.

void qsort2(Key∗ xs, int l, int u) {


int i, j, k, p, q, pivot;
if (l < u - 1) {
i = p = l; j = q = u; pivot = xs[l];
while (1) {
while (i < u && xs[++i] < pivot);
while (j ≥ l && pivot < xs[--j]);
if (j ≤ i) break;
swap(xs[i], xs[j]);
if (xs[i] == pivot) { ++p; swap(xs[p], xs[i]); }
if (xs[j] == pivot) { --q; swap(xs[q], xs[j]); }
}
if (i == j && xs[i] == pivot) { --j, ++i; }
for (k = l; k ≤ p; ++k, --j) swap(xs[k], xs[j]);
for (k = u-1; k ≥ q; --k, ++i) swap(xs[k], xs[i]);
qsort2(xs, l, j + 1);
qsort2(xs, i, u);
}
}

It can be seen that the algorithm becomes a bit complex when it evolves to 3-way
partition. There are some tricky edge cases that should be handled with caution. Actually, what we
need is just a ternary partition algorithm. This reminds us of N. Lomuto's method, which
is straightforward enough to be a starting point.
The idea is to change the invariant a bit. We still select the first element as the pivot.
As shown in figure 13.6, at any time, the left most section contains elements which are
strictly less than the pivot; the next section contains the elements equal to the pivot; the
right most section holds all the elements which are strictly greater than the pivot. The
boundaries of the three sections are marked as i, k, and j respectively. The rest part, which
is between k and j, holds the elements that haven't been scanned yet.
At the beginning of this algorithm, the 'less-than' section is empty; the 'equal-to'
section contains only one element, which is the pivot; so i is initialized to the lower
bound of the array, and k points to the element next to i. The 'greater-than' section is
also initialized as empty, thus j is set to the upper bound.

Figure 13.6: 3-way partition based on N. Lomuto's method: less than (up to i), equal (up to k),
unknown (between k and j), greater than (from j on).

When the partition process starts, the element pointed by k is examined. If it's equal
to the pivot, k just advances to the next one; if it's greater than the pivot, we swap it with
the last element in the unknown area, so that the length of the 'greater-than' section increases
by one. Its boundary j moves to the left. Since we don't know whether the element just swapped
to position k is greater than the pivot, it has to be examined again. Otherwise, if
the element is less than the pivot, we exchange it with the first one in the 'equal-to'
section to restore the invariant. The partition algorithm stops when k meets j.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u, k ← l + 1
4: pivot ← A[i]
5: while k < j do
6: while pivot < A[k] do

7: j ←j−1
8: Exchange A[k] ↔ A[j]
9: if A[k] < pivot then
10: Exchange A[k] ↔ A[i]
11: i←i+1
12: k ←k+1
13: Sort(A, l, i)
14: Sort(A, j, u)
Comparing this with the previous 3-way partition quick sort algorithm, it's
simpler at the cost of more swap operations. The below ANSI C program implements this
algorithm.
void qsort(Key∗ xs, int l, int u) {
int i, j, k; Key pivot;
if (l < u - 1) {
i = l; j = u; pivot = xs[l];
for (k = l + 1; k < j; ++k) {
while (pivot < xs[k]) { --j; swap(xs[j], xs[k]); }
if (xs[k] < pivot) { swap(xs[i], xs[k]); ++i; }
}
qsort(xs, l, i);
qsort(xs, j, u);
}
}

Exercise 13.3

• All the quick sort imperative algorithms given in this section use the first element
as the pivot, another method is to choose the last one as the pivot. Realize the
quick sort algorithms, including the basic version, Sedgewick version, and ternary
(3-way partition) version by using this approach.

13.5 Engineering solution to the worst case


Although the ternary quick sort (3-way partition) solves the issue of duplicated elements,
it can't handle some typical worst cases. For example, if many of the elements in the
sequence are ordered, no matter whether in ascending or descending order, the partition result
will be two unbalanced sub sequences: one with few elements, the other containing all the
rest.
Consider the two extreme cases, {x1 < x2 < ... < xn} and {y1 > y2 > ... > yn}. The
partition results are shown in figure 13.7.
It's easy to give some more worst cases, for example, {xm, xm−1, ..., x2, x1, xm+1, xm+2, ..., xn}
where {x1 < x2 < ... < xn}; another one is {xn, x1, xn−1, x2, ...}. Their partition result
trees are shown in figure 13.8.
Observing that a bad partition happens easily when blindly choosing the first element
as the pivot, there is a popular workaround suggested by Robert Sedgewick in [69]. Instead
of selecting a fixed position in the sequence, a small sampling helps to find a pivot which
has a lower possibility of causing a bad partition. One option is to examine the first element,
the middle one, and the last one, then choose the median of these three elements. In the worst
case, this ensures that there is at least one element in the shorter partitioned sub list.

Figure 13.7: The two worst cases. (a) The partition tree for {x1 < x2 < ... < xn}: there aren't
any elements less than or equal to the pivot (the first element) in any partition. (b) The
partition tree for {y1 > y2 > ... > yn}: there aren't any elements greater than or equal to the
pivot (the first element) in any partition.



Figure 13.8: Another two worst cases. (a) Except for the first partition, all the others are
unbalanced. (b) A zig-zag partition tree.



Note that there is one tricky point in a real-world implementation. Since the index is typically
represented in machine words of limited length, it may cause overflow when calculating the middle
index with the naive expression (l + u) / 2. To avoid this issue, it can be
computed as l + (u - l) / 2. There are two methods to find the median: one needs at
most three comparisons [70]; the other is to move the minimum value to the first location,
the maximum value to the last location, and the median value to the middle location by
swapping. After that we can select the middle element as the pivot. The below algorithm illustrates
the second idea before calling the partition procedure.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: m ← ⌊(l + u)/2⌋ ▷ Need to handle overflow error in practice
4: if A[m] < A[l] then ▷ Ensure A[l] ≤ A[m]
5: Exchange A[l] ↔ A[m]
6: if A[u − 1] < A[l] then ▷ Ensure A[l] ≤ A[u − 1]
7: Exchange A[l] ↔ A[u − 1]
8: if A[u − 1] < A[m] then ▷ Ensure A[m] ≤ A[u − 1]
9: Exchange A[m] ↔ A[u − 1]
10: Exchange A[l] ↔ A[m]
11: (i, j) ← Partition(A, l, u)
12: Sort(A, l, i)
13: Sort(A, j, u)
It's obvious that this algorithm performs well on the 4 special worst cases given above.
The imperative implementation of median-of-three is left as an exercise to the reader.
However, in purely functional settings, it's expensive to randomly access the middle
and the last element. We can't directly translate the imperative median selection algo-
rithm. The idea of taking a small sample and using its median element as the pivot
can alternatively be realized by taking the first 3 elements, as in the following Haskell
program.
qsort [] = []
qsort [x] = [x]
qsort [x, y] = [min x y, max x y]
qsort (x:y:z:rest) = qsort (filter (< m) (s:l:rest)) ++ [m] ++
                     qsort (filter ( ≥ m) (s:l:rest)) where
    xs = [x, y, z]
    [s, m, l] = [minimum xs, median xs, maximum xs]
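The program assumes a median function over the three sampled elements, which is not defined in the text here; a minimal definition for exactly three elements could be:
-- median of exactly three elements, e.g. median [3, 1, 2] == 2
median :: Ord a ⇒ [a] → a
median [a, b, c] = max (min a b) (min (max a b) c)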

Unfortunately, none of the above 4 worst cases is handled well by this program;
this is because the sampling is not good. We need a telescope, not a microscope, to
profile the whole list to be partitioned. We'll see the functional way to solve the partition
problem later.
Besides the median-of-three, there is another popular engineering practice to get a
good partition result: instead of always taking the first element or the last one as the pivot,
randomly select one. For example, as in the following modification.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: Exchange A[l] ↔ A[ Random(l, u) ]
4: (i, j) ← Partition(A, l, u)
5: Sort(A, l, i)
6: Sort(A, j, u)
The function Random(l, u) returns a random integer i between l and u, that is l ≤ i < u.
The element at this position is exchanged with the first one, so that it is selected as the
pivot for the further partition. This algorithm is called random quick sort [4].
Theoretically, neither median-of-three nor random quick sort can avoid the worst case
completely. If the sequence to be sorted is randomly distributed, choosing the
first element as the pivot or any other arbitrary one is equally effective. Considering that the
underlying data structure of the sequence is a singly linked-list in the functional setting, it's
expensive to strictly apply the idea of random quick sort in the purely functional approach.
Despite this bad news, the engineering improvements still make sense in real world
programming.

13.6 Other engineering practice


There are some other engineering practices which don't focus on solving the bad partition
issue. Robert Sedgewick observed that when the list to be sorted is short, the overhead
introduced by quick sort is relatively expensive; on the other hand, insertion sort performs
better in such cases [4], [70]. Sedgewick, Bentley and McIlroy tried different thresholds, known
as the 'cut-off': when there are fewer than 'cut-off' elements, the sort algorithm
falls back to insertion sort.
1: procedure Sort(A, l, u)
2: if u − l > Cut-Off then
3: Quick-Sort(A, l, u)
4: else
5: Insertion-Sort(A, l, u)
The implementation of this improvement is left as exercise to the reader.

Exercise 13.4

• Can you figure out more quick sort worst cases besides the four given in this section?
• Implement median-of-three method in your favorite imperative programming lan-
guage.
• Implement random quick sort in your favorite imperative programming language.
• Implement the algorithm which falls back to insertion sort when the length of list
is small in both imperative and functional approach.

13.7 Side words


It's sometimes called 'true quick sort' if the implementation is equipped with most of the
engineering practices we introduced, including the insertion sort fall-back with cut-off, in-place
exchanging, choosing the pivot by the median-of-three method, and 3-way partition.
The purely functional version, which expresses the idea of quick sort perfectly, can't adopt all
of them. Thus some people think the functional quick sort is essentially tree sort.
Actually, quick sort does have a close relationship with tree sort. Richard Bird shows
how to derive quick sort from binary tree sort by deforestation [72].
Consider a binary search tree creation algorithm called unfold, which turns a list of
elements into a binary search tree.

unfold(L) = { ϕ : L = ϕ
            { tree(Tl, l1, Tr) : otherwise    (13.25)

Where
Tl = unfold({a | a ∈ L′, a ≤ l1})
Tr = unfold({a | a ∈ L′, l1 < a})    (13.26)

The interesting point is that this algorithm creates the tree in a different way from the one we
introduced in the chapter about binary search trees. If the list to be unfolded is empty, the
result is obviously an empty tree. This is the trivial edge case; otherwise, the algorithm
takes the first element l1 of the list as the key of the node, and recursively creates its left
and right children. The elements used to form the left child are those which are
less than or equal to the key in L′, while the rest, which are greater than the key,
are used to form the right child.
Recall the algorithm which turns a binary search tree into a list by in-order traversal:

toList(T) = { ϕ : T = ϕ
            { toList(left(T)) ∪ {key(T)} ∪ toList(right(T)) : otherwise    (13.27)
We can define the quick sort algorithm by composing these two functions.
quickSort = toList · unfold    (13.28)
The binary search tree built in the first step of applying unfold is an intermediate
result. This result is consumed by toList and dropped after the second step. It's quite
possible to eliminate this intermediate result, which leads to the basic version of quick
sort.
The elimination of the intermediate binary search tree is called deforestation. This
concept is based on Burstall and Darlington's work [9].
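The two functions can be transcribed directly; the following sketch (the Tree type and the names are ours, merely to make the composition concrete) builds the intermediate tree and then flattens it:
data Tree a = Empty | Node (Tree a) a (Tree a)

unfold :: Ord a ⇒ [a] → Tree a
unfold [] = Empty
unfold (x:xs) = Node (unfold [a | a ← xs, a ≤ x]) x (unfold [a | a ← xs, x < a])

toList :: Tree a → [a]
toList Empty = []
toList (Node l k r) = toList l ++ [k] ++ toList r

quickSort :: Ord a ⇒ [a] → [a]
quickSort = toList . unfold
Fusing toList with unfold, so that no tree node is ever built, yields exactly the basic quick sort given at the beginning of this chapter.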

13.8 Merge sort


Although quick sort performs perfectly in the average case, it can't avoid the worst case
no matter what engineering practice is applied. Merge sort, on the other hand, ensures that
the performance is bound to O(n lg n) in all cases. It's particularly useful in theo-
retical algorithm design and analysis. Another feature is that merge sort is friendly to
linked-list settings, and is suitable for sorting sequences that are not stored consecutively. Some
functional programming languages and dynamic language environments adopt merge sort as
the standard library sorting solution, such as Haskell, Python and Java (later than Java
7).
In this section, we'll first introduce the intuitive idea of merge sort and provide a basic version.
After that, some variants of merge sort will be given, including natural merge sort and
bottom-up merge sort.

13.8.1 Basic version


Same as quick sort, the essential idea behind merge sort is also divide and conquer.
Different from quick sort, merge sort enforces a strictly balanced division: it
always splits the sequence to be sorted at the middle point. After that, it recursively sorts
the sub sequences and merges the two sorted sequences into the final result. The algorithm
can be described as the following.
In order to sort a sequence L,
• Trivial edge case: if the sequence to be sorted is empty, the result is obviously empty;
• Otherwise, split the sequence at the middle position, recursively sort the two sub
sequences and merge the results.
The basic merge sort algorithm can be formalized with the following equation.
sort(L) = { ϕ : L = ϕ
          { merge(sort(L1), sort(L2)) : otherwise, (L1, L2) = splitAt(⌊|L|/2⌋, L)    (13.29)

Merge
There are two ‘black-boxes’ in the above merge sort definition, one is the splitAt function,
which splits a list at a given position; the other is the merge function, which can merge
two sorted lists into one.
As presented in the appendix of this book, it’s trivial to realize splitAt in imperative
settings by using random access. However, in functional settings, it’s typically realized
as a linear algorithm:
splitAt(n, L) = { (ϕ, L) : n = 0
                { ({l1} ∪ A, B) : otherwise, (A, B) = splitAt(n − 1, L′)    (13.30)

Where l1 is the first element of L, and L′ represents the remaining elements except l1, when
L isn't empty.
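A direct Haskell transcription of this definition could look like the following sketch (named splitAt' only to avoid clashing with the standard library splitAt; the extra clause for an exhausted list is our addition for safety):
splitAt' :: Int → [a] → ([a], [a])
splitAt' 0 xs = ([], xs)
splitAt' _ [] = ([], [])  -- the list runs out before n reaches 0
splitAt' n (x:xs) = (x:as, bs) where (as, bs) = splitAt' (n - 1) xs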
The idea of merge can be illustrated as in figure 13.9. Consider two lines of kids. The
kids in each line have already been arranged by height: the shortest one stands first,
then a taller one, and the tallest one stands at the end of the line.

Figure 13.9: Two lines of kids pass a door.

Now let's ask the kids to pass a door one by one; at any time at most one
kid can pass the door. The kids must pass this door in the order of their height: a kid
can't pass the door before all the kids who are shorter than him/her.
Since the two lines of kids have already been 'sorted', the solution is to ask the first
two kids, one from each line, to compare their height, and let the shorter kid pass the door;
they repeat this step until one line is empty, after which all the remaining kids can pass
the door one by one.
This idea can be formalized in the following equation.


merge(A, B) = { A : B = ϕ
              { B : A = ϕ
              { {a1} ∪ merge(A′, B) : a1 ≤ b1
              { {b1} ∪ merge(A, B′) : otherwise    (13.31)

Where a1 and b1 are the first elements of lists A and B; A′ and B′ are the remaining elements
except the first ones respectively. The first two cases are trivial edge cases: merging
a sorted list with an empty list gives the same sorted list; otherwise, if both lists are
non-empty, we take the first elements of the two lists, compare them, use the
smaller one as the first element of the result, and then recursively merge the rest.
With merge defined, the basic version of merge sort can be implemented like the
following Haskell example code.
msort [] = []
msort [x] = [x]
msort xs = merge (msort as) (msort bs) where
(as, bs) = splitAt (length xs `div` 2) xs

merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) | x ≤ y = x : merge xs (y:ys)
| x > y = y : merge (x:xs) ys

Note that the implementation differs from the algorithm definition in that it also treats the
singleton list as a trivial edge case.
Merge sort can also be realized imperatively. The basic version can be developed as
the below algorithm.
1: procedure Sort(A)
2: if |A| > 1 then
3: m ← ⌊|A|/2⌋
4: X ← Copy-Array(A[1...m])
5: Y ← Copy-Array(A[m + 1...|A|])
6: Sort(X)
7: Sort(Y )
8: Merge(A, X, Y )
When the array to be sorted contains at least two elements, the non-trivial sorting
process starts. It first copies the first half to a newly created array X, and the second half
to a second new array Y. It recursively sorts them, and finally merges the sorted results back
to A.
This version uses the same amount of extra space as A, because the Merge
algorithm isn't in-place at the moment. We'll introduce the imperative in-place merge
sort in a later section.
The merge process does almost the same thing as the functional definition. There is
a verbose version and a simplified version which uses a sentinel.
The verbose merge algorithm continuously checks elements from the two input
arrays, picks the smaller one and puts it into the result array A, then advances
along the corresponding array, until either input array is exhausted. After that, the
algorithm appends the rest of the elements in the other input array to A.
1: procedure Merge(A, X, Y )
2: i ← 1, j ← 1, k ← 1
3: m ← |X|, n ← |Y |
4: while i ≤ m ∧ j ≤ n do
5: if X[i] < Y [j] then
6: A[k] ← X[i]
7: i←i+1
8: else
9: A[k] ← Y [j]
10: j ←j+1
11: k ←k+1
12: while i ≤ m do
13: A[k] ← X[i]

14: k ←k+1
15: i←i+1
16: while j ≤ n do
17: A[k] ← Y [j]
18: k ←k+1
19: j ←j+1
Although this algorithm is a bit verbose, it can be made short in programming en-
vironments with enough tools to manipulate arrays. The following Python program is an
example.
def msort(xs):
    n = len(xs)
    if n > 1:
        ys = [x for x in xs[:n//2]]
        zs = [x for x in xs[n//2:]]
        ys = msort(ys)
        zs = msort(zs)
        xs = merge(xs, ys, zs)
    return xs

def merge(xs, ys, zs):
    i = 0
    while ys != [] and zs != []:
        xs[i] = ys.pop(0) if ys[0] < zs[0] else zs.pop(0)
        i = i + 1
    xs[i:] = ys if ys != [] else zs
    return xs

Performance
Before diving into the improvement of this basic version, let's analyze the performance of
merge sort. The algorithm contains two steps: the divide step and the merge step. In the divide step,
the sequence to be sorted is always divided into two sub sequences of the same length.
If we draw a partition tree similar to the one we drew for quick sort, we find this tree
is a perfectly balanced binary tree, as shown in figure 13.3. Thus the height of this tree is
O(lg n), which means the recursion depth of merge sort is bound to O(lg n). Merging happens
in every level. It's intuitive to analyze the merge algorithm: it compares elements
from the two input sequences in pairs, and after one sequence is fully examined the rest of the other is
copied one by one to the result; thus it's a linear algorithm, proportional to the length of
the sequence. Based on these facts, denoting T(n) the time for sorting a sequence of
length n, we can write the recursive time cost as below.

T(n) = T(n/2) + T(n/2) + cn
     = 2T(n/2) + cn    (13.32)

It states that the cost consists of three parts: merge sorting the first half takes T(n/2),
merge sorting the second half also takes T(n/2), and merging the two results takes cn, where c is
some constant. Solving this equation gives the result O(n lg n).
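One way to see this is to unfold the recurrence level by level:
T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = 8T(n/8) + 3cn = ... = 2^k T(n/2^k) + k·cn
Taking k = lg n gives T(n) = nT(1) + cn lg n, which is O(n lg n).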
Note that this performance doesn't vary across cases, as merge sort always uniformly
divides the input.
Another significant performance indicator is space occupation. However, it varies a
lot among different merge sort implementations. The detailed space bound analysis will be
explained for each variant later.

For the basic imperative merge sort, observe that it demands the same amount of space
as the input array in every recursion, copies the original elements into it for the recursive
sort, and releases this space after that level of recursion. So the peak space
requirement happens when the recursion reaches the deepest level, which gives O(n lg n).
The functional merge sort consumes much less than this amount, because the underly-
ing data structure of the sequence is a linked-list, so it needs no extra space for merging
(the complex effects caused by lazy evaluation are ignored here; please refer to [72] for detail).
The only space requirement is for book-keeping the stack of recursive calls. This can
be seen in the later explanation of the even-odd splitting algorithm.

Minor improvement
We'll next improve the basic merge sort bit by bit for both the functional and imperative
realizations. The first observation is that the imperative merge algorithm is a bit verbose.
[4] presents an elegant simplification by using positive ∞ as a sentinel: we append
∞ as the last element to both ordered arrays before merging (for sorting in monotonic
non-increasing order, −∞ can be used instead). Thus we needn't test
which array is exhausted. Figure 13.10 illustrates this idea.

Figure 13.10: Merge with ∞ as sentinels: a[i..n] followed by ∞ and b[j..m] followed by ∞ are
merged into x[1..k].

1: procedure Merge(A, X, Y )
2: Append(X, ∞)
3: Append(Y, ∞)
4: i ← 1, j ← 1
5: for k ← from 1 to |A| do
6: if X[i] < Y [j] then
7: A[k] ← X[i]
8: i←i+1
9: else
10: A[k] ← Y [j]
11: j ←j+1
The following ANSI C program implements this idea, embedding the merge inside. INF
is defined as a big constant of the same type as Key. The type can either
be defined elsewhere, or we can abstract the type information by passing the comparator
as a parameter. We skip these implementation and language details here.
void msort(Key∗ xs, int l, int u) {
int i, j, m;


Key ∗as, ∗bs;


if (u - l > 1) {
m = l + (u - l) / 2; /∗ avoid int overflow ∗/
msort(xs, l, m);
msort(xs, m, u);
as = (Key∗) malloc(sizeof(Key) ∗ (m - l + 1));
bs = (Key∗) malloc(sizeof(Key) ∗ (u - m + 1));
memcpy((void∗)as, (void∗)(xs + l), sizeof(Key) ∗ (m - l));
memcpy((void∗)bs, (void∗)(xs + m), sizeof(Key) ∗ (u - m));
as[m - l] = bs[u - m] = INF;
for (i = j = 0; l < u; ++l)
xs[l] = as[i] < bs[j] ? as[i++] : bs[j++];
free(as);
free(bs);
}
}

Running this program takes much more time than the quick sort. Besides the major
reason we'll explain later, one problem is that this version frequently allocates and releases
memory for merging. Memory allocation is one of the well-known bottlenecks in the
real world, as mentioned by Bentley in [4]. One solution to address this issue is to allocate
another array of the same size as the original one as the working area. The recursive
sorts of the first and second halves needn't allocate any more extra space, but use the
working area when merging. Finally, the algorithm copies the merged result back.
This idea can be expressed as the following modified algorithm.
1: procedure Sort(A)
2: B ← Create-Array(|A|)
3: Sort’(A, B, 1, |A|)

4: procedure Sort’(A, B, l, u)
5: if u − l > 0 then
6: m ← ⌊(l + u)/2⌋
7: Sort’(A, B, l, m)
8: Sort’(A, B, m + 1, u)
9: Merge’(A, B, l, m, u)
This algorithm allocates another array and passes it along with the original array
to be sorted to the Sort' algorithm. In a real implementation, this working area should be
released, either manually or by some automatic tool such as GC (garbage collection).
The modified algorithm Merge' also accepts the working area as a parameter.
1: procedure Merge’(A, B, l, m, u)
2: i ← l, j ← m + 1, k ← l
3: while i ≤ m ∧ j ≤ u do
4: if A[i] < A[j] then
5: B[k] ← A[i]
6: i←i+1
7: else
8: B[k] ← A[j]
9: j ←j+1
10: k ←k+1
11: while i ≤ m do
12: B[k] ← A[i]
13: k ←k+1
14: i←i+1
15: while j ≤ u do

16: B[k] ← A[j]


17: k ←k+1
18: j ←j+1
19: for i ← from l to u do ▷ Copy back
20: A[i] ← B[i]
By using this minor improvement, the space requirement is reduced from O(n lg n) to O(n).
The following ANSI C program implements this minor improvement. For illustration pur-
poses, we manually copy the merged result back to the original array in a loop. This can
also be realized by using a standard library tool, such as memcpy.
void merge(Key∗ xs, Key∗ ys, int l, int m, int u) {
int i, j, k;
i = k = l; j = m;
while (i < m && j < u)
ys[k++] = xs[i] < xs[j] ? xs[i++] : xs[j++];
while (i < m)
ys[k++] = xs[i++];
while (j < u)
ys[k++] = xs[j++];
for(; l < u; ++l)
xs[l] = ys[l];
}

void msort(Key∗ xs, Key∗ ys, int l, int u) {


int m;
if (u - l > 1) {
m = l + (u - l) / 2;
msort(xs, ys, l, m);
msort(xs, ys, m, u);
merge(xs, ys, l, m, u);
}
}

void sort(Key∗ xs, int l, int u) {


Key∗ ys = (Key∗) malloc(sizeof(Key) ∗ (u - l));
msort(xs, ys, l, u);
free(ys);
}

This new version runs faster than the previous one. On my test machine, it speeds up by
about 20% to 25% when sorting 100,000 randomly generated numbers.
The basic functional merge sort can also be fine tuned. Observe that it splits the list
at the middle point. However, as the underlying data structure representing the list is a singly
linked-list, random access at a given position is a linear operation (refer to appendix A for
detail). Alternatively, one can split the list in an even-odd manner: all the elements
at odd positions are collected in one sub list, while those at even positions are collected
in another. For any list, the two sub lists either contain the same number of elements,
or differ by one. So this divide strategy always leads to a well-balanced split, and
the performance is ensured to be O(n lg n) in all cases.
The even-odd splitting algorithm can be defined as below.

split(L) = { (ϕ, ϕ) : L = ϕ
           { ({l1}, ϕ) : |L| = 1
           { ({l1} ∪ A, {l2} ∪ B) : otherwise, (A, B) = split(L′′)    (13.33)

When the list is empty, the split results are two empty lists; if there is only one element
in the list, we put this single element, which is at position 1, into the odd sub list, and the even
sub list is empty; otherwise, there are at least two elements in the list: we put
the first one into the odd sub list, the second one into the even sub list, and recursively split
the rest.
All the other functions are kept the same; the modified Haskell program is given as the
following.
split [] = ([], [])
split [x] = ([x], [])
split (x:y:xs) = (x:xs', y:ys') where (xs', ys') = split xs
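With this split function, only the dividing step of the top level changes; a sketch of the msort wired to it (merge is the same function defined before) looks like:
msort [] = []
msort [x] = [x]
msort xs = merge (msort as) (msort bs) where (as, bs) = split xs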

13.9 In-place merge sort


One drawback of the imperative merge sort is that it requires extra space for merging;
the basic version without any optimization needs O(n lg n) at peak, and the one that
allocates a working area needs O(n).
It's natural to seek an in-place merge sort, which can reuse the
original array without allocating any extra space. In this section, we'll introduce some
solutions to realize imperative in-place merge sort.

13.9.1 Naive in-place merge


The first idea is straightforward. As illustrated in figure 13.11, sub lists A and B are
sorted; when performing the in-place merge, the invariant ensures that all elements before i are
already merged, so they are in non-decreasing order. Every time we compare the i-th and the
j-th elements: if the i-th is less than the j-th, the marker i just advances one step to the
next. This is the easy case. Otherwise, the j-th element is the next merge
result, which should be put in front of i. In order to achieve this, all elements between i
and j, including the i-th, should be shifted to the end by one cell. We repeat this process
till all the elements in A and B are put in the correct positions.

Figure 13.11: Naive in-place merge. If not xs[i] < xs[j], the elements from xs[i] up to xs[j] are shifted one cell to the right.

1: procedure Merge(A, l, m, u)
2: while l < m ∧ m ≤ u do
3: if A[l] < A[m] then
4: l ← l + 1
5: else
6: x ← A[m]
7: for i ← m down-to l + 1 do ▷ Shift
8: A[i] ← A[i − 1]
9: A[l] ← x
10: l ← l + 1, m ← m + 1
However, this naive solution downgrades the overall performance of merge sort to
quadratic O(n²)! This is because array shifting is a linear operation, proportional to the
number of elements in the first sorted sub array that haven't been merged so far.
The following ANSI C program based on this algorithm runs very slowly; it is about
12 times slower than the previous version when sorting 10,000 random numbers.
void naive_merge(Key∗ xs, int l, int m, int u) {
int i; Key y;
for(; l < m && m < u; ++l)
if (!(xs[l] < xs[m])) {
y = xs[m++];
for (i = m - 1; i > l; --i) /∗ shift ∗/
xs[i] = xs[i-1];
xs[l] = y;
}
}

void msort3(Key∗ xs, int l, int u) {
int m;
if (u - l > 1) {
m = l + (u - l) / 2;
msort3(xs, l, m);
msort3(xs, m, u);
naive_merge(xs, l, m, u);
}
}

13.9.2 In-place working area
In order to implement the in-place merge sort in O(n lg n) time, when sorting a sub array,
the rest of the array must be reused as the working area for merging. As the elements
stored in the working area will be sorted later, they can't be overwritten. We can slightly
modify the previous algorithm, which allocates extra space for merging, to achieve this.
The idea is that every time we compare the first elements of the two sorted sub arrays,
instead of copying the smaller one to the target position in the working area, we exchange
it with whatever is stored there. Thus after merging, the two sub arrays hold what the
working area previously contained. This idea is illustrated in figure 13.12.

Figure 13.12: Merge without overwriting the working area: compare A[i] and B[j], and swap the smaller one with C[k] in the working area.

In our algorithm, the two sorted sub arrays and the working area for merging are all
parts of the original array to be sorted. We need to supply the following arguments when
merging: the start and end points of the sorted sub arrays, which can be represented as
ranges, and the start point of the working area. The following algorithm uses [a, b) to
denote the range that includes a and excludes b. It merges the sorted ranges [i, m) and
[j, n) into the working area starting from k.
1: procedure Merge(A, [i, m), [j, n), k)
2: while i < m ∧ j < n do
3: if A[i] < A[j] then
4: Exchange A[k] ↔ A[i]
5: i←i+1
6: else
7: Exchange A[k] ↔ A[j]
8: j ←j+1
9: k ←k+1
10: while i < m do
11: Exchange A[k] ↔ A[i]
12: i←i+1
13: k ←k+1
14: while j < n do
15: Exchange A[k] ↔ A[j]
16: j ←j+1
17: k ←k+1
Note that the following two constraints must be satisfied when merging:

1. The working area should be within the bounds of the array. In other words, it
should be big enough to hold the elements exchanged in without causing any out-of-
bound error;
2. The working area can overlap with either of the two sorted arrays; however, it
must be ensured that no unmerged elements are overwritten.

This algorithm can be implemented in ANSI C as the following example.
void wmerge(Key∗ xs, int i, int m, int j, int n, int w) {
while (i < m && j < n)
swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
while (i < m)
swap(xs, w++, i++);
while (j < n)
swap(xs, w++, j++);
}

With this merging algorithm defined, it's easy to imagine a solution which sorts half
of the array. The next question is: how to deal with the rest of the unsorted part stored
in the working area, as shown in figure 13.13?

Figure 13.13: Half of the array is sorted.

One intuitive idea is to recursively sort half of the working area, so that only 1/4 of
the elements remain unsorted, as shown in figure 13.14. The key point at this stage is
that, sooner or later, we must merge the sorted 1/4 (B) with the sorted 1/2 (A).
Is the remaining working area, which only holds 1/4 of the elements, big enough for
merging A and B? Unfortunately, it isn't in the settings shown in figure 13.14.
However, the second constraint mentioned before gives us a hint: we can exploit it by
arranging the working area to overlap with either sub array, as long as we ensure the
unmerged elements won't be overwritten under a well designed merging schema.
Figure 13.14: A and B must be merged at some point (layout: unsorted 1/4 | sorted B 1/4 | sorted A 1/2).

Actually, instead of sorting the second half of the working area, we can sort the first
half, and put the working area between the two sorted arrays, as shown in figure 13.15 (a).
This setup, in effect, arranges the working area to overlap with sub array A. This idea is
proposed in [74].

Figure 13.15: Merge A and B with the working area. (a) sorted B 1/4 | work area 1/4 | sorted A 1/2; (b) work area 1/4 | merged 3/4.

Let's consider two extreme cases:

1. All the elements in B are less than any element in A. In this case, the merge
algorithm moves the whole contents of B to the working area; the cells of B hold what
was previously stored in the working area. As the size of the working area is the same as
B, it's safe to exchange their contents;
2. All the elements in A are less than any element in B. In this case, the merge
algorithm continuously exchanges elements between A and the working area. After all
the cells of the working area (1/4 of the array) are filled with elements from A, the
algorithm starts to overwrite the first half of A. Fortunately, the contents being
overwritten are not unmerged elements. The working area in effect advances toward the
end of the array, and finally moves to the right side. From this point, the merge algorithm
starts exchanging contents of B with the working area. The result is that the working
area moves to the left most side, as shown in figure 13.15 (b).

We can repeat this step: always sort the second half of the unsorted part, and exchange
the sorted sub array to the first half, leaving the rest as the working area. Thus the
working area keeps shrinking: 1/2 of the array, 1/4 of the array, 1/8 of the array, ... The
scale of the merge problem keeps reducing. When there is only one element left in the
working area, we needn't sort it any more, since a singleton array is sorted by nature.
Merging a singleton array into the other is equivalent to inserting the element. In practice,
the algorithm can finalize the last few elements by switching to insertion sort.
The whole algorithm can be described as the following.
1: procedure Sort(A, l, u)
2: if u − l > 0 then
3: m ← ⌊(l + u)/2⌋
4: w ← l + u − m
5: Sort'(A, l, m, w) ▷ The second half contains sorted elements
6: while w − l > 1 do
7: u′ ← w
8: w ← ⌈(l + u′)/2⌉ ▷ Ensure the working area is big enough
9: Sort'(A, w, u′, l) ▷ The first half holds the sorted elements
10: Merge(A, [l, l + u′ − w], [u′, u], w)
11: for i ← w down-to l do ▷ Switch to insertion sort
12: j ← i
13: while j ≤ u ∧ A[j] < A[j − 1] do
14: Exchange A[j] ↔ A[j − 1]
15: j ← j + 1
Note that in order to satisfy the first constraint, we must ensure the working area is
big enough to hold all the elements exchanged in; that's why we round up (take the ceiling)
when sorting the second half of the working area. Note also that we actually pass ranges
including the end points to the algorithm Merge.
Next, we develop a Sort' algorithm, which mutually recursively calls Sort and exchanges
the result to the working area.
1: procedure Sort’(A, l, u, w)
2: if u − l > 0 then
3: m ← ⌊(l + u)/2⌋
4: Sort(A, l, m)
5: Sort(A, m + 1, u)
6: Merge(A, [l, m], [m + 1, u], w)
7: else ▷ Exchange all elements to the working area
8: while l ≤ u do
9: Exchange A[l] ↔ A[w]
10: l ←l+1
11: w ←w+1
Different from the naive in-place sort, this algorithm doesn't shift the array during
merging. The main algorithm reduces the unsorted part in the sequence n/2, n/4, n/8, ...,
so it takes O(lg n) steps to complete sorting. In every step, it recursively sorts half of the
remaining elements, and performs a linear time merging.
Denote the time cost of sorting n elements as T(n); we have the following equation.

\[ T(n) = T(\tfrac{n}{2}) + c\tfrac{n}{2} + T(\tfrac{n}{4}) + c\tfrac{3n}{4} + T(\tfrac{n}{8}) + c\tfrac{7n}{8} + ... \tag{13.34} \]

Substituting n with its half, we get another one:

\[ T(\tfrac{n}{2}) = T(\tfrac{n}{4}) + c\tfrac{n}{4} + T(\tfrac{n}{8}) + c\tfrac{3n}{8} + T(\tfrac{n}{16}) + c\tfrac{7n}{16} + ... \tag{13.35} \]

Subtracting (13.35) from (13.34) we have:

\[ T(n) - T(\tfrac{n}{2}) = T(\tfrac{n}{2}) + cn(\tfrac{1}{2} + \tfrac{1}{2} + ...) \]

There are in total lg n terms of 1/2 added together, therefore the recursion can be expressed as:

\[ T(n) = 2T(\tfrac{n}{2}) + \tfrac{1}{2}cn\lg n \]

Solving this equation with the telescoping method gives the result O(n lg² n).
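
To see where the O(n lg² n) bound comes from, the recursion can be unfolded level by level; the following short derivation is a sketch added for completeness (it is not part of the original text):

\[
\begin{aligned}
T(n) &= 2T(n/2) + \tfrac{1}{2}cn\lg n \\
     &= 4T(n/4) + \tfrac{1}{2}cn\lg\tfrac{n}{2} + \tfrac{1}{2}cn\lg n \\
     &= \cdots \\
     &= \tfrac{1}{2}cn\left(\lg n + \lg\tfrac{n}{2} + \cdots + 1\right) + nT(1) = O(n\lg^2 n)
\end{aligned}
\]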
The following ANSI C code completes the implementation by using the example
wmerge program given above.
void imsort(Key∗ xs, int l, int u);

void wsort(Key∗ xs, int l, int u, int w) {
int m;
if (u - l > 1) {
m = l + (u - l) / 2;
imsort(xs, l, m);
imsort(xs, m, u);
wmerge(xs, l, m, m, u, w);
}
else
while (l < u)
swap(xs, l++, w++);
}

void imsort(Key∗ xs, int l, int u) {
int m, n, w;
if (u - l > 1) {
m = l + (u - l) / 2;
w = l + u - m;
wsort(xs, l, m, w); /∗ the last half contains sorted elements ∗/
while (w - l > 2) {
n = w;
w = l + (n - l + 1) / 2; /∗ ceiling ∗/
wsort(xs, w, n, l); /∗ the first half contains sorted elements ∗/
wmerge(xs, l, l + n - w, n, u, w);
}
for (n = w; n > l; --n) /∗switch to insertion sort∗/
for (m = n; m < u && xs[m] < xs[m-1]; ++m)
swap(xs, m, m - 1);
}
}

However, this program doesn't run faster than the version we developed in the previous
section, which allocates a second array in advance as the working area. On my machine,
it is about 60% slower when sorting 100,000 random numbers, due to the many swap operations.

13.9.3 In-place merge sort vs. linked-list merge sort
In-place merge sort is still an active research area. In order to save the extra space
for merging, some overhead has to be introduced, which increases the complexity of the
merge sort algorithm. However, if the underlying data structure isn't an array but a
linked-list, merging can be achieved without any extra space, as shown in the even-odd
functional merge sort algorithm presented in the previous section.
To make it clearer, we can develop a purely imperative linked-list merge sort solution.
The linked-list can be defined as a record type, as shown in appendix A, like below.
struct Node {
Key key;
struct Node∗ next;
};

We can define an auxiliary function for node linking. Assume the list to be linked
isn’t empty, it can be implemented as the following.
struct Node∗ link(struct Node∗ x, struct Node∗ ys) {
x→next = ys;
return x;
}
One method to realize the imperative even-odd splitting is to initialize two empty
sub lists, then iterate over the list to be split. Each time, we link the current node in
front of the first sub list, then exchange the two sub lists, so that the other sub list will
be linked to in the next iteration. This idea can be illustrated as below.
1: function Split(L)
2: (A, B) ← (ϕ, ϕ)
3: while L 6= ϕ do
4: p←L
5: L ← Next(L)
6: A ← Link(p, A)
7: Exchange A ↔ B
8: return (A, B)
The following example ANSI C program implements this splitting algorithm embed-
ded.
struct Node∗ msort(struct Node∗ xs) {
struct Node ∗p, ∗as, ∗bs;
if (!xs | | !xs→next) return xs;

as = bs = NULL;
while(xs) {
p = xs;
xs = xs→next;
as = link(p, as);
swap(as, bs);
}
as = msort(as);
bs = msort(bs);
return merge(as, bs);
}

The only thing left is to develop the imperative merging algorithm for linked-lists. The
idea is quite similar to the array merging version. As long as neither of the sub lists is
exhausted, we pick the smaller head element and append it to the result list. After that,
we just need to link the non-empty list to the tail of the result, rather than looping to
copy elements. Some care is needed to initialize the result list, as its head node is the
smaller of the two sub list heads. One simple method is to use a dummy sentinel head
and drop it before returning. This implementation detail is given as the following.
struct Node∗ merge(struct Node∗ as, struct Node∗ bs) {
struct Node s, ∗p;
p = &s;
while (as && bs) {
if (as→key < bs→key) {
link(p, as);
as = as→next;
}
else {
link(p, bs);
bs = bs→next;
}
p = p→next;
}
if (as)
link(p, as);
if (bs)
link(p, bs);
return s.next;
}
Exercise 13.5

• Prove that the performance of the in-place merge sort is bound to O(n lg n).

13.10 Nature merge sort

Knuth gives another way to interpret the idea of divide and conquer merge sort: it is just
like burning a candle from both ends [51]. This leads to the nature merge sort algorithm.

Figure 13.16: Burn a candle from both ends

For any given sequence, we can always find a non-decreasing sub sequence starting at
any position. One particular case is that we can find such a sub sequence from the left-
most position. The following table lists some examples; the non-decreasing sub sequences
are in bold font.
15 , 0, 4, 3, 5, 2, 7, 1, 12, 14, 13, 8, 9, 6, 10, 11
8, 12, 14 , 0, 1, 4, 11, 2, 3, 5, 9, 13, 10, 6, 15, 7
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
The first row in the table illustrates the worst case: the second element is less than
the first one, so the non-decreasing sub sequence is a singleton list, which only contains
the first element. The last row shows the best case: the sequence is ordered, and the
non-decreasing sub sequence is the whole list. The second row shows the average case.
Symmetrically, we can always find a non-decreasing sub sequence from the end of the
sequence towards the left. This suggests that we can merge these two non-decreasing sub
sequences, one from the beginning and the other from the end, into a longer sorted
sequence. The advantage of this idea is that we utilize the naturally ordered sub sequences,
so we don't need recursive splitting at all.

Figure 13.17: Nature merge sort. Runs 8, 12, 14 and 15, 7 (read from the right as 7, 15) are merged to the front of the working area, giving 7, 8, 12, 14, 15; the next pair of runs is merged to the rear, giving ..., 13, 11, 10, 6, 4, 1, 0.

Figure 13.17 illustrates this idea. We start the algorithm by scanning from both ends,
finding the longest non-decreasing sub sequences respectively. After that, these two sub
sequences are merged into the working area, with the merged result starting from the
beginning. Next we repeat this step, scanning further toward the center of the original
sequence. This time we merge the two ordered sub sequences to the right hand side of the
working area, growing toward the left. Such a setup is convenient for the next round of
scanning. When all the elements in the original sequence have been scanned and merged
to the target, we switch to sorting the elements stored in the working area, and use the
previous sequence as the new working area. Such switching happens repeatedly in each
round. Finally, we copy all elements from the working area to the original array if
necessary.
The only question left is when this algorithm stops. The answer is: when we start a
new round of scanning and find that the longest non-decreasing sub sequence spans to the
end, the whole list is ordered, and the sorting is done.
Because this kind of merge sort processes the target sequence from both directions and
uses the natural ordering of sub sequences, it's named nature two-way merge sort. To
realize it, some care must be taken. Figure 13.18 shows the invariant during the nature
merge sort. At any time, all elements before marker a and after marker d have already
been scanned and merged. We try to span the non-decreasing sub sequence [a, b) as far
as possible; at the same time, we span the sub sequence [c, d) from right to left as far as
possible as well. The invariant for the working area is shown in the second row. All
elements before f and after r have already been processed (note that they may contain
several ordered sub sequences). For the odd rounds (1, 3, 5, ...), we merge [a, b) and
[c, d) from f toward the right; for the even rounds (2, 4, 6, ...), we merge the two sorted
sub sequences from r toward the left.

Figure 13.18: Invariant during nature merge sort. Top (original array): scanned | span [a, b) | unscanned | span [c, d) | scanned. Bottom (working area): merged before f | unused free cells | merged after r.

For the imperative realization, the sequence is represented by an array. Before sorting
starts, we duplicate the array to create a working area. The pointers a, b are initialized to
point to the left most position, while c, d point to the right most position. Pointer f starts
by pointing to the front of the working area, and r points to the rear position.
1: function Sort(A)
2: if |A| > 1 then
3: n ← |A|
4: B ← Create-Array(n) ▷ Create the working area
5: loop
6: [a, b) ← [1, 1)
7: [c, d) ← [n + 1, n + 1)
8: f ← 1, r ← n ▷ front and rear pointers to the working area
9: t ← False ▷ merge to front or rear
10: while b < c do ▷ There are still elements for scan
11: repeat ▷ Span [a, b)
12: b←b+1
13: until b ≥ c ∨ A[b] < A[b − 1]
14: repeat ▷ Span [c, d)
15: c←c−1
16: until c ≤ b ∨ A[c − 1] < A[c]
17: if c < b then ▷ Avoid overlap
18: c←b
19: if b − a ≥ n then ▷ Done if [a, b) spans to the whole array
20: return A
21: if t then ▷ merge to front
22: f ← Merge(A, [a, b), [c, d), B, f, 1)
23: else ▷ merge to rear
24: r ← Merge(A, [a, b), [c, d), B, r, −1)
25: a ← b, d ← c
26: t ← ¬t ▷ Switch the merge direction
27: Exchange A ↔ B ▷ Switch working area
28: return A
The merge algorithm is almost the same as before, except that we need to pass a parameter
to indicate the direction of merging.
1: function Merge(A, [a, b), [c, d), B, w, ∆)
2: while a < b ∧ c < d do
3: if A[a] < A[d − 1] then
4: B[w] ← A[a]
5: a←a+1
6: else
7: B[w] ← A[d − 1]
8: d←d−1
9: w ←w+∆
10: while a < b do
11: B[w] ← A[a]
12: a←a+1
13: w ←w+∆
14: while c < d do
15: B[w] ← A[d − 1]
16: d←d−1
17: w ←w+∆
18: return w
The following ANSI C program implements this two-way nature merge sort algorithm.
Note that it doesn't release the allocated working area explicitly.
int merge(Key∗ xs, int a, int b, int c, int d, Key∗ ys, int k, int delta) {
for(; a < b && c < d; k += delta )
ys[k] = xs[a] < xs[d-1] ? xs[a++] : xs[--d];
for(; a < b; k += delta)
ys[k] = xs[a++];
for(; c < d; k += delta)
ys[k] = xs[--d];
return k;
}

Key∗ sort(Key∗ xs, Key∗ ys, int n) {
int a, b, c, d, f, r, t;
if(n < 2)
return xs;
for(;;) {
a = b = 0;
c = d = n;
f = 0;
r = n-1;
t = 1;
while(b < c) {
do { /∗ span [a, b) as much as possible ∗/
++b;
} while( b < c && xs[b-1] ≤ xs[b] );
do{ /∗ span [c, d) as much as possible ∗/
--c;
} while( b < c && xs[c] ≤ xs[c-1] );
if( c < b )
c = b; /∗ eliminate overlap if any ∗/
if( b - a ≥ n)
return xs; /∗ sorted ∗/
if( t )
f = merge(xs, a, b, c, d, ys, f, 1);
else
r = merge(xs, a, b, c, d, ys, r, -1);
a = b;
d = c;
t = !t;
}
swap(&xs, &ys);
}
return xs; /∗can't be here∗/
}

The performance of nature merge sort depends on the actual ordering of the sub arrays.
However, it in fact performs well even in the worst case. Suppose that we are unlucky
when scanning the array, and the length of the non-decreasing sub arrays is always 1
during the first round of scanning. This leads to a working area filled with merged ordered
sub arrays of length 2. Suppose that we are unlucky again in the second round of scanning;
however, the previous results ensure that the non-decreasing sub arrays in this round are
no shorter than 2, so this time the working area will be filled with merged ordered sub
arrays of length 4, and so on. Repeating this, the length of the non-decreasing sub arrays
doubles in every round, so there are at most O(lg n) rounds, and in every round we scan
all the elements. The overall performance for this worst case is bound to O(n lg n). We'll
come back to this interesting phenomenon in the next section about bottom-up merge sort.
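
A short way to state this bound (added here as a sanity check, not from the original text): after round k every ordered run has length at least 2^k, so the number of rounds r satisfies 2^r ≥ n, and each round scans all n elements:

\[ r \le \lceil \lg n \rceil, \qquad T(n) \le \sum_{k=1}^{\lceil \lg n \rceil} cn = O(n \lg n) \]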
In purely functional settings, however, it's not sensible to scan the list from both ends,
since the underlying data structure is a singly linked-list. The nature merge sort can be
realized with another approach.
Observe that the list to be sorted consists of several non-decreasing sub lists; we can
pick every two of such sub lists and merge them into a bigger one. We repeatedly pick
and merge, so that the number of non-decreasing sub lists keeps halving, until finally
there is only one such list, which is the sorted result. This idea can be formalized in the
following equation.
in the following equation.
sort(L) = sort′ (group(L)) (13.36)
Where function group(L) groups the elements in the list into non-decreasing sub lists.
This function can be described as below; the first two are trivial edge cases.
• If the list is empty, the result is a list containing an empty list;
• If there is only one element in the list, the result is a list containing a singleton list;
• Otherwise, the first two elements are compared: if the first one is less than or equal
to the second, it is linked in front of the first sub list of the recursive grouping result;
otherwise a singleton list containing the first element is put in front of the recursive
result as a new first sub list.

\[
group(L) = \begin{cases}
\{L\} & : |L| \le 1 \\
\{\{l_1\} \cup L_1, L_2, ...\} & : l_1 \le l_2, \{L_1, L_2, ...\} = group(L') \\
\{\{l_1\}, L_1, L_2, ...\} & : \text{otherwise}
\end{cases} \tag{13.37}
\]
It’s quite possible to abstract the grouping criteria as a parameter to develop a generic
grouping function, for instance, as the following Haskell code 6 .
groupBy' :: (a→a→Bool) →[a] →[[a]]
groupBy' _ [] = [[]]
groupBy' _ [x] = [[x]]
groupBy' f (x:xs@(x':_)) | f x x' = (x:ys):yss
| otherwise = [x]:r
where
r@(ys:yss) = groupBy' f xs

Different from the sort function, which sorts a list of elements, function sort′ accepts
a list of sub lists, which is the result of grouping.

\[
sort'(L) = \begin{cases}
\phi & : L = \phi \\
L_1 & : L = \{L_1\} \\
sort'(mergePairs(L)) & : \text{otherwise}
\end{cases} \tag{13.38}
\]
The first two are the trivial edge cases. If the list to be sorted is empty, the result is
obviously empty; if it contains only one sub list, then we are done, and we just extract
this single sub list as the result. For the recursive case, we call a function mergePairs to
merge every two sub lists, then recursively call sort′.
The next undefined function is mergePairs; as the name indicates, it repeatedly
merges pairs of non-decreasing sub lists into bigger ones.
\[
mergePairs(L) = \begin{cases}
L & : |L| \le 1 \\
\{merge(L_1, L_2)\} \cup mergePairs(L'') & : \text{otherwise}
\end{cases} \tag{13.39}
\]
When there are fewer than two sub lists in the list, we are done; otherwise, we merge
the first two sub lists L1 and L2, and recursively merge the rest of the pairs in L′′. The
result of mergePairs is a list of lists, but it will finally be flattened by the sort′ function.
The merge function is the same as before. The complete example Haskell program is
given below.
given as below.
mergesort = sort' ◦ groupBy' ( ≤ )

sort' [] = []
sort' [xs] = xs
sort' xss = sort' (mergePairs xss) where
mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
mergePairs xss = xss

6 There is a ‘groupBy’ function provided in the Haskell standard library ’[Link]’. However, it doesn’t
fit here, because it accepts an equality testing function as parameter, which must satisfy the properties
of reflexivity, transitivity, and symmetry; the less-than-or-equal-to operation we use here doesn’t conform
to symmetry. Refer to appendix A of this book for details.
360 CHAPTER 13. DIVIDE AND CONQUER, QUICK SORT VS. MERGE SORT

Alternatively, observe that we can first pick two sub lists and merge them to an inter-
mediate result, then repeatedly pick the next sub list and merge it into the ordered result
we've got so far, until all the remaining sub lists are merged. This is a typical folding
algorithm, as introduced in appendix A.

sort(L) = f old(merge, ϕ, group(L)) (13.40)

Translating this version to Haskell yields the folding version.
mergesort' = foldl merge [] ◦ groupBy' ( ≤ )

Exercise 13.6

• Is the nature merge sort realized by folding equivalent to the one using mergePairs
in terms of performance? If yes, prove it; if not, which one is faster?

13.11 Bottom-up merge sort
The worst case analysis of nature merge sort raises an interesting topic: instead of
realizing merge sort in a top-down manner, we can develop a bottom-up version. The great
advantage is that we needn't do any book keeping, so the algorithm is quite friendly for a
purely iterative implementation.
The idea of bottom-up merge sort is to turn the sequence to be sorted into n small
sub sequences, each containing only one element. Then we merge every two of such small
sub sequences, so that we get n/2 ordered sub sequences, each of length 2; if n is odd, we
leave the last singleton sequence untouched. We repeatedly merge these pairs, and finally
we get the sorted result. Knuth names this variant 'straight two-way merge sort' [51].
The bottom-up merge sort is illustrated in figure 13.19.

Figure 13.19: Bottom-up merge sort
Different from the basic version and the even-odd version, we needn't explicitly split the
list to be sorted in every recursion. The whole list is split into n singletons at the very
beginning, and we merge these sub lists in the rest of the algorithm.

sort(L) = sort′ (wraps(L)) (13.41)


\[
wraps(L) = \begin{cases}
\phi & : L = \phi \\
\{\{l_1\}\} \cup wraps(L') & : \text{otherwise}
\end{cases} \tag{13.42}
\]

Of course wraps can be implemented by using mapping as introduced in appendix A.

sort(L) = sort′ (map(λx · {x}, L)) (13.43)

We reuse the functions sort′ and mergePairs defined in the section about nature
merge sort. They repeatedly merge pairs of sub lists until there is only one.
Implement this version in Haskell gives the following example code.
sort = sort' ◦ map (λx→[x])

This version is based on what Okasaki presented in [3]. It is quite similar to the nature
merge sort, differing only in the way of grouping. Actually, it can be deduced as a special
case (the worst case) of nature merge sort by the following equation.

sort(L) = sort′ (groupBy(λx,y · F alse, L)) (13.44)

That is, instead of spanning the non-decreasing sub list as long as possible, the predicate
always evaluates to false, so each sub list spans only one element.
Similar to nature merge sort, bottom-up merge sort can also be defined by folding. The
detailed implementation is left as an exercise to the reader.
Observing the bottom-up sort, we can find that it is tail recursive, thus it's quite easy
to translate it into a purely iterative algorithm without any recursion.
1: function Sort(A)
2: B←ϕ
3: for ∀a ∈ A do
4: B ← Append(B, {a})
5: N ← |B|
6: while N > 1 do
7: for i ← 1 to ⌊N/2⌋ do
8: B[i] ← Merge(B[2i − 1], B[2i])
9: if Odd(N ) then
10: B[⌈N/2⌉] ← B[N ]
11: N ← ⌈N/2⌉
12: if B = ϕ then
13: return ϕ
14: return B[1]
The following example Python program implements the purely iterative bottom-up
merge sort.
def mergesort(xs):
    ys = [[x] for x in xs]
    while len(ys) > 1:
        ys.append(merge(ys.pop(0), ys.pop(0)))
    return [] if ys == [] else ys.pop()

def merge(xs, ys):
    zs = []
    while xs != [] and ys != []:
        zs.append(xs.pop(0) if xs[0] < ys[0] else ys.pop(0))
    return zs + (xs if xs != [] else ys)

The Python implementation combines multiple rounds of merging by consuming the
pair of lists at the head, and appending the merged result to the tail. This greatly
simplifies the logic of handling the odd number of sub lists, compared with the above
pseudo code.

Exercise 13.7

• Implement the functional bottom-up merge sort by using folding.

• Implement the iterative bottom-up merge sort only with array indexing. Don’t use
any library supported tools, such as list, vector etc.

13.12 Parallelism
We mentioned in the basic version of quick sort that the two sub sequences can be sorted
in parallel after the divide phase finishes. This strategy is also applicable to merge sort.
Actually, practical parallel versions of quick sort and merge sort do not only distribute the
recursive sub sequence sorting into two parallel processes, but divide the sequence into
p sub sequences, where p is the number of processors. Ideally, if we can achieve sorting in
T′ time with parallelism, which satisfies O(n lg n) = pT′, we say it is a linear speed up,
and the algorithm is parallel optimal.
However, a straightforward parallel extension to the sequential quick sort algorithm,
which samples several pivots, divides into p sub sequences, and independently sorts them
in parallel, isn't optimal. The bottleneck exists in the divide phase, for which we can only
achieve O(n) time in the average case.
The straightforward parallel extension to merge sort, on the other hand, blocks at the
merge phase. Both parallel merge sort and quick sort in practice need good designs in
order to achieve the optimal speed up. Actually, the divide and conquer nature makes
merge sort and quick sort relatively easy to parallelize. Richard Cole found an O(lg n)
parallel merge sort algorithm with n processors in 1986 [76].
Parallelism is a big and complex topic which is out of the scope of this elementary
book. Readers can refer to [76] and [77] for details.

13.13 Short summary
In this chapter, two popular divide and conquer sorting methods, quick sort and merge
sort, are introduced. Both of them meet the O(n lg n) upper performance limit of
comparison based sorting algorithms. Sedgewick said that quick sort is the greatest
algorithm invented in the 20th century. Almost all programming environments adopt quick
sort as the default sorting tool. As time goes on, some environments, especially those
manipulating abstract sequences which are dynamic and not based on pure arrays, switch
to merge sort as the general purpose sorting tool7.
The reason for this interesting phenomenon can partly be explained by the treatment
in this chapter. Quick sort performs perfectly in most cases, and it needs fewer swaps
than most other algorithms. However, the quick sort algorithm is based on swapping; in
7 Actually, most of them are a kind of hybrid sort, balanced with insertion sort to achieve good performance when the sequence is short.
purely functional settings, swapping isn't the most efficient way, because the underlying
data structure is a singly linked-list rather than a vectorized array. Merge sort, on the
other hand, is friendly in such an environment, as it costs constant extra space, and the
performance can be ensured even in what is the worst case for quick sort, where the latter
downgrades to quadratic time.
However, merge sort doesn't perform as well as quick sort in purely imperative settings
with arrays. It either needs extra space for merging, which is sometimes unreasonable,
for example in embedded systems with limited memory, or causes many overhead swaps
with the in-place workaround. In-place merging is still an active research area.
Although the title of this chapter is 'quick sort vs. merge sort', it's not the case
that one algorithm has nothing to do with the other. Quick sort can be viewed as the
optimized version of tree sort, as explained in this chapter. Similarly, merge sort can also
be deduced from tree sort, as shown in [75].
There are many ways to categorize sorting algorithms, such as in [51]. One way is
from the point of view of easy/hard partition and easy/hard merge [72].
Quick sort, for example, is quite easy for merging, because all the elements in the sub
sequence before the pivot are no greater than any element after the pivot. The merging for
quick sort is actually trivial sequence concatenation.
Merge sort, on the other hand, is more complex in merging than quick sort. However,
it's quite easy to divide, no matter what concrete divide method is taken: simple division
at the middle point, even-odd splitting, nature splitting, or bottom-up straight splitting.
Compared to merge sort, it's more difficult for quick sort to achieve a perfect division.
We showed that, in theory, the worst case can't be completely avoided, no matter what
engineering practice is taken: median-of-three, random quick sort, 3-way partition, etc.
We've shown some elementary sorting algorithms in this book up to this chapter,
including insertion sort, tree sort, selection sort, heap sort, quick sort and merge sort.
Sorting is still a hot research area in computer science. At the time when this chapter is
written, people are challenged by the buzz word 'big data': the traditional convenient
methods can't handle more and more huge data within reasonable time and resources.
Sorting a sequence of hundreds of Gigabytes has become routine in some fields.

Exercise 13.8

• Design an algorithm to create binary search tree by using merge sort strategy.
Bibliography

[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998) ISBN-10: 0201896850 ISBN-13: 978-0201896855

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “Introduction to Algorithms, Second Edition”. ISBN: 0262032937. The MIT Press. 2001

[3] Robert Sedgewick. “Implementing quick sort programs”. Communication of ACM. Volume 21, Number 10. 1978. pp.847 - 857.

[4] Jon Bentley. “Programming pearls, Second Edition”. Addison-Wesley Professional; 1999. ISBN-13: 978-0201657883

[5] Jon Bentley, Douglas McIlroy. “Engineering a sort function”. Software Practice and Experience VOL. 23(11), 1249-1265, 1993.

[6] Robert Sedgewick, Jon Bentley. “Quicksort is optimal”. [Link] rs/talks/[Link]

[7] Richard Bird. “Pearls of functional algorithm design”. Cambridge University Press. 2010. ISBN: 1139490605, 9781139490603

[8] Fethi Rabhi, Guy Lapalme. “Algorithms: a functional programming approach”. Second edition. Addison-Wesley, 1999. ISBN: 0201-59604-0

[9] Simon Peyton Jones. “The Implementation of functional programming languages”. Prentice-Hall International, 1987. ISBN: 0-13-453333-X

[10] Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. “Practical in-place mergesort”. Nordic Journal of Computing, 1996.

[11] Chris Okasaki. “Purely Functional Data Structures”. Cambridge University Press, (July 1, 1999), ISBN-13: 978-0521663502

[12] Josè Bacelar Almeida and Jorge Sousa Pinto. “Deriving Sorting Algorithms”. Technical report, Data structures and Algorithms. 2008.

[13] Cole, Richard (August 1988). “Parallel merge sort”. SIAM J. Comput. 17 (4): 770-785. doi:10.1137/0217049.

[14] Powers, David M. W. “Parallelized Quicksort and Radixsort with Optimal Speedup”, Proceedings of International Conference on Parallel Computing Technologies. Novosibirsk. 1991.

[15] Wikipedia. “Quicksort”. [Link]

[16] Wikipedia. “Strict weak order”. [Link]

[17] Wikipedia. “Total order”. [Link]

[18] Wikipedia. “Harmonic series (mathematics)”. [Link]
Chapter 14

Searching

14.1 Introduction
Searching is quite a big and important area. Computers make many hard searching
problems feasible that are almost impossible for human beings. A modern industrial
robot can even search for and pick the correct gadget from the pipeline for assembly; a GPS
car navigator can search the map for the best route to a specific place. The modern
mobile phone is not only equipped with such a map navigator, but can also search for the
best price when shopping on the Internet.
This chapter just scratches the surface of elementary searching. One good thing that
the computer offers is brute-force scanning for a certain result in a large sequence. The
divide and conquer search strategy will be briefed with two problems: one is to find the
k-th smallest element among a list of unsorted elements; the other is the popular binary
search among a list of sorted elements. We'll also introduce the extension of binary search
to multiple-dimension data.
Text matching is also very important in our daily life. Two well-known searching
algorithms, the Knuth-Morris-Pratt (KMP) and Boyer-Moore algorithms, will be introduced.
They set good examples for another searching strategy: information reusing.
Besides sequence search, some elementary methods for searching for solutions to some
interesting problems will be introduced. They were mostly well studied in the early
phase of AI (artificial intelligence), including the basic DFS (depth-first search) and
BFS (breadth-first search).
Finally, dynamic programming will be briefed for searching for optimal solutions, and
we'll also introduce the greedy algorithm, which is applicable in some special cases.
All algorithms will be realized in both imperative and functional approaches.

14.2 Sequence search
Although the modern computer offers fast speed for brute-force searching, and even if
Moore's law could be strictly followed, the growth of huge data is too fast to be handled
well in this way. We've seen a vivid example in the introduction chapter of this book.
That's why people study computer search algorithms.

14.2.1 Divide and conquer search
One solution is to use the divide and conquer approach: if we can repeatedly scale
down the search domain, the data being dropped needn't be examined at all. This will
definitely speed up the search.

k-selection problem
Consider the problem of finding the k-th smallest element among n elements. The most
straightforward idea is to find the minimum first, then drop it and find the second minimum
element among the rest. Repeating this minimum finding and dropping k times gives the
k-th smallest one. Finding the minimum among n elements costs linear O(n) time, thus
this method performs in O(kn) time, if k is much smaller than n.
Another method is to use the 'heap' data structure we've introduced. No matter what
concrete heap is used, e.g. a binary heap with an implicit array, a Fibonacci heap or others,
accessing the top element followed by popping is typically bound to O(lg n) time. Thus this
method, as formalized in equations (14.1) and (14.2), performs in O(k lg n) time, if k is
much smaller than n.

\[ top(k, L) = find(k, heapify(L)) \tag{14.1} \]

\[
find(k, H) = \begin{cases}
top(H) & : k = 0 \\
find(k - 1, pop(H)) & : \text{otherwise}
\end{cases} \tag{14.2}
\]
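
The following Python sketch realizes this heap-based selection with the standard heapq module; the function name top_k_smallest is my own choice, and it uses a 1-based k, so k = 1 returns the minimum.

import heapq

def top_k_smallest(k, xs):
    # k-th smallest element (1-based): heapify in O(n), then pop k - 1 times
    h = list(xs)
    heapq.heapify(h)
    for _ in range(k - 1):
        heapq.heappop(h)      # each pop costs O(lg n)
    return h[0]

# Example: top_k_smallest(3, [5, 1, 4, 2, 3]) == 3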

However, heap adds some complexity to the solution. Is there any simple, fast method
to find the k-th element?
The divide and conquer strategy can help us. If we can divide all the elements into
two sub lists A and B, and ensure that no element in A is greater than any element in B,
we can scale down the problem with the following method1:

1. Compare the length of sub list A with k;

2. If k ≤ |A|, the k-th smallest one must be contained in A; we can drop B and search
further in A;

3. If |A| < k, the k-th smallest one must be contained in B; we can drop A and search
further for the (k − |A|)-th smallest one in B.

Note that the italic font emphasizes the fact of recursion. The ideal case always divides
the list into two equally big sub lists A and B, so that we can halve the problem each
time. Such an ideal case leads to a performance of O(n) linear time.
Thus the key problem is how to realize the division, which collects the m smallest
elements in one sub list and puts the rest in another.
This reminds us of the partition algorithm in quick sort, which moves all the elements
smaller than the pivot in front of it, and moves those greater than the pivot behind it.
Based on this idea, we can develop a divide and conquer k-selection algorithm, which is
called the quick selection algorithm.

1. Randomly select an element (the first for instance) as the pivot;

2. Moves all elements which aren’t greater than the pivot in a sub list A; and moves
the rest to sub list B;

3. Compare the length of A with k, if |A| = k − 1, then the pivot is the k-th smallest
one;

4. If |A| > k − 1, recursively find the k-th smallest one among A;


1 This actually demands a more accurate definition of the k-th smallest in L: it's equal to the k-th element of L′, where L′ is a permutation of L, and L′ is in monotonic non-decreasing order.

5. Otherwise, recursively find the (k − 1 − |A|)-th smallest one among B;

This algorithm can be formalized in the equations below. Suppose 0 < k ≤ |L|, where L
is a non-empty list of elements. Denote by l1 the first element in L; it is chosen as the
pivot. L′ contains the rest of the elements except for l1, and (A, B) = partition(λx · x ≤ l1, L′),
which partitions L′ by using the same algorithm defined in the chapter about quick sort.

\[
top(k, L) = \begin{cases}
l_1 & : |A| = k - 1 \\
top(k - 1 - |A|, B) & : |A| < k - 1 \\
top(k, A) & : \text{otherwise}
\end{cases} \tag{14.3}
\]

\[
partition(p, L) = \begin{cases}
(\phi, \phi) & : L = \phi \\
(\{l_1\} \cup A, B) & : p(l_1), (A, B) = partition(p, L') \\
(A, \{l_1\} \cup B) & : \neg p(l_1)
\end{cases} \tag{14.4}
\]

The following Haskell example program implements this algorithm.


top n (x:xs) | len == n - 1 = x
| len < n - 1 = top (n - len - 1) bs
| otherwise = top n as
where
(as, bs) = partition ( ≤ x) xs
len = length as

The partition function is provided in the Haskell standard library; for the detailed
implementation refer to the previous chapter about quick sort.
The lucky case is that the k-th smallest element is selected as the pivot at the very
beginning. The partition function examines the whole list and finds that there are k − 1
elements not greater than the pivot; we are done in just O(n) time. The worst case is that
either the maximum or the minimum element is selected as the pivot every time. The
partition then always produces an empty sub list: either A or B is empty. If we always
pick the minimum as the pivot, the performance is bound to O(kn); if we always pick
the maximum as the pivot, the performance is O((n − k)n).
The best case (not the lucky case) is that the pivot always partitions the list perfectly.
The length of A is nearly the same as the length of B, and the list is halved every time. It
needs about O(lg n) partitions, and each partition takes linear time, proportional to the
length of the halved list. This can be expressed as O(n + n/2 + n/4 + ... + n/2^m), where m
is the smallest number satisfying n/2^m < k. Summing the series leads to the result of O(n).
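
The sum is a simple geometric series; the short bound below is added here for completeness and is not part of the original text:

\[ n + \frac{n}{2} + \frac{n}{4} + \cdots + \frac{n}{2^m} < n\sum_{i=0}^{\infty}\frac{1}{2^i} = 2n = O(n) \]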
The average case analysis needs the tool of mathematical expectation. It's quite similar
to the proof given in the previous chapter about quick sort, and is left as an exercise to
the reader.
Similar to quick sort, this divide and conquer selection algorithm performs well most
of the time in practice. We can take the same engineering practices, such as median-of-three
or randomly selecting the pivot, as we did for quick sort. Below is the imperative
realization, for example.
1: function Top(k, A, l, u)
2: Exchange A[l] ↔ A[ Random(l, u) ] ▷ Randomly select in [l, u]
3: p ← Partition(A, l, u)
4: if p − l + 1 = k then
5: return A[p]
6: if k < p − l + 1 then
7: return Top(k, A, l, p − 1)
8: return Top(k − p + l − 1, A, p + 1, u)
This algorithm searches for the k-th smallest element in the range [l, u] of array A. The
boundaries are included. It first randomly selects a position and swaps it with the first
one. Then this element is chosen as the pivot for partitioning. The partition algorithm
moves elements in place and returns the position to which the pivot is moved. If the
pivot is just located at position k, then we are done; if there are more than k − 1 elements
not greater than the pivot, the algorithm recursively searches for the k-th smallest one in
the range [l, p − 1]; otherwise, k is reduced by the number of elements before the pivot, and
the algorithm recursively searches the range after the pivot, [p + 1, u].
There are many methods to realize the partition algorithm; the one below is based on
N. Lomuto's method. Other realizations are left as exercises to the reader.
1: function Partition(A, l, u)
2: p ← A[l]
3: L←l
4: for R ← l + 1 to u do
5: if ¬(p < A[R]) then
6: L←L+1
7: Exchange A[L] ↔ A[R]
8: Exchange A[l] ↔ A[L]
9: return L
The ANSI C example program below implements this algorithm. Note that it handles the
special case that either the array is empty or k is out of the boundaries of the array,
returning -1 to indicate the search failure.
int partition(Key∗ xs, int l, int u) {
int r, p = l;
for (r = l + 1; r < u; ++r)
if (!(xs[p] < xs[r]))
swap(xs, ++l, r);
swap(xs, p, l);
return l;
}

/∗ The result is stored in xs[k], returns k if u-l ≥ k, otherwise -1 ∗/


int top(int k, Key∗ xs, int l, int u) {
int p;
if (l < u) {
swap(xs, l, rand() % (u - l) + l);
p = partition(xs, l, u);
if (p - l + 1 == k)
return p;
return (k < p - l + 1) ? top(k, xs, l, p) :
top(k- p + l - 1, xs, p + 1, u);
}
return -1;
}

There is a method proposed by Blum, Floyd, Pratt, Rivest and Tarjan in 1973, which
ensures the worst case performance is bound to O(n) [4], [81]. It divides the list into
small groups, each containing no more than 5 elements. The median of each group of 5
elements is identified quickly, and so n/5 median elements are selected. We repeat this
step, dividing them again into groups of 5 and recursively selecting the median of medians.
It's obvious that the final 'true' median can be found in O(lg n) time. This is the best
pivot for partitioning the list. Next, we halve the list by this pivot and recursively search
for the k-th smallest one. The performance can be calculated as follows.

\[ T(n) = c_1 \lg n + c_2 n + T(\tfrac{n}{2}) \tag{14.5} \]

Where c1 and c2 are constant factors for the median of medians and the partition
computation respectively. Solving this equation with the telescoping method or the master
theorem in
[4] gives the linear O(n) performance.
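
Below is a small Python sketch of this median-of-medians selection; the name bfprt_select and the three-way split around the pivot are my own illustrative choices, not code from the book.

def bfprt_select(xs, k):
    # k-th smallest (1-based) in worst case O(n)
    if len(xs) <= 5:
        return sorted(xs)[k - 1]
    groups = [xs[i:i + 5] for i in range(0, len(xs), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    pivot = bfprt_select(medians, (len(medians) + 1) // 2)  # median of medians
    smaller = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    larger = [x for x in xs if x > pivot]
    if k <= len(smaller):
        return bfprt_select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return bfprt_select(larger, k - len(smaller) - len(equal))

# Example: bfprt_select([5, 1, 4, 2, 3], 2) == 2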


In case we just want to pick the top k smallest elements, but don't care about their
order, the algorithm can be adjusted a little bit to fit.

\[
tops(k, L) = \begin{cases}
\phi & : k = 0 \lor L = \phi \\
A & : |A| = k \\
A \cup \{l_1\} \cup tops(k - |A| - 1, B) & : |A| < k \\
tops(k, A) & : \text{otherwise}
\end{cases} \tag{14.6}
\]

Where A and B have the same meaning as before, (A, B) = partition(λx · x ≤ l1, L′), if L
isn't empty. The corresponding example program in Haskell is given below.
tops _ [] = []
tops 0 _ = []
tops n (x:xs) | len ==n = as
| len < n = as ++ [x] ++ tops (n-len-1) bs
| otherwise = tops n as
where
(as, bs) = partition ( ≤ x) xs
len = length as

binary search
Another popular divide and conquer algorithm is binary search. We've shown it in the
chapter about insertion sort. When I was in school, the math teacher played a magic trick
on me. He asked me to think of a natural number less than 1000. Then he asked me some
questions, to which I only replied 'yes' or 'no', and finally he guessed my number. He
typically asked questions like the following:

• Is it an even number?

• Is it a prime number?

• Are all digits same?

• Can it be divided by 3?

• ...

Most of the time he guessed the number within 10 questions. My classmates and I all
thought it was unbelievable.
The game becomes less magical in the form of a popular TV program: the price of a
product is hidden, and you must figure out the exact price in 30 seconds. The host tells
you if your guess is higher or lower than the actual price. If you win, the product is yours.
The best strategy is to use a similar divide and conquer approach to perform a binary
search. So it's common to hear a conversation like this between the player and the host:

• P: 1000;

• H: High;

• P: 500;

• H: Low;

• P: 750;

• H: Low;

• P: 890;

• H: Low;

• P: 990;

• H: Bingo.

My math teacher told us that, because the number we thought of is within 1000, if he
can halve the candidate numbers every time by designing good questions, the number will
be found within 10 questions, because 2^10 = 1024 > 1000. However, it would be boring to
just ask 'is it higher than 500', 'is it lower than 250', ... Actually, the question 'is
it even' is very good, because it always halves the candidates2.
Let's come back to the binary search algorithm. It is only applicable to a sequence of
ordered numbers. I've seen programmers try to apply it to an unsorted array, and take
several hours to figure out why it doesn't work. The idea is quite straightforward: in
order to find a number x in an ordered sequence A, we first check the middle point element
and compare it with x. If they are the same, we are done; if x is smaller, as A is ordered,
we need only recursively search among the first half; otherwise we search among the
second half. Once A becomes empty and we haven't found x yet, it means x doesn't exist.
Before formalizing this algorithm, there is a surprising fact that needs to be noted.
Donald Knuth stated that 'Although the basic idea of binary search is comparatively
straightforward, the details can be surprisingly tricky!'. Jon Bentley pointed out that most
binary search implementations contain errors, and even the one given by him in the first
version of 'Programming pearls' contained an error undetected for over twenty years [4].
There are two kinds of realization: one is recursive, the other is iterative. The recursive
solution is the same as what we described. Suppose the lower and upper boundaries of the
array are l and u, inclusive.
1: function Binary-Search(x, A, l, u)
2: if u < l then
3: Not found error
4: else
5: m ← l + ⌊(u − l)/2⌋ ▷ avoid overflow of ⌊(l + u)/2⌋
6: if A[m] = x then
7: return m
8: if x < A[m] then
9: return Binary-Search(x, A, l, m - 1)
10: else
11: return Binary-Search(x, A, m + 1, u)
As the comment highlights, if the integer is represented with fixed-width machine words,
we can't merely use ⌊(l + u)/2⌋ because it may overflow if l and u are big.
Binary search can also be realized in an iterative manner, where we keep updating the
boundaries according to the middle point comparison result.
1: function Binary-Search(x, A, l, u)
2: while l ≤ u do
3: m ← l + ⌊(u − l)/2⌋
4: if A[m] = x then
5: return m
2 When the author revised this chapter, Microsoft released a game on social networks. The user thinks of a person's name, then the AI robot asks 16 questions, which the user answers only with yes or no, and the robot tells you who that person is. Can you figure out how the robot works?

6: if x < A[m] then


7: u←m−1
8: else
9: l ←m+1
10: return NIL
The implementation is a very good exercise; we leave it to the reader. Please try all kinds
of methods to verify your program.
Since the array is halved every time, the performance of binary search is bound to
O(lg n) time.
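
For reference, one possible Python sketch of the iterative version over inclusive boundaries is given below; this is a hedged illustration, not the book's implementation, and it returns None when x is absent.

def binary_search(x, xs):
    l, u = 0, len(xs) - 1          # inclusive boundaries
    while l <= u:
        m = l + (u - l) // 2       # written this way to avoid overflow in fixed-width languages
        if xs[m] == x:
            return m
        if x < xs[m]:
            u = m - 1
        else:
            l = m + 1
    return None

# Example: binary_search(7, [1, 3, 5, 7, 9]) == 3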
In purely functional settings, the list is represented as a singly linked-list. It takes
linear time to randomly access the element at a given position, so binary search doesn't
make much sense in such a case. However, it is good to analyze what the performance
would downgrade to. Consider the following equation.

\[
bsearch(x, L) = \begin{cases}
Err & : L = \phi \\
b_1 & : x = b_1, \; (A, B) = splitAt(\lfloor |L|/2 \rfloor, L) \\
bsearch(x, A) & : B = \phi \lor x < b_1 \\
bsearch(x, B') & : \text{otherwise}
\end{cases}
\]

Where b1 is the first element of B if B isn't empty, and B′ holds the rest except for b1.
The splitAt function takes O(n) time to divide the list into two sub lists A and B (see
appendix A and the chapter about merge sort for details). If B isn't empty and x is equal
to b1, the search returns; otherwise, if x is less than b1, as the list is sorted, we
recursively search in A; otherwise we search in B′. If the list is empty, we raise an error
to indicate search failure.
As we always split the list at the middle point, the number of elements halves in each
recursion. In every recursive call, we take linear time for splitting. The splitting function
only traverses the first half of the linked-list, thus the total time can be expressed as

\[ T(n) = c\frac{n}{2} + c\frac{n}{4} + c\frac{n}{8} + ... \]

This results in O(n) time, which is the same as the brute force search from head to tail:

\[
search(x, L) = \begin{cases}
Err & : L = \phi \\
l_1 & : x = l_1 \\
search(x, L') & : \text{otherwise}
\end{cases}
\]

As we mentioned in the chapter about insertion sort, the functional approach to binary
search is through the binary search tree: the ordered sequence is represented in a tree (a
self balancing tree if necessary), which offers logarithmic time searching3.
Although it doesn't make sense to apply divide and conquer binary search on a linked-
list, binary search can still be very useful in purely functional settings. Consider solving
the equation a^x = y for given natural numbers a and y, where a ≤ y. We want to find
the integer solution for x if there is one. Of course brute-force naive searching can solve
it: we examine the numbers one by one from 0, computing a^0, a^1, a^2, ..., stopping if
a^i = y, or reporting that there is no solution if a^i < y < a^{i+1} for some i. We initialize
the solution domain as X = {0, 1, 2, ...}, and call the exhaustive searching function
solve(a, y, X) defined below.

\[
solve(a, y, X) = \begin{cases}
x_1 & : a^{x_1} = y \\
solve(a, y, X') & : a^{x_1} < y \\
Err & : \text{otherwise}
\end{cases}
\]
3 Some readers may argue that an array should be used instead of a linked-list, for example in Haskell. This book only deals with purely functional sequences based on the finger-tree. Different from the Haskell array, it can't support constant time random access.

This function examines the solution domain in monotonically increasing order. It takes
the first candidate element x1 from X and compares a^{x1} with y. If they are equal, then
x1 is the solution and we are done; if it is less than y, then x1 is dropped, and we search
among the rest of the elements, represented as X′; otherwise, since f(x) = a^x is a
non-decreasing function when a is a natural number, the remaining elements will only make
f(x) bigger and bigger, so there is no integer solution for this equation and the function
returns an error.
The computation of a^x is expensive for big a and x if precision must be kept4. Can it
be improved so that we compute it as few times as possible? The divide and conquer binary
search can help. Actually, we can estimate the upper limit of the solution domain: since
the smallest solution x (if any) can't exceed y, we can search in the range {0, 1, ..., y}.
As the function f(x) = a^x is non-decreasing in its argument x, we first check the middle
point candidate xm = ⌊(0 + y)/2⌋. If a^{xm} = y, the solution is found; if it is less than y,
we can drop all candidate solutions before xm; otherwise we drop all candidate solutions
after it. Both cases halve the solution domain. We repeat this approach until either the
solution is found or the solution domain becomes empty, which indicates there is no
integer solution.
The binary search method can be formalized as the following equation. The non-
decreasing function is abstracted as a parameter. To solve our problem, we can just call
bsearch(f, y, 0, y), where f(x) = a^x.

\[
bsearch(f, y, l, u) = \begin{cases}
Err & : u < l \\
m & : f(m) = y, \; m = \lfloor (l+u)/2 \rfloor \\
bsearch(f, y, l, m - 1) & : f(m) > y \\
bsearch(f, y, m + 1, u) & : f(m) < y
\end{cases} \tag{14.7}
\]

As we halve the solution domain in every recursion, this method computes f(x) only
O(lg y) times. It is much faster than the brute-force searching.
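
A small Python sketch of this generic binary search over a non-decreasing function f follows; the name bsearch and the use of None for 'no solution' are my own choices, not from the book.

def bsearch(f, y, l, u):
    # find m in [l, u] with f(m) == y for non-decreasing f, or None
    while l <= u:
        m = (l + u) // 2
        v = f(m)
        if v == y:
            return m
        if v > y:
            u = m - 1
        else:
            l = m + 1
    return None

# Solve a**x == y, e.g. bsearch(lambda x: 3 ** x, 81, 0, 81) == 4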

2 dimensions search
It's quite natural to think that the idea of binary search can be extended to 2 dimensions,
or more generally, to a multiple-dimension domain. However, it is not so easy.
Consider the example of an m × n matrix M. The elements in each row and each
column are in strict increasing order. Figure 14.1 illustrates such a matrix for example.

\[
\begin{pmatrix}
1 & 2 & 3 & 4 & \cdots \\
2 & 4 & 5 & 6 & \cdots \\
3 & 5 & 7 & 8 & \cdots \\
4 & 6 & 8 & 9 & \cdots \\
\cdots & & & &
\end{pmatrix}
\]

Figure 14.1: A matrix in strict increasing order for each row and column.

Given a value x, how can we locate all elements equal to x in the matrix quickly? We
need to develop an algorithm which returns a list of locations (i, j) such that Mi,j = x.
Richard Bird in [1] mentioned that he used this problem to interview candidates
for entry to Oxford. The interesting story was that those who had some computer
background at school tended to use binary search, but it's easy to get stuck.
The usual way following the binary search idea is to examine the element at position
(m/2, n/2). If it is less than x, we can only drop the elements in the top-left area; if it is
greater than x, only
4 One alternative is to reuse the result of a^n when computing a^{n+1} = a · a^n. Here we consider the general form of a monotonic function f(n).

the bottom-right area can be dropped. Both cases are illustrated in figure 14.2, the gray
areas indicate elements can be dropped.

Figure 14.2: Left: the middle point element is smaller than x. All elements in the gray
area are less than x; Right: the middle point element is greater than x. All elements in
the gray area are greater than x.

The problem is that the solution domain changes from a rectangle to an 'L' shape in both cases. We can't just recursively apply search on it. In order to solve this problem systematically, we define the problem more generally, using brute-force search as a starting point, and keep improving it bit by bit.
Consider a function f(x, y), which is strictly increasing in its arguments; for instance $f(x, y) = a^x + b^y$, where a and b are natural numbers. Given a value z, which is a natural number too, we want to solve the equation f(x, y) = z by finding all non-negative integer candidate pairs (x, y).
With this definition, the matrix search problem can be specialized with the function below.

\[
f(x, y) = \begin{cases}
M_{x, y} & : 1 \leq x \leq m, 1 \leq y \leq n \\
-1 & : \text{otherwise}
\end{cases}
\]

Brute-force 2D search
Since all solutions of f(x, y) = z should be found, one can immediately give the brute-force solution with nested loops.
1: function Solve(f, z)
2: A←ϕ
3: for x ∈ {0, 1, 2, ..., z} do
4: for y ∈ {0, 1, 2, ..., z} do
5: if f (x, y) = z then
6: A ← A ∪ {(x, y)}
7: return A
This definitely calculates f for $(z + 1)^2$ times. It can be formalized as in (14.8).

solve(f, z) = {(x, y)|x ∈ {0, 1, ..., z}, y ∈ {0, 1, ..., z}, f (x, y) = z} (14.8)
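A direct transcription of (14.8) to Python could be a one-line list comprehension (the function name solve_brute is ours):

def solve_brute(f, z):
    # check every pair in {0, 1, ..., z} x {0, 1, ..., z}
    return [(x, y) for x in range(z + 1) for y in range(z + 1) if f(x, y) == z]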

Saddleback search
We haven't utilized the fact that f(x, y) is strictly increasing yet. Dijkstra pointed out in [82] that, instead of searching from the bottom-left corner, starting from the top-left leads to an effective solution. As illustrated in figure 14.3, the search starts from (0, z). For every point (p, q), we compare f(p, q) with z:

• If f (p, q) < z, since f is strict increasing, for all 0 ≤ y < q, we have f (p, y) < z. We
can drop all points in the vertical line section (in red color);

• If f (p, q) > z, then f (x, q) > z for all p < x ≤ z. We can drop all points in the
horizontal line section (in blue color);

• Otherwise if f (p, q) = z, we mark (p, q) as one solution, then both line sections can
be dropped.

This is a systematic way to scale down the solution domain rectangle. We keep dropping a row, or a column, or both.

Figure 14.3: Search from top-left.

This method can be formalized as a function search(f, z, p, q), which searches solutions for the equation f(x, y) = z in the rectangle with top-left corner (p, q) and bottom-right corner (z, 0). We start the search by initializing (p, q) = (0, z), as solve(f, z) = search(f, z, 0, z).

\[
search(f, z, p, q) = \begin{cases}
\phi & : p > z \lor q < 0 \\
search(f, z, p + 1, q) & : f(p, q) < z \\
search(f, z, p, q - 1) & : f(p, q) > z \\
\{(p, q)\} \cup search(f, z, p + 1, q - 1) & : \text{otherwise}
\end{cases} \tag{14.9}
\]

The first clause is the edge case: there is no solution if (p, q) goes beyond the bottom-right corner (z, 0).
The following example Haskell program implements this algorithm.
solve f z = search 0 z where
    search p q | p > z || q < 0 = []
               | z' < z = search (p + 1) q
               | z' > z = search p (q - 1)
               | otherwise = (p, q) : search (p + 1) (q - 1)
        where z' = f p q

Considering that the calculation of f may be expensive, this program stores the result of f(p, q) in the variable z'. This algorithm can also be implemented in an iterative manner, where the boundaries of the solution domain keep being updated in a loop.
1: function Solve(f, z)
2: p ← 0, q ← z

3: S←ϕ
4: while p ≤ z ∧ q ≥ 0 do
5: z ′ ← f (p, q)
6: if z ′ < z then
7: p←p+1
8: else if z ′ > z then
9: q ←q−1
10: else
11: S ← S ∪ {(p, q)}
12: p ← p + 1, q ← q − 1
13: return S
It's intuitive to translate this imperative algorithm to a real program, as in the following example Python code.
def solve(f, z):
    (p, q) = (0, z)
    res = []
    while p <= z and q >= 0:
        z1 = f(p, q)
        if z1 < z:
            p = p + 1
        elif z1 > z:
            q = q - 1
        else:
            res.append((p, q))
            (p, q) = (p + 1, q - 1)
    return res

It is clear that in every iteration, at least one of p and q advances towards the bottom-right corner by one. Thus it takes at most 2(z + 1) steps to complete the search. This is the worst case. There are three best cases. The first one happens when, in every iteration, both p and q advance by one, so that only z + 1 steps are needed; the second case keeps advancing horizontally to the right and ends when p exceeds z; the last case is similar, in that it keeps moving down vertically to the bottom until q becomes negative.
Figure 14.4 illustrates the best cases and the worst cases respectively. Figure 14.4 (a)
is the case that every point (x, z − x) in diagonal satisfies f (x, z − x) = z, it uses z + 1
steps to arrive at (z, 0); (b) is the case that every point (x, z) along the top horizontal
line gives the result f (x, z) < z, the algorithm takes z + 1 steps to finish; (c) is the case
that every point (0, x) along the left vertical line gives the result f (0, x) > z, thus the
algorithm takes z + 1 steps to finish; (d) is the worst case. If we project all the horizontal
sections along the search path to x axis, and all the vertical sections to y axis, it gives
the total steps of 2(z + 1).
Compared to the quadratic brute-force method ($O(z^2)$), we improve it to a linear algorithm bound to O(z).
Bird imagined that the name 'saddleback' comes from the fact that the 3D plot of f, with the smallest value at the bottom-left, the largest at the top-right, and two wings, looks like a saddle, as shown in figure 14.5.

Improved saddleback search


We haven't utilized the binary search tool so far, even though the problem extends to a 2-dimensional domain. The basic saddleback search starts from the top-left corner (0, z) and walks to the bottom-right corner (z, 0). This is actually an over-general domain; we can constrain it to be more accurate.
Since f is strictly increasing, we can find the biggest number m, 0 ≤ m ≤ z, along the y axis, which satisfies f(0, m) ≤ z.

Figure 14.4: The best cases and the worst cases.

Figure 14.5: Plot of $f(x, y) = x^2 + y^2$.



Similarly, we can find the biggest n, 0 ≤ n ≤ z, along the x axis, which satisfies f(n, 0) ≤ z. The solution domain then shrinks from (0, z) − (z, 0) to (0, m) − (n, 0), as shown in figure 14.6.

Figure 14.6: A more accurate search domain shown in gray color.

Of course m and n can be found by brute-force like below.

\[
\begin{cases}
m = max(\{y \mid 0 \leq y \leq z, f(0, y) \leq z\}) \\
n = max(\{x \mid 0 \leq x \leq z, f(x, 0) \leq z\})
\end{cases} \tag{14.10}
\]

When searching for m, the x variable of f is bound to 0. It turns into a one dimension search problem for a strictly increasing function (in functional terms, the curried function f(0, y)). Binary search works in such a case. However, we need a small modification to equation (14.7): instead of searching for a solution l ≤ x ≤ u such that f(x) = y for a given y, we need to search for a solution l ≤ x ≤ u such that f(x) ≤ y < f(x + 1).

\[
bsearch(f, y, l, u) = \begin{cases}
l & : u \leq l \\
m & : f(m) \leq y < f(m + 1), m = \lfloor \frac{l+u}{2} \rfloor \\
bsearch(f, y, m + 1, u) & : f(m) \leq y \\
bsearch(f, y, l, m - 1) & : \text{otherwise}
\end{cases} \tag{14.11}
\]
The first clause handles the edge case of an empty range; the lower boundary is returned in that case. If the middle point produces a value less than or equal to the target, while the next point evaluates to a bigger value, then the middle point is what we are looking for. Otherwise, if the point next to the middle also evaluates to a value not greater than the target, the lower bound is set to the middle point plus one, and we perform the binary search recursively. In the last case, the middle point evaluates to a value greater than the target, so the upper bound is updated to the point preceding the middle for further recursive searching. The following Haskell example code implements this modified binary search.
bsearch f y (l, u) | u ≤ l = l
                   | f m ≤ y = if f (m + 1) ≤ y
                               then bsearch f y (m + 1, u) else m
                   | otherwise = bsearch f y (l, m - 1)
    where m = (l + u) `div` 2

Then m and n can be found with this binary search function.

\[
\begin{cases}
m = bsearch(\lambda y \cdot f(0, y), z, 0, z) \\
n = bsearch(\lambda x \cdot f(x, 0), z, 0, z)
\end{cases} \tag{14.12}
\]

And the improved saddleback search shrinks to this new search domain, with solve(f, z) = search(f, z, 0, m):

\[
search(f, z, p, q) = \begin{cases}
\phi & : p > n \lor q < 0 \\
search(f, z, p + 1, q) & : f(p, q) < z \\
search(f, z, p, q - 1) & : f(p, q) > z \\
\{(p, q)\} \cup search(f, z, p + 1, q - 1) & : \text{otherwise}
\end{cases} \tag{14.13}
\]

It's almost the same as the basic saddleback version, except that it stops when p exceeds n rather than z. In a real implementation, the result of f(p, q) can be calculated once and stored in a variable, as shown in the following Haskell example.
solve' f z = search 0 m where
    search p q | p > n || q < 0 = []
               | z' < z = search (p + 1) q
               | z' > z = search p (q - 1)
               | otherwise = (p, q) : search (p + 1) (q - 1)
        where z' = f p q
    m = bsearch (f 0) z (0, z)
    n = bsearch (λx → f x 0) z (0, z)

This improved saddleback search first performs two rounds of binary search to find the proper m and n. Each round is bound to O(lg z) evaluations of f. After that, it takes O(m + n) time in the worst case, and O(min(m, n)) time in the best case. The overall performance is given in the following table.

             | times of evaluation of f
  worst case | 2 lg z + m + n
  best case  | 2 lg z + min(m, n)
For a function such as $f(x, y) = a^x + b^y$ with positive integers a and b, m and n will be relatively small, so the performance is close to O(lg z).
This algorithm can also be realized in imperative approach. Firstly, the binary search
should be modified.
1: function Binary-Search(f, y, (l, u))
2: while l < u do
3: m ← ⌊(l + u)/2⌋
4: if f (m) ≤ y then
5: if y < f (m + 1) then
6: return m
7: l ←m+1
8: else
9: u←m
10: return l
Utilizing this algorithm, the boundaries m and n can be found before performing the saddleback search.
1: function Solve(f, z)
2: m ← Binary-Search(λy · f (0, y), z, (0, z))
3: n ← Binary-Search(λx · f (x, 0), z, (0, z))
4: p ← 0, q ← m
5: S←ϕ
6: while p ≤ n ∧ q ≥ 0 do
7: z ′ ← f (p, q)
8: if z ′ < z then
9: p←p+1
10: else if z ′ > z then

11: q ←q−1
12: else
13: S ← S ∪ {(p, q)}
14: p ← p + 1, q ← q − 1
15: return S
The implementation is left as exercise to the reader.

More improvement to saddleback search


In figure 14.2, two cases are shown for comparing the value of the middle point of a matrix with the given value. One case is that the center value is smaller than the given value, the other is that it is bigger. In both cases, we can only throw away 1/4 of the candidates, and an 'L' shape is left for further searching.
Actually, one important case is missing. We can extend the observation to any point inside the rectangular searching area, as shown in figure 14.7.

(a) If f(p, q) ≠ z, only the lower-left or the upper-right sub-area (in gray color) can be thrown away. Both leave an 'L' shape.

(b) If f(p, q) = z, both sub-areas can be thrown away; the scale of the problem is halved.

Figure 14.7: The efficiency of scaling down the search domain.

Suppose we are searching in a rectangle from the upper-left corner (a, b) to the lower-right corner (c, d). If (p, q) isn't the middle point and f(p, q) ≠ z, we can't ensure that the area to be dropped is always 1/4. However, if f(p, q) = z, as f is strictly increasing, we are not only sure that both the lower-left and the upper-right sub-areas can be thrown away, but also all the other points in column p and row q. The problem can be scaled down fast, because only 1/2 of the area is left.
This suggests that, instead of jumping to the middle point to start searching, a more efficient way is to find a point which evaluates to the target value. One straightforward way to find such a point is to perform binary search along the center horizontal line or the center vertical line of the rectangle.
The performance of binary search along a line is logarithmic to the length of that line. A good idea is to always pick the shorter center line, as shown in figure 14.8. That is, if the height of the rectangle is longer than the width, we perform binary search along the horizontal center line; otherwise we choose the vertical center line.

Figure 14.8: Binary search along the shorter center line.

However, what if we can't find a point (p, q) on the center line that satisfies f(p, q) = z? Let's take the center horizontal line for example. Even in such a case, we can still find a point such that f(p, q) < z < f(p + 1, q). The only difference is that we can't drop the points in column p and row q completely.
Combining these conditions, the binary search along the horizontal line is to find a p satisfying f(p, q) ≤ z < f(p + 1, q), while the vertical line search condition is f(p, q) ≤ z < f(p, q + 1).
The modified binary search ensures that, if all points in the line segment give f(p, q) < z, the upper bound will be found; and the lower bound will be found if they are all greater than z. We can drop the whole area on one side of the center line in such a case.
Summing up all the ideas, we can develop the efficient improved saddleback search as follows.

1. Perform binary search along the y axis and x axis to find the tight boundaries from
(0, m) to (n, 0);

2. Denote the candidate rectangle as (a, b) − (c, d), if the candidate rectangle is empty,
the solution is empty;

3. If the height of the rectangle is longer than the width, perform binary search along
the center horizontal line; otherwise, perform binary search along the center vertical
line; denote the search result as (p, q);

4. If f (p, q) = z, record (p, q) as a solution, and recursively search two sub rectangles
(a, b) − (p − 1, q + 1) and (p + 1, q − 1) − (c, d);

5. Otherwise, f (p, q) 6= z, recursively search the same two sub rectangles plus a line
section. The line section is either (p, q + 1) − (p, b) as shown in figure 14.9 (a); or
(p + 1, q) − (c, q) as shown in figure 14.9 (b).

Figure 14.9: Recursively search the gray areas; the bold line should be included if f(p, q) ≠ z.

This algorithm can be formalized as follows. Equations (14.11) and (14.12) are the same as before. A new search function should be defined.
Define $search_{(a,b),(c,d)}$ as a function for searching the rectangle with top-left corner (a, b) and bottom-right corner (c, d).

\[
search_{(a,b),(c,d)} = \begin{cases}
\phi & : c < a \lor b < d \\
csearch & : c - a < b - d \\
rsearch & : \text{otherwise}
\end{cases} \tag{14.14}
\]

Function csearch performs binary search along the center horizontal line to find a point (p, q) such that f(p, q) ≤ z < f(p + 1, q). This is shown in figure 14.9 (a). There is a special edge case, in which all points on the line evaluate to values greater than z. The general binary search will return the lower bound as the result, so that $(p, q) = (a, \lfloor \frac{b+d}{2} \rfloor)$. The whole upper side, including the center line, can be dropped, as shown in figure 14.10 (a).

Figure 14.10: Edge cases when performing binary search in the center line.

\[
csearch = \begin{cases}
search_{(p,q-1),(c,d)} & : z < f(p, q) \\
search_{(a,b),(p-1,q+1)} \cup \{(p, q)\} \cup search_{(p+1,q-1),(c,d)} & : f(p, q) = z \\
search_{(a,b),(p,q+1)} \cup search_{(p+1,q-1),(c,d)} & : \text{otherwise}
\end{cases} \tag{14.15}
\]

Where

\[
\begin{cases}
q = \lfloor \frac{b+d}{2} \rfloor \\
p = bsearch(\lambda x \cdot f(x, q), z, (a, c))
\end{cases}
\]
Function rsearch is quite similar, except that it searches along the center vertical line.

\[
rsearch = \begin{cases}
search_{(a,b),(p-1,q)} & : z < f(p, q) \\
search_{(a,b),(p-1,q+1)} \cup \{(p, q)\} \cup search_{(p+1,q-1),(c,d)} & : f(p, q) = z \\
search_{(a,b),(p-1,q+1)} \cup search_{(p+1,q),(c,d)} & : \text{otherwise}
\end{cases} \tag{14.16}
\]

Where

\[
\begin{cases}
p = \lfloor \frac{a+c}{2} \rfloor \\
q = bsearch(\lambda y \cdot f(p, y), z, (d, b))
\end{cases}
\]
The following Haskell program implements this algorithm.
search f z (a, b) (c, d) | c < a || b < d = []
    | c - a < b - d = let q = (b + d) `div` 2 in
                      csearch (bsearch (λx → f x q) z (a, c), q)
    | otherwise = let p = (a + c) `div` 2 in
                  rsearch (p, bsearch (f p) z (d, b))
    where
      csearch (p, q) | z < f p q = search f z (p, q - 1) (c, d)
                     | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                                    (p, q) : search f z (p + 1, q - 1) (c, d)
                     | otherwise = search f z (a, b) (p, q + 1) ++
                                   search f z (p + 1, q - 1) (c, d)
      rsearch (p, q) | z < f p q = search f z (a, b) (p - 1, q)
                     | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                                    (p, q) : search f z (p + 1, q - 1) (c, d)
                     | otherwise = search f z (a, b) (p - 1, q + 1) ++
                                   search f z (p + 1, q) (c, d)

And the main program calls this function after performing binary search in X and Y
axes.
solve f z = search f z (0, m) (n, 0) where
m = bsearch (f 0) z (0, z)
n = bsearch (λx → f x 0) z (0, z)

Since we drop half of the area in every recursion, it takes O(log(mn)) rounds of search. However, in order to locate the point (p, q) which halves the problem, we must perform binary search along the center line, which calls f about O(log(min(m, n))) times. Denote the time of searching an m × n rectangle as T(m, n); the recursion relationship can be represented as follows.

\[
T(m, n) = \log(\min(m, n)) + 2 T(\frac{m}{2}, \frac{n}{2}) \tag{14.17}
\]

Suppose m ≤ n; using the telescoping method, for $m = 2^i$ and $n = 2^j$, we have:

\[
\begin{array}{rl}
T(2^i, 2^j) & = j + 2 T(2^{i-1}, 2^{j-1}) \\
 & = \sum_{k=0}^{i-1} 2^k (j - k) \\
 & = O(2^i (j - i)) \\
 & = O(m \log (n/m))
\end{array} \tag{14.18}
\]

Richard Bird proved that this is asymptotically optimal by giving a lower bound for searching a given value in an m × n rectangle [1].
The imperative algorithm is almost the same as the functional version. We skip it for the sake of brevity.
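A direct Python transcription of the Haskell program above could look like the following sketch (it reuses the modified binary search of equation (14.11); the function names are ours):

def bsearch(f, y, l, u):
    # find m in [l, u] with f(m) <= y < f(m + 1); return l if every f(m) > y
    while l < u:
        m = (l + u) // 2
        if f(m) <= y:
            if y < f(m + 1):
                return m
            l = m + 1
        else:
            u = m
    return l

def solve(f, z):
    m = bsearch(lambda y: f(0, y), z, 0, z)
    n = bsearch(lambda x: f(x, 0), z, 0, z)
    def search(a, b, c, d):          # top-left (a, b), bottom-right (c, d)
        if c < a or b < d:
            return []
        if c - a < b - d:            # the horizontal center line is shorter
            q = (b + d) // 2
            p = bsearch(lambda x: f(x, q), z, a, c)
            if z < f(p, q):
                return search(p, q - 1, c, d)
            elif f(p, q) == z:
                return search(a, b, p - 1, q + 1) + [(p, q)] + search(p + 1, q - 1, c, d)
            else:
                return search(a, b, p, q + 1) + search(p + 1, q - 1, c, d)
        else:                        # the vertical center line is shorter
            p = (a + c) // 2
            q = bsearch(lambda y: f(p, y), z, d, b)
            if z < f(p, q):
                return search(a, b, p - 1, q)
            elif f(p, q) == z:
                return search(a, b, p - 1, q + 1) + [(p, q)] + search(p + 1, q - 1, c, d)
            else:
                return search(a, b, p - 1, q + 1) + search(p + 1, q, c, d)
    return search(0, m, n, 0)

For example, solve(lambda x, y: 2 ** x + 3 ** y, 259) should find the two pairs (4, 5) and (8, 1).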

Exercise 14.1

• Prove that the average case for the divide and conquer solution to k-selection prob-
lem is O(n). Please refer to previous chapter about quick sort.

• Implement the imperative k-selection problem with 2-way partition, and median-
of-three pivot selection.

• Implement the imperative k-selection problem to handle duplicated elements effec-


tively.

• Realize the median-of-median k-selection algorithm and implement it in your fa-


vorite programming language.

• The tops(k, L) algorithm uses list concatenation like A ∪ {l1} ∪ tops(k − |A| − 1, B). This is a linear operation which is proportional to the length of the list to be concatenated. Modify the algorithm so that the sub lists are concatenated in one pass.

• The author considered another divide and conquer solution for the k-selection problem. It finds the maximum of the first k elements and the minimum of the rest. Denote them as x and y. If x is smaller than y, it means that all of the first k elements are smaller than the rest, so they are exactly the top k smallest; otherwise, some elements in the first k should be swapped.
1: procedure Tops(k, A)
2: l←1
3: u ← |A|
4: loop
5: i ← Max-At(A[l..k])
6: j ← Min-At(A[k + 1..u])
7: if A[i] < A[j] then
8: break
9: Exchange A[l] ↔ A[j]
10: Exchange A[k + 1] ↔ A[i]
11: l ← Partition(A, l, k)
12: u ← Partition(A, k + 1, u)

Explain why this algorithm works. What is its performance?

• Implement the binary search algorithm in both recursive and iterative manner, and
try to verify your version automatically. You can either generate randomized data,
test your program with the binary search invariant, or compare with the built-in
binary search tool in your standard library.

• Find the solution to calculate the median of two sorted arrays A and B. The time
should be bound to O(lg(|A| + |B|)).

• Implement the improved saddleback search by firstly performing binary search to


find a more accurate solution domain in your favorite imperative programming
language.

• Realize the improved 2D search, by performing binary search along the shorter
center line, in your favorite imperative programming language.

• Someone considers that the 2D search can be designed as follows. When searching a rectangle, since the minimum value is at the bottom-left, and the maximum at the top-right, if the target value is less than the minimum or greater than the maximum, then there is no solution; otherwise, the rectangle is divided into 4 sub-rectangles at the center point, and the search is performed recursively.
1: procedure Search(f, z, a, b, c, d) ▷ (a, b): bottom-left (c, d): top-right
2: if z ≤ f(a, b) ∨ f(c, d) ≤ z then
3: if z = f(a, b) then
4: record (a, b) as a solution
5: if z = f(c, d) then
6: record (c, d) as a solution
7: return
8: p ← ⌊(a + c)/2⌋
9: q ← ⌊(b + d)/2⌋
10: Search(f, z, a, q, p, d)
11: Search(f, z, p, q, c, d)
12: Search(f, z, a, b, p, q)
13: Search(f, z, p, b, c, q)

What’s the performance of this algorithm?

14.2.2 Information reuse


One interesting behavior is that people learn while searching. We not only remember lessons from searches that fail, but also learn patterns which lead to success. This is a kind of information reuse, no matter whether the information is positive or negative. However, it's not easy to determine what information should be kept. Too little information isn't enough for effective searching, while keeping too much is expensive in terms of space.
In this section, we'll first introduce two interesting problems: the Boyer-Moore majority number problem and the maximum sum of sub-vector problem. Both reuse information while keeping as little of it as possible. After that, two popular string matching algorithms, the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, will be introduced.

Boyer-Moore majority number


Voting is quite critical to people. We use voting to choose a leader, make a decision, or reject a proposal. In the months when I was writing this chapter, three countries in the world voted for their presidents. All of the three voting activities utilized computers to calculate the result.
Suppose a country on a small island wants a new president. According to the constitution, only a candidate who wins more than half of the votes can be selected as the president. Given a series of votes, such as A, B, A, C, B, B, D, ..., can we develop a program that tells who the new president is, if there is one, or indicates that nobody wins more than half of the votes?
Of course this problem can be solved with brute-force by using a map, as we did in the chapter about the binary search tree⁵.

⁵There is a probabilistic sub-linear space counting algorithm published in 2004, named the 'Count-min sketch' [84].

template<typename T>
T majority(const T* xs, int n, T fail) {
    map<T, int> m;
    int i, max = 0;
    T r;
    for (i = 0; i < n; ++i)
        ++m[xs[i]];
    for (typename map<T, int>::iterator it = m.begin(); it != m.end(); ++it)
        if (it->second > max) {
            max = it->second;
            r = it->first;
        }
    return max * 2 > n ? r : fail;
}

This program first scans the votes, and accumulates the number of votes for each individual with a map. After that, it traverses the map to find the one with the most votes. If that number is bigger than half, the winner is found; otherwise, it returns a special value to indicate failure.
The following pseudo code describes this algorithm.
1: function Majority(A)
2: M ← empty map
3: for ∀a ∈ A do
4: Put(M , a, 1+ Get(M, a))
5: max ← 0, m ← N IL
6: for ∀(k, v) ∈ M do
7: if max < v then
8: max ← v, m ← k
9: if max > 50% · |A| then
10: return m
11: else
12: fail
For m individuals and n votes, this program takes about O(n log m) time to build the map if the map is implemented with a self-balanced tree (a red-black tree for instance), or about O(n) time if the map is hash table based. However, the hash table needs more space. Next, the program takes O(m) time to traverse the map and find the majority vote. The following table lists the time and space performance for different maps.

  map                | time       | space
  self-balanced tree | O(n log m) | O(m)
  hashing            | O(n)       | O(m) at least
Boyer and Moore invented a clever algorithm in 1980, which can pick the majority element with only one scan, if there is one. Their algorithm only needs O(1) space [83].
The idea is to record the first candidate as the winner so far, and mark him with 1 vote. During the scan process, if the winner being selected gets another vote, we just increase the vote counter; otherwise, it means somebody votes against this candidate, so the vote counter should be decreased by one. If the vote counter becomes zero, it means this candidate has been voted out; we select the next candidate as the new winner and repeat the above scanning process.
Suppose there is a series of votes: A, B, C, B, B, C, A, B, A, B, B, D, B. Below table
illustrates the steps of this processing.

winner count scan position


A 1 A, B, C, B, B, C, A, B, A, B, B, D, B
A 0 A, B, C, B, B, C, A, B, A, B, B, D, B
C 1 A, B, C, B, B, C, A, B, A, B, B, D, B
C 0 A, B, C, B, B, C, A, B, A, B, B, D, B
B 1 A, B, C, B, B, C, A, B, A, B, B, D, B
B 0 A, B, C, B, B, C, A, B, A, B, B, D, B
A 1 A, B, C, B, B, C, A, B, A, B, B, D, B
A 0 A, B, C, B, B, C, A, B, A, B, B, D, B
A 1 A, B, C, B, B, C, A, B, A, B, B, D, B
A 0 A, B, C, B, B, C, A, B, A, B, B, D, B
B 1 A, B, C, B, B, C, A, B, A, B, B, D, B
B 0 A, B, C, B, B, C, A, B, A, B, B, D, B
B 1 A, B, C, B, B, C, A, B, A, B, B, D, B
The key point is that, if there exists a majority with more than 50% of the votes, it can't be voted out by all the others. However, if no candidate wins more than half of the votes, the recorded 'winner' is invalid. Thus it is necessary to perform a second round of scanning for verification.
The following pseudo code illustrates this algorithm.
1: function Majority(A)
2: c←0
3: for i ← 1 to |A| do
4: if c = 0 then
5: x ← A[i]
6: if A[i] = x then
7: c←c+1
8: else
9: c←c−1
10: return x
If there is a majority element, this algorithm takes one pass to scan the votes. In every iteration, it either increases or decreases the counter according to whether the vote supports or opposes the current selection. If the counter becomes zero, it means the current selection has been voted out, so the new one is selected as the updated candidate for further scanning.
The process is linear, O(n) time, and the space needed is just two variables: one for recording the selected candidate so far, the other for vote counting.
Although this algorithm can find the majority element if there is one, it still picks an element even when there isn't. The following modified algorithm verifies the final result with another round of scanning.
1: function Majority(A)
2: c←0
3: for i ← 1 to |A| do
4: if c = 0 then
5: x ← A[i]
6: if A[i] = x then
7: c←c+1
8: else
9: c←c−1
10: c←0
11: for i ← 1 to |A| do
12: if A[i] = x then
13: c←c+1

14: if c > 50% · |A| then
15: return x
16: else
17: fail
Even with this verification process, the algorithm is still bound to O(n) time, and the space needed is constant. The following ISO C++ program implements this algorithm⁶.
template<typename T>
T majority(const T* xs, int n, T fail) {
    T m;
    int i, c;
    for (i = 0, c = 0; i < n; ++i) {
        if (!c)
            m = xs[i];
        c += xs[i] == m ? 1 : -1;
    }
    for (i = 0, c = 0; i < n; ++i)
        c += xs[i] == m;   /* verification pass: count the votes for m */
    return c * 2 > n ? m : fail;
}

The Boyer-Moore majority algorithm can also be realized in a purely functional approach. Different from the imperative settings, which use variables to record and update information, accumulators are used to define the core algorithm. Define a function maj(c, n, L), which takes a list of votes L, the selected candidate c so far, and a counter n. For a non-empty list L, we initialize c as the first vote $l_1$, and set the counter to 1 to start the algorithm: $maj(l_1, 1, L')$, where L' is the rest of the votes except for $l_1$. Below is the definition of this function.

\[
maj(c, n, L) = \begin{cases}
c & : L = \phi \\
maj(c, n + 1, L') & : l_1 = c \\
maj(l_1, 1, L') & : n = 0 \land l_1 \neq c \\
maj(c, n - 1, L') & : \text{otherwise}
\end{cases} \tag{14.19}
\]

We also need to define a function which can verify the result. The idea is that, if the list of votes is empty, the final result is a failure; otherwise, we start the Boyer-Moore algorithm to find a candidate c, then we scan the list again to count the total votes c wins, and verify that this number is more than half.

\[
majority(L) = \begin{cases}
fail & : L = \phi \\
c & : c = maj(l_1, 1, L'), |\{x \mid x \in L, x = c\}| > 50\% |L| \\
fail & : \text{otherwise}
\end{cases} \tag{14.20}
\]

Below Haskell example code implements this algorithm.


majority :: (Eq a) ⇒ [a] → Maybe a
majority [] = Nothing
majority (x:xs) = let m = maj x 1 xs in verify m (x:xs)

maj c n [] = c
maj c n (x:xs) | c == x = maj c (n+1) xs
| n == 0 = maj x 1 xs
| otherwise = maj c (n-1) xs

verify m xs = if 2 ∗ (length $ filter (==m) xs) > length xs


then Just m else Nothing

⁶We actually use the ANSI C style. The C++ template is only used to generalize the type of the element.

Maximum sum of sub vector


Jon Bentley presents another interesting puzzle in [4], which can be solved with a quite similar idea. The problem is to find the maximum sum of a sub-vector. For example, in the following array, the sub-vector {19, -12, 1, 9, 18} yields the biggest sum, 35.

3 -13 19 -12 1 9 18 -16 15 -15

Note that it is only required to output the value of the maximum sum. If all the numbers are positive, the answer is definitely the sum of all of them. Another special case is that all numbers are negative. We define the maximum sum as 0 for an empty sub-vector.
Of course we can find the answer with brute-force, by calculating the sums of all sub-vectors and picking the maximum. Such a naive method is typically quadratic.
1: function Max-Sum(A)
2: m←0
3: for i ← 1 to |A| do
4: s←0
5: for j ← i to |A| do
6: s ← s + A[j]
7: m ← Max(m, s)
8: return m
The brute-force algorithm does not reuse any information from the previous search. Similar to the Boyer-Moore majority vote algorithm, we can record the maximum sum ending at the position we are scanning. Of course we also need to record the biggest sum found so far. The following figure illustrates this idea and the invariant during the scan.

... max ... max end at i ...

Figure 14.11: Invariant during scan.

At any time when we scan to the i-th position, the maximum sum found so far is recorded as A. At the same time, we also record the biggest sum ending at i as B. Note that A and B may not be the same; in fact, we always maintain B ≤ A. When B becomes greater than A by adding the next element, we update A with this new value. When B becomes negative, which happens when the next element is a negative number, we reset it to 0. The following table illustrates the steps when we scan the example vector {3, −13, 19, −12, 1, 9, 18, −16, 15, −15}.
max sum max end at i list to be scan
0 0 {3, −13, 19, −12, 1, 9, 18, −16, 15, −15}
3 3 {−13, 19, −12, 1, 9, 18, −16, 15, −15}
3 0 {19, −12, 1, 9, 18, −16, 15, −15}
19 19 {−12, 1, 9, 18, −16, 15, −15}
19 7 {1, 9, 18, −16, 15, −15}
19 8 {9, 18, −16, 15, −15}
19 17 {18, −16, 15, −15}
35 35 {−16, 15, −15}
35 19 {15, −15}
35 34 {−15}
35 19 {}
This algorithm can be described as below.

1: function Max-Sum(V )
2: A ← 0, B ← 0
3: for i ← 1 to |V | do
4: B ← Max(B + V [i], 0)
5: A ← Max(A, B)
6: return A
It is trivial to translate this linear time algorithm into code.
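For instance, a direct Python transcription (the name max_sum is ours) could be:

def max_sum(xs):
    a = b = 0        # a: max sum so far, b: max sum ending at the current position
    for x in xs:
        b = max(b + x, 0)
        a = max(a, b)
    return a

Running max_sum([3, -13, 19, -12, 1, 9, 18, -16, 15, -15]) gives 35, agreeing with the table above.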
This algorithm can also be defined in a functional approach. Instead of mutating variables, we use accumulators to record A and B. In order to search for the maximum sum of list L, we call the function below with maxsum(0, 0, L).

\[
maxsum(A, B, L) = \begin{cases}
A & : L = \phi \\
maxsum(A', B', L') & : \text{otherwise}
\end{cases} \tag{14.21}
\]

Where

\[
\begin{cases}
B' = max(l_1 + B, 0) \\
A' = max(A, B')
\end{cases}
\]
Below Haskell example code implements this algorithm.
maxsum = msum 0 0 where
msum a _ [] = a
msum a b (x:xs) = let b' = max (x+b) 0
a' = max a b'
in msum a' b' xs

KMP
String matching is another important type of searching. Almost all software editors are equipped with tools to find a string in the text. In the chapters about Trie, Patricia, and suffix tree, we introduced some powerful data structures which can help to search for a string. In this section, we introduce two other string matching algorithms, both based on information reuse.
Some programming environments provide built-in string search tools; however, most of them are brute-force solutions, including the 'strstr' function in the ANSI C standard library, 'find' in the C++ standard template library, 'indexOf' in the Java Development Kit, etc. Figure 14.12 illustrates how such a character-by-character comparison process works.
Suppose we search a pattern P in a text T, as shown in figure 14.12 (a). At offset s = 4, the process examines every character in P and T to check if they are the same. It successfully matches the first 4 characters 'anan'. However, the 5th character in the pattern string is 'y'. It doesn't match the corresponding character in the text, which is 't'.
At this stage, the brute-force solution terminates the attempt, increases s by one to 5, and restarts the comparison between 'ananym' and 'nantho...'. Actually, we can increase s by more than one. This is because we already know that the first four characters 'anan' have been matched, and the failure happens at the 5th position. Observe that the two-letter prefix 'an' of the pattern string is also a suffix of 'anan', which we have matched so far. A more effective way is to shift s by two instead of one, as shown in figure 14.12 (b). By this means, we reuse the information that 4 characters have been matched. This helps us to skip as many invalid positions as possible.
Knuth, Morris and Pratt presented this idea in [85] and developed a novel string matching algorithm, later called 'KMP', after the three authors' initials.
For the sake of brevity, we denote the first k characters of text T as $T_k$, which means $T_k$ is the k-character prefix of T.

(a) The offset s = 4; after matching q = 4 characters, the 5th mismatches.
(b) Move to s = 4 + 2 = 6 directly.

Figure 14.12: Match 'ananym' in 'any ananthous ananym flower'.

The key point to shifting s effectively is to find a function of q, where q is the number of characters matched successfully. For instance, q is 4 in figure 14.12 (a), as the 5th character doesn't match.
Consider in what situation we can shift s by more than 1. As shown in figure 14.13, if we can shift the pattern P ahead, there must exist a k such that the first k characters are the same as the last k characters of $P_q$. In other words, the prefix $P_k$ is a suffix of $P_q$.

Figure 14.13: $P_k$ is both a prefix and a suffix of $P_q$.

It's possible that there is no such prefix that is also a suffix at the same time. If we treat the empty string as both a prefix and a suffix of any string, there is always at least one solution, k = 0. It's also quite possible that multiple values of k satisfy the condition. To avoid missing any possible matching positions, we have to find the biggest k. We can define a prefix function π(q) which tells us where we can fall back if the (q + 1)-th character does not match [4].

π(q) = max{k|k < q ∧ Pk ⊐ Pq } (14.22)

Where ⊐ is read as 'is a suffix of'. For instance, A ⊐ B means A is a suffix of B. This function is used as follows. When we match pattern P against text T from offset s, if the match fails after q characters, we next look up π(q) to get a fallback q', and retry by comparing P[q' + 1] with the previously unmatched character. Based on this idea, the core algorithm of KMP can be described as follows.
1: function KMP(T, P )

2: n ← |T |, m ← |P |
3: build prefix function π from P
4: q←0 ▷ How many characters have been matched so far.
5: for i ← 1 to n do
6: while q > 0 ∧ P[q + 1] ≠ T[i] do
7: q ← π(q)
8: if P [q + 1] = T [i] then
9: q ←q+1
10: if q = m then
11: found one solution at i − m
12: q ← π(q) ▷ look for next solution

Although the definition of the prefix function π(q) is given in equation (14.22), realizing it blindly by finding the longest suffix isn't effective. Actually, we can use the idea of information reuse again to build the prefix function.
The trivial edge case is that the first character doesn't match. In this case the longest prefix which is also a suffix is definitely empty, so π(1) = k = 0. We record the longest prefix as $P_k$. In this edge case $P_k = P_0$ is the empty string.
After that, when we scan the q-th character in the pattern string P, we hold the invariant that the prefix function values π(i) for i in {1, 2, ..., q − 1} have already been recorded, and $P_k$ is the longest prefix which is also a suffix of $P_{q-1}$. As shown in figure 14.14, if P[q] = P[k + 1], a bigger k than before is found, and we can increase the maximum of k by one; otherwise, if they are not the same, we use π(k) to fall back to a shorter prefix $P_{k'}$ where k' = π(k), and check whether the character following this new prefix is the same as the q-th character. We repeat this step until either k becomes zero (which means only the empty string satisfies the condition), or the q-th character matches.

Figure 14.14: $P_k$ is a suffix of $P_{q-1}$; P[q] and P[k + 1] are compared.

Realizing this idea gives the KMP prefix building algorithm.


1: function Build-Prefix-Function(P )
2: m ← |P |, k ← 0
3: π(1) ← 0
4: for q ← 2 to m do
5: while k > 0 ∧ P[q] ≠ P[k + 1] do
6: k ← π(k)
7: if P [q] = P [k + 1] then
8: k ←k+1
9: π(q) ← k
10: return π
The following table lists the steps of building the prefix function for the pattern string 'ananym'. Note that the k in the table actually means the maximum k satisfying equation (14.22).

q Pq k Pk
1 a 0 “”
2 an 0 “”
3 ana 1 a
4 anan 2 an
5 anany 0 “”
6 ananym 0 “”
Translating the KMP algorithm to Python gives the below example code.
def kmp_match(w, p):
    n = len(w)
    m = len(p)
    fallback = fprefix(p)
    k = 0   # how many characters have been matched so far
    res = []
    for i in range(n):
        while k > 0 and p[k] != w[i]:
            k = fallback[k]        # fall back
        if p[k] == w[i]:
            k = k + 1
        if k == m:
            res.append(i + 1 - m)  # record the left position of the match
            k = fallback[k]        # fall back to look for the next (overlapping) match
    return res

def fprefix(p):
    m = len(p)
    t = [0] * (m + 1)   # fallback table: t[i] is the longest border of the first i characters
    k = 0
    for i in range(2, m + 1):
        while k > 0 and p[i-1] != p[k]:
            k = t[k]    # fall back to a shorter border
        if p[i-1] == p[k]:
            k = k + 1
        t[i] = k
    return t

The KMP algorithm builds the prefix function for the pattern string as a kind of
pre-processing before the search. Because of this, it can reuse as much information of the
previous matching as possible.
The amortized performance of building the prefix function is O(m). This can be
proved by using potential method as in [4]. Using the similar method, it can be proved
that the matching algorithm itself is also linear. Thus the total performance is O(m + n)
at the expense of the O(m) space to record the prefix function table.
It seems that various pattern strings would affect the performance of KMP. Consider the case where we are finding the pattern string 'aaa...a' of length m in a string 'aaa...a' of length n. All the characters are the same; when the last character in the pattern is examined, we can only fall back by 1, and this one-character fallback repeats until it falls back to zero. Even in this extreme case, the KMP algorithm still holds its linear performance (why?). Please try to consider more cases, such as P = aaaa...b, T = aaaa...a, and so on.

Purely functional KMP algorithm

It is not easy to realize the KMP matching algorithm in a purely functional manner. The imperative algorithm presented so far intensively uses arrays to record prefix function values. Although it is possible to utilize a sequence-like structure in purely functional settings, it is typically implemented with a finger tree. Unlike native arrays, finger trees need logarithmic time for random access⁷.


Richard Bird presents a formal program deduction of the KMP algorithm by using the fold fusion law in chapter 17 of [1]. In this section, we show how to develop a purely functional KMP algorithm step by step from a brute-force prefix function creation method.
Both the text string and the pattern are represented as singly linked-lists in purely functional settings. During the scan process, these two lists are further partitioned; each one is broken into two parts. As shown in figure 14.15, the first j characters in the pattern string have been matched. T[i + 1] and P[j + 1] will be compared next. If they are the same, we need to append the character to the matched part. However, since strings are essentially singly linked lists, such appending is proportional to j.

Figure 14.15: The first j characters in P are matched; next compare P[j + 1] with T[i + 1].

Denote the first i characters as Tp , which means the prefix of T , the rest characters
as Ts for suffix; Similarly, the first j characters as Pp , and the rest as Ps ; Denote the
first character of Ts as t, the first character of Ps as p. We have the following ‘cons’
relationship.

Ts = cons(t, Ts′ )
Ps = cons(p, Ps′ )

If t = p, note the following updating process is bound to linear time.

Tp′ = Tp ∪ {t}
Pp′ = Pp ∪ {p}

We’ve introduced a method in the chapter about purely functional queue, which can
solve this problem. By using a pair of front and rear list, we can turn the linear time
appending to constant time linking. The key point is to represent the prefix part in
reverse order.


\[
\begin{cases}
T = T_p \cup T_s = reverse(reverse(T_p)) \cup T_s = reverse(\overleftarrow{T_p}) \cup T_s \\
P = P_p \cup P_s = reverse(reverse(P_p)) \cup P_s = reverse(\overleftarrow{P_p}) \cup P_s
\end{cases} \tag{14.23}
\]

The idea is to use the pairs $(\overleftarrow{T_p}, T_s)$ and $(\overleftarrow{P_p}, P_s)$ instead. With this change, if $t = p$, we can update the prefix part fast, in constant time.

\[
\begin{cases}
\overleftarrow{T_p'} = cons(t, \overleftarrow{T_p}) \\
\overleftarrow{P_p'} = cons(p, \overleftarrow{P_p})
\end{cases} \tag{14.24}
\]

The KMP matching algorithm starts by initializing the successfully matched prefix parts to empty strings, as follows.

\[
search(P, T) = kmp(\pi, (\phi, P), (\phi, T)) \tag{14.25}
\]


⁷Again, we don't use a native array, even though it is supported in some functional programming environments like Haskell.

Where π is the prefix function we explained before. The core part of the KMP algorithm, except for the prefix function building, can be defined as below.

\[
kmp(\pi, (\overleftarrow{P_p}, P_s), (\overleftarrow{T_p}, T_s)) = \begin{cases}
\{|\overleftarrow{T_p}|\} & : P_s = \phi \land T_s = \phi \\
\phi & : P_s \neq \phi \land T_s = \phi \\
\{|\overleftarrow{T_p}|\} \cup kmp(\pi, \pi(\overleftarrow{P_p}, P_s), (\overleftarrow{T_p}, T_s)) & : P_s = \phi \land T_s \neq \phi \\
kmp(\pi, (\overleftarrow{P_p'}, P_s'), (\overleftarrow{T_p'}, T_s')) & : t = p \\
kmp(\pi, (\overleftarrow{P_p}, P_s), (\overleftarrow{T_p'}, T_s')) & : t \neq p \land P_p = \phi \\
kmp(\pi, \pi(\overleftarrow{P_p}, P_s), (\overleftarrow{T_p}, T_s)) & : t \neq p \land P_p \neq \phi
\end{cases} \tag{14.26}
\]
The first clause states that, if the scan successfully reaches the end of both the pattern and the text strings, we get a solution, and the algorithm terminates. Note that we use the right position in the text string as the matching point. It's easy to get the left position by subtracting the length of the pattern string. For the sake of brevity, we use the right position in the functional solutions.
The second clause states that if the scan arrives at the end of the text string while there are still characters in the pattern string that haven't been matched, there is no solution, and the algorithm terminates.
The third clause states that, if all the characters in the pattern string have been successfully matched while there are still characters in the text that haven't been examined, we get a solution, and we fall back by calling the prefix function π to go on searching for other solutions.
The fourth clause deals with the case where the next characters in the pattern string and the text are the same. In such a case, the algorithm advances one character ahead and recursively performs the search.
If the next characters are not the same and this is the first character in the pattern string, we just need to advance to the next character in the text and try again. Otherwise, if this isn't the first character in the pattern, we call the prefix function π to fall back, and try again.
The brute-force way to build the prefix function is just to follow the definition equation (14.22).

\[
\pi(\overleftarrow{P_p}, P_s) = (\overleftarrow{P_p'}, P_s') \tag{14.27}
\]

where

\[
\begin{cases}
P_p' = longest(\{s \mid s \in prefixes(P_p), s \sqsupset P_p\}) \\
P_s' = P - P_p'
\end{cases}
\]

Every time we calculate the fallback position, the algorithm naively enumerates all prefixes of $P_p$, checks whether each is also a suffix of $P_p$, and then picks the longest one as the result. Note that we reuse the subtraction symbol here for the list difference operation.
There is a tricky case which should be avoided. Because any string is both a prefix and a suffix of itself, i.e. $P_p \sqsubset P_p$ and $P_p \sqsupset P_p$, we shouldn't enumerate $P_p$ as a candidate prefix. One solution for such prefix enumeration can be realized as follows.
\[
prefixes(L) = \begin{cases}
\{\phi\} & : L = \phi \lor |L| = 1 \\
cons(\phi, map(\lambda s \cdot cons(l_1, s), prefixes(L'))) & : \text{otherwise}
\end{cases} \tag{14.28}
\]
Below Haskell example program implements this version of string matching algorithm.
kmpSearch1 ptn text = kmpSearch' next ([], ptn) ([], text)

kmpSearch' _ (sp, []) (sw, []) = [length sw]
kmpSearch' _ _ (_, []) = []
kmpSearch' f (sp, []) (sw, ws) = length sw : kmpSearch' f (f sp []) (sw, ws)
kmpSearch' f (sp, (p:ps)) (sw, (w:ws))
    | p == w = kmpSearch' f ((p:sp), ps) ((w:sw), ws)
    | otherwise = if sp == [] then kmpSearch' f (sp, (p:ps)) ((w:sw), ws)
                  else kmpSearch' f (f sp (p:ps)) (sw, (w:ws))

next sp ps = (sp', ps') where
    prev = reverse sp
    prefix = longest [xs | xs ← inits prev, xs `isSuffixOf` prev]
    sp' = reverse prefix
    ps' = (prev ++ ps) \\ prefix
    longest = maximumBy (compare `on` length)

inits [] = [[]]
inits [_] = [[]]
inits (x:xs) = [] : (map (x:) $ inits xs)

This version not only performs poorly, but it is also complex. We can simplify it a bit. Observing that KMP matching is a scan process from left to right over the text, it can be represented with folding (refer to Appendix A for details). Firstly, we can augment each character with an index for folding, like below.

\[
zip(T, \{1, 2, ...\}) \tag{14.29}
\]

Zipping the text string with the infinite list of natural numbers gives a list of pairs. For example, the text string 'The quick brown fox jumps over the lazy dog' turns into (T, 1), (h, 2), (e, 3), ..., (o, 42), (g, 43).
The initial state for folding contains two parts. One is the pair of the pattern $(P_p, P_s)$, with the prefix starting from empty and the suffix being the whole pattern string: $(\phi, P)$. For illustration purposes only, we revert back to normal pairs rather than the $(\overleftarrow{P_p}, P_s)$ notation; it can easily be replaced with the reversed form in the finalized version. This is left as an exercise to the reader. The other part is a list of positions where successful matches are found. It starts from the empty list. After the folding finishes, this list contains all the solutions. What we need is to extract this list from the final state. The core KMP search algorithm is simplified like this.

\[
kmp(P, T) = snd(fold(search, ((\phi, P), \phi), zip(T, \{1, 2, ...\}))) \tag{14.30}
\]

The only 'black box' is the search function, which takes a state and a pair of character and index, and returns a new state as the result. Denote the first character in $P_s$ as p and the rest of the characters as $P_s'$ ($P_s = cons(p, P_s')$); we have the following definition.

\[
search(((P_p, P_s), L), (c, i)) = \begin{cases}
((P_p \cup p, P_s'), L \cup \{i\}) & : p = c \land P_s' = \phi \\
((P_p \cup p, P_s'), L) & : p = c \land P_s' \neq \phi \\
((P_p, P_s), L) & : P_p = \phi \\
search((\pi(P_p, P_s), L), (c, i)) & : \text{otherwise}
\end{cases} \tag{14.31}
\]
If the first character in $P_s$ matches the current character c during the scan, we further check whether all the characters in the pattern have been examined. If so, we have successfully found a solution, and the position i is recorded in the list L; otherwise, we advance one character ahead and go on. If p does not match c, we need to fall back for a further retry. However, there is an edge case where we can't fall back any more: $P_p$ is empty in this case, and we do nothing but keep the current state.
The prefix-function π developed so far can also be improved a bit. Since we want
to find the longest prefix of Pp , which is also suffix of it, we can scan from right to left
398 CHAPTER 14. SEARCHING

instead. For any non empty list L, denote the first element as l1 , and all the rest except
for the first one as L′ , define a function init(L), which returns all the elements except for
the last one as below.
{
ϕ : |L| = 1
init(L) = (14.32)
cons(l1 , init(L′ )) : otherwise

Note that this function cannot handle the empty list. The idea of scanning from right to left for $P_p$ is to first check whether $init(P_p) \sqsupset P_p$; if yes, then we are done. Otherwise, we examine whether $init(init(P_p))$ works, and repeat this until the left-most position. Based on this idea, the prefix function can be modified as follows.

\[
\pi(P_p, P_s) = \begin{cases}
(P_p, P_s) & : P_p = \phi \\
fallback(init(P_p), cons(last(P_p), P_s)) & : \text{otherwise}
\end{cases} \tag{14.33}
\]

Where

\[
fallback(A, B) = \begin{cases}
(A, B) & : A \sqsupset P_p \\
(init(A), cons(last(A), B)) & : \text{otherwise}
\end{cases} \tag{14.34}
\]

Note that fallback always terminates because the empty string is a suffix of any string. The last(L) function returns the last element of a list; it is also a linear time operation (refer to Appendix A for details). However, it is a constant time operation if we use the $\overleftarrow{P_p}$ approach. This improved prefix function is bound to linear time. It is still quite a bit slower than the imperative algorithm, which can look up the prefix function in constant O(1) time. The following Haskell example program implements this minor improvement.
failure ([], ys) = ([], ys)
failure (xs, ys) = fallback (init xs) (last xs:ys) where
    fallback as bs | as `isSuffixOf` xs = (as, bs)
                   | otherwise = fallback (init as) (last as:bs)

kmpSearch ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where
    f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys == [] then ((xs++[y], ys), ns++[n])
                                             else ((xs++[y], ys), ns)
                                  | xs == [] = (p, ns)
                                  | otherwise = f (failure p, ns) (x, n)
    f (p, ns) e = f (failure p, ns) e

The bottleneck is that we cannot use a native array to record prefix functions in purely functional settings. In fact, the prefix function can be understood as a state transform function: it transfers from one state to another according to whether the match succeeds or fails. We can abstract such state changing as a tree. In an environment supporting algebraic data types, Haskell for example, such a state tree can be defined like below.
data State a = E | S a (State a) (State a)

A state is either empty, or contains three parts: the current state, the new state if
match fails, and the new state if match succeeds. Such definition is quite similar to the
binary tree. We can call it ‘left-fail, right-success’ tree. The state we are using here is
(Pp , Ps ).
Similar to the imperative KMP algorithm, which builds the prefix function from the pattern string, the state transform tree can also be built from the pattern. The idea is to build the tree from the very beginning state (ϕ, P), with both its children empty. We replace the left child with a new state by calling the π function defined above, and replace the right child by advancing one character ahead. There is an edge case: when the state transfers to (P, ϕ), we cannot advance any more in the success case; such a node only contains a child for the failure case. The build function is defined as follows.
\[
build((P_p, P_s), \phi, \phi) = \begin{cases}
((P_p, P_s), build(\pi(P_p, P_s), \phi, \phi), \phi) & : P_s = \phi \\
((P_p, P_s), L, R) & : \text{otherwise}
\end{cases} \tag{14.35}
\]

Where

\[
\begin{cases}
L = build(\pi(P_p, P_s), \phi, \phi) \\
R = build((P_p \cup \{p\}, P_s'), \phi, \phi)
\end{cases}
\]
The meanings of p and $P_s'$ are the same as before: p is the first character in $P_s$, and $P_s'$ is the rest of the characters. The most interesting point is that the build function never stops; it endlessly builds an infinite tree. In a strict programming environment, calling this function will freeze. However, in environments supporting lazy evaluation, only the nodes that have to be used will be created. For example, both Haskell and Scheme/Lisp are capable of constructing such an infinite state tree. In imperative settings, it is typically realized by using pointers which link to an ancestor of a node.

Figure 14.16: The infinite state tree for pattern 'ananym'.

Figure 14.16 illustrates such an infinite state tree for the pattern string 'ananym'. Note that the right-most edge represents the case where the matching continuously succeeds for all characters. After that, since we can't match any more, the right sub-tree is empty. Based on this fact, we can define an auxiliary function to test whether a state indicates that the whole pattern has been successfully matched.

\[
match(((P_p, P_s), L, R)) = \begin{cases}
True & : P_s = \phi \\
False & : \text{otherwise}
\end{cases} \tag{14.36}
\]

With the help of state transform tree, we can realize KMP algorithm in an automaton
manner.

kmp(P, T ) = snd(f old(search, (T r, []), zip(T, {1, 2, ...}))) (14.37)

Where the tree Tr = build((ϕ, P), ϕ, ϕ) is the infinite state transform tree. Function search utilizes this tree to transform the state according to match or fail. Denote the first character in $P_s$ as p, the rest of the characters as $P_s'$, and the matched positions found so far as A.

\[
search((((P_p, P_s), L, R), A), (c, i)) = \begin{cases}
(R, A \cup \{i\}) & : p = c \land match(R) \\
(R, A) & : p = c \land \lnot match(R) \\
(((P_p, P_s), L, R), A) & : P_p = \phi \\
search((L, A), (c, i)) & : \text{otherwise}
\end{cases} \tag{14.38}
\]
The following Haskell example program implements this algorithm.
data State a = E | S a (State a) (State a) −− state, fail-state, ok-state
    deriving (Eq, Show)

build :: (Eq a) ⇒ State ([a], [a]) → State ([a], [a])
build (S s@(xs, []) E E) = S s (build (S (failure s) E E)) E
build (S s@(xs, (y:ys)) E E) = S s l r where
    l = build (S (failure s) E E) −− fail state
    r = build (S (xs++[y], ys) E E)

matched (S (_, []) _ _) = True
matched _ = False

kmpSearch3 :: (Eq a) ⇒ [a] → [a] → [Int]
kmpSearch3 ws txt = snd $ foldl f (auto, []) (zip txt [1..]) where
    auto = build (S ([], ws) E E)
    f (s@(S (xs, ys) l r), ns) (x, n)
        | [x] `isPrefixOf` ys = if matched r then (r, ns++[n])
                                else (r, ns)
        | xs == [] = (s, ns)
        | otherwise = f (l, ns) (x, n)

The bottleneck is that the state tree building function calls π to fall back, and the current definition of π isn't efficient enough, because it enumerates all candidates from right to left every time.
Since the state tree is infinite, we can adopt some common treatment for infinite
structures. One good example is the Fibonacci series. The first two Fibonacci numbers
are defined as 0 and 1; the rest Fibonacci numbers can be obtained by adding the previous
two numbers.
F0 = 0
F1 = 1 (14.39)
Fn = Fn−1 + Fn−2
Thus the Fibonacci numbers can be list one by one as the following
F0 =0
F1 =1
F2 = F1 + F0 (14.40)
F3 = F2 + F1
...
We can collect the numbers on both sides and define the infinite list F = {F0, F1, F2, ...}. Thus we have the following equation.
F = {0, 1, F1 + F0 , F2 + F1 , ...}
= {0, 1} ∪ {x + y|x ∈ {F0 , F1 , F2 , ...}, y ∈ {F1 , F2 , F3 , ...}} (14.41)
= {0, 1} ∪ {x + y|x ∈ F, y ∈ F ′ }
Where F' = tail(F) is all the Fibonacci numbers except for the first one. In environments supporting lazy evaluation, like Haskell for instance, this definition can be expressed like below.

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

The recursive definition of the infinite Fibonacci series suggests an idea which can be used to get rid of the fallback function π. Denote the state transfer tree as T; we can define the transfer function for matching a character on this tree as follows.

\[
trans(T, c) = \begin{cases}
root & : T = \phi \\
R & : T = ((P_p, P_s), L, R), c = p \\
trans(L, c) & : \text{otherwise}
\end{cases} \tag{14.42}
\]
trans(L, c) : otherwise

If we match a character against an empty node, we transfer to the root of the tree (we'll define the root shortly). Otherwise, we check whether the character c is the same as the first character p in $P_s$. If they match, we transfer to the right sub-tree for the success case; otherwise, we transfer to the left sub-tree for the fail case.
With the transfer function defined, we can modify the previous tree building function accordingly. This is quite similar to the previous Fibonacci series definition.

build(T, (Pp , Ps )) = ((Pp , Ps ), T, build(trans(T, p), (Pp ∪ {p}, Ps′ )))

The right hand side of this equation contains three parts. The first one is the state that we are matching, $(P_p, P_s)$. If the match fails, since T itself can handle any fail case, we use it directly as the left sub-tree; otherwise we recursively build the right sub-tree for the success case by advancing one character ahead and calling the transfer function we defined above.
However, there is an edge case which has to be handled specially: if $P_s$ is empty, it indicates a successful match, and as defined above, there isn't a right sub-tree any more. Combining these cases gives the final building function.
\[
build(T, (P_p, P_s)) = \begin{cases}
((P_p, P_s), T, \phi) & : P_s = \phi \\
((P_p, P_s), T, build(trans(T, p), (P_p \cup \{p\}, P_s'))) & : \text{otherwise}
\end{cases} \tag{14.43}
\]
The last brick is to define the root of the infinite state transfer tree, which initializes
the building.

root = build(ϕ, (ϕ, P )) (14.44)

And the new KMP matching algorithm is modified with this root.

kmp(P, T ) = snd(f old(trans, (root, []), zip(T, {1, 2, ...}))) (14.45)

The following Haskell example program implements this final version.


kmpSearch ws txt = snd $ foldl tr (root, []) (zip txt [1..]) where
root = build' E ([], ws)
build' fails (xs, []) = S (xs, []) fails E
build' fails s@(xs, (y:ys)) = S s fails succs where
succs = build' (fst (tr (fails, []) (y, 0))) (xs++[y], ys)
tr (E, ns) _ = (root, ns)
tr ((S (xs, ys) fails succs), ns) (x, n)
| [x] `isPrefixOf` ys = if matched succs then (succs, ns++[n]) else (succs, ns)
| otherwise = tr (fails, ns) (x, n)

Figure 14.17 shows the first 4 steps when searching ‘ananym’ in text ‘anal’. Since the
first 3 steps all succeed, the left sub-trees of these 3 states are not actually constructed.
They are marked as ‘?’. In the fourth step, the match fails, thus the right sub-tree needn't
be built. On the other hand, we must construct the left sub-tree, which is on top of the
result of trans(right(right(right(T ))), n), where function right(T ) returns the right sub-

(Figure 14.17 shows the chain of states (’’, ananym) → (a, nanym) → (an, anym) → (ana, nym)
built along the match branches; the unconstructed sub-trees are marked with ‘?’, and the fail
branch of (ana, nym) leads to the state (a, nanym).)

Figure 14.17: On-demand construction of the state transfer tree when searching ‘ananym’ in
text ‘anal’.

tree of T . This can be further expanded according to the definitions of the building and
state transforming functions till we get the concrete state ((a, nanym), L, R). The detailed
deduction is left as an exercise to the reader.
This algorithm depends critically on lazy evaluation. All the states to be transferred
are built on demand, so that the building process is amortized O(m), and the total
performance is amortized O(n + m). Readers can refer to [1] for a detailed proof.
It's worth comparing the final purely functional and the imperative algorithms. In
many cases the functional realization is more expressive; however, for the KMP matching
algorithm the imperative approach is much simpler and more intuitive. This is because
we have to mimic the raw array with an infinite state transfer tree.

Boyer-Moore
The Boyer-Moore string matching algorithm is another effective solution, invented in 1977 [86].
The idea of the Boyer-Moore algorithm comes from the following observation.

The bad character heuristics


When attempting to match the pattern, even if several characters from the left are the
same, the attempt fails if the last one does not match, as shown in figure 14.18. What's
more, we wouldn't find a match even if we slide the pattern down by 1 or 2. Actually, the
length of the pattern ‘ananym’ is 6, and its last character is ‘m’; however, the corresponding
character in the text is ‘h’, which does not appear in the pattern at all. We can directly
slide the pattern down by 6.

a n y a n a n t h o u s a n a n y m f l o w e r T

s
a n a n y m P

Figure 14.18: Since character ‘h’ doesn’t appear in the pattern, we wouldn’t find a match
if we slide the pattern down less than the length of the pattern.

This leads to the bad-character rule. We can pre-process the pattern: if the character
set of the text is already known, we can find all the characters which don't appear in the
pattern string. During the later scan, as long as we meet such a bad character, we can
immediately slide the pattern down by its length. The question is what happens if the
unmatched character does appear in the pattern? In order not to miss any potential
matches, we have to slide the pattern down less and check again. This is shown in
figure 14.19.

(Figure 14.19 (a): the last character in the pattern, ‘e’, doesn't match ‘p’; however, ‘p’
appears in the pattern. (b): we have to slide the pattern down by 2 to check again.)

Figure 14.19: Slide the pattern if the unmatched character appears in the pattern.

It's quite possible that the unmatched character appears in the pattern at more than
one position. Denote the length of the pattern as |P |, and suppose the character appears
at positions p1 , p2 , ..., pi . In such a case, we take the right-most one to avoid missing any
matches.

s = |P | − pi (14.46)

Note that the shifting length is 0 for the last position in the pattern according to the
above equation; thus we can skip it in the realization. Another important point is that the
shifting length is calculated against the position aligned with the last character of the
pattern (we deduce it from |P |). No matter where the mismatch happens when we scan
from right to left, we slide the pattern down by looking up the bad character table with
the text character aligned with the last character of the pattern. This is shown in
figure 14.20.

i s s i m p l e ... T i s s i m p l e ... T

e x a m p l e P e x a m p l e P

(a) (b)

Figure 14.20: Even though the mismatch happens in the middle, between chars ‘i’ and ‘a’, we
look up the shifting value with the character ‘e’, which is 6 (calculated from the first ‘e’;
the second ‘e’ is skipped to avoid a zero shift).

There is a useful result in practice: using only the bad-character rule already leads to a
simple and fast string matching algorithm, called the Boyer-Moore-Horspool algorithm [87].
1: procedure Boyer-Moore-Horspool(T, P )
2: for ∀c ∈ Σ do
3: π[c] ← |P |
4: for i ← 1 to |P | − 1 do ▷ Skip the last position
5: π[P [i]] ← |P | − i
6: s←0
7: while s + |P | ≤ |T | do
8: i ← |P |
9: while i ≥ 1 ∧ P [i] = T [s + i] do ▷ scan from right
10: i←i−1
11: if i < 1 then
12: found one solution at s
13: s←s+1 ▷ go on finding the next
14: else
15: s ← s + π[T [s + |P |]]
The character set is denoted as Σ. We first initialize all the values of the sliding table π
to the length of the pattern string |P |. After that we process the pattern from left to
right, updating the sliding values. If a character appears multiple times in the pattern, the
latter value, the one on the right, overwrites the previous one. We start the matching by
aligning the pattern with the very left of the text. For every alignment s, we scan from
right to left until either there is an unmatched character or all the characters in the
pattern have been examined. The latter case indicates that we've found a match; in the
former case, we look up π to slide the pattern down to the right.
The following example Python code implements this algorithm accordingly.
def bmh_match(w, p):
    n = len(w)
    m = len(p)
    tab = [m for _ in range(256)]  # table to hold the bad character rule.
    for i in range(m-1):
        tab[ord(p[i])] = m - 1 - i
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + tab[ord(w[offset + m - 1])]
    return res
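
As a quick usage sketch with a made-up text, the function reports the (0-based) offset of
every occurrence of the pattern:

# Both occurrences of "ana" in "banana" are reported.
print(bmh_match("banana", "ana"))   # [1, 3]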

The algorithm takes about O(|Σ| + |P |) time to build the sliding table. If the character
set is small, the performance is dominated by the pattern and the text. There is definitely
a worst case in which all the characters in the pattern and the text are the same, e.g.
searching ‘aa...a’ (m of ‘a’, denoted as a^m) in the text ‘aa......a’ (n of ‘a’, denoted as a^n).
The performance in the worst case is O(mn). This algorithm performs well if the pattern
is long and there is only a constant number of matches; the result is then bound to linear
time. This is the same as the best case of the full Boyer-Moore algorithm, which is
explained next.

The good suffix heuristics


Consider searching the pattern ‘abbabab’ in the text ‘bbbababbabab...’ as in figure 14.21. By
using the bad-character rule, the pattern is slid down by two.

b b b a b a b b a b a b ... T

a b b a b a b P

(a)
b b b a b a b b a b a b ... T

a b b a b a b P

(b)

Figure 14.21: According to the bad-character rule, the pattern is slid down by 2, so that the
next ‘b’ is aligned.

Actually, we can do better than this. Observe that before the mismatch point, we
have already successfully matched 6 characters ‘bbabab’ from right to left. Since ‘ab’,
the prefix of the pattern, is also a suffix of what we have matched so far, we can
directly slide the pattern down to align this suffix, as shown in figure 14.22.
This is quite similar to the pre-processing of the KMP algorithm. However, we can't
always skip so many characters. Consider the example shown in figure 14.23. We have
matched the characters ‘bab’ when the mismatch happens. Although the prefix ‘ab’ of
the pattern is also a suffix of ‘bab’, we can't slide the pattern that far, because ‘bab’
appears somewhere else in the pattern, starting from the 3rd character. In order not to
miss any potential matches, we can only slide the pattern down by two.
These two situations form the two cases of the good-suffix rule, as shown in figure
14.24.

b b b a b a b b a b a b ... T

a b b a b a b P

Figure 14.22: As the prefix ‘ab’ is also the suffix of what we’ve matched, we can slide
down the pattern to a position so that ‘ab’ are aligned.

b a a b b a b a b ... T
b a a b b a b a b ... T

a b b a b a b P a b b a b a b P

(a) (b)

Figure 14.23: We’ve matched ‘bab’, which appears somewhere else in the pattern (from
the 3rd to the 5th character). We can only slide down the pattern by 2 to avoid missing
any potential matching.

Both cases of the good-suffix rule handle the situation in which multiple characters
have been matched from the right. We can slide the pattern to the right if either of the
following happens.

• Case 1 states that if a part of the matching suffix occurs as a prefix of the pattern,
and the matching suffix doesn't appear anywhere else in the pattern, we can
slide the pattern to the right to make this prefix aligned;

• Case 2 states that if the matching suffix occurs somewhere else in the pattern, we
can slide the pattern to make the right-most occurrence aligned.

Note that in the scan process, we should apply case 2 first whenever possible, and only
examine case 1 if the whole matched suffix does not appear anywhere else in the pattern.
Observe that both cases of the good-suffix rule depend only on the pattern string, so a
table can be built by pre-processing the pattern for later look-up.
For the sake of brevity, we denote the suffix starting from the i-th character of P as Pi ;
that is, Pi is the sub-string P [i]P [i + 1]...P [m].
For case 1, we can check every suffix of P , which includes Pm , Pm−1 , Pm−2 , ..., P2 , to
examine if it is a prefix of P . This can be achieved by a round of scanning from right to
the left.
For case 2, we can check every prefix of P , P [1..1], P [1..2], ..., P [1..m − 1], and examine
whether its longest suffix is also a suffix of P . This can be achieved by another round of
scanning from left to right.
1: function Good-Suffix(P )
2: m ← |P |
3: πs ← {0, 0, ..., 0} ▷ Initialize the table of length m
4: l←0 ▷ The last suffix which is also prefix of P
5: for i ← m − 1 down-to 1 do ▷ First loop for case 1
6: if Pi ⊏ P then ▷ ⊏ means ‘is prefix of’
7: l←i
8: πs [i] ← l

(a) Case 1, Only a part of the matching suffix occurs as a prefix of the pattern.

(b) Case 2, The matching suffix occurs some where else in the pattern.

Figure 14.24: The light gray section in the text represents the characters have been
matched; The dark gray parts indicate the same content in the pattern.

9: for i ← 1 to m do ▷ Second loop for case 2
10: s ← Suffix-Length(Pi )
11: if s ≠ 0 ∧ P [i − s] ≠ P [m − s] then
12: πs [m − s] ← m − i
13: return πs
This algorithm builds the good-suffix heuristics table πs . It first checks every suffix of
P from the shortest to the longest. If the suffix Pi is also a prefix of P , we record this
suffix, and use it for all the entries until we find another suffix Pj , j < i, which is also
a prefix of P .
After that, the algorithm checks every prefix of P from the shortest to the longest. It
calls the function Suffix-Length(Pi ) to calculate the length of the longest suffix of Pi
which is also a suffix of P . If this length s isn't zero, there exists a sub-string of length s
that appears as the suffix of the pattern, which indicates that case 2 happens. The algorithm
overwrites the s-th entry from the right of the table πs . Note that to avoid finding the
same occurrence of the matched suffix, we test whether P [i − s] and P [m − s] are the same.
Function Suffix-Length is designed as the following.
1: function Suffix-Length(Pi )
2: m ← |P |
3: j←0
4: while P [m − j] = P [i − j] ∧ j < i do
5: j ←j+1
6: return j
The following Python example program implements the good-suffix rule.
def good_suffix(p):
    m = len(p)
    tab = [0 for _ in range(m)]
    last = 0
    # first loop for case 1
    for i in range(m-1, 0, -1):  # m-1, m-2, ..., 1
        if is_prefix(p, i):
            last = i
        tab[i - 1] = last
    # second loop for case 2
    for i in range(m):
        slen = suffix_len(p, i)
        if slen != 0 and p[i - slen] != p[m - 1 - slen]:
            tab[m - 1 - slen] = m - 1 - i
    return tab

# test if p[i..m-1] `is prefix of` p
def is_prefix(p, i):
    for j in range(len(p) - i):
        if p[j] != p[i + j]:
            return False
    return True

# length of the longest suffix of p[..i], which is also a suffix of p
def suffix_len(p, i):
    m = len(p)
    j = 0
    while p[m - 1 - j] == p[i - j] and j < i:
        j = j + 1
    return j

It's quite possible that both the bad-character rule and the good-suffix rule can be
applied when a mismatch happens. The Boyer-Moore algorithm compares them and picks the
bigger shift so that it can find the solution as quickly as possible. The bad-character rule
table can be explicitly built as below
1: function Bad-Character(P )
2: for ∀c ∈ Σ do
3: πb [c] ← |P |
4: for i ← 1 to |P | − 1 do
5: πb [P [i]] ← |P | − i
6: return πb
The following Python program implements the bad-character rule accordingly.
def bad_char(p):
m = len(p)
tab = [m for _ in range(256)]
for i in range(m-1):
tab[ord(p[i])] = m - 1 - i
return tab
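
As a usage sketch, the table for the pattern ‘example’ from figure 14.20 gives a shift of 6
for ‘e’ (computed from the first ‘e’; the last one is skipped), 2 for ‘p’, and the pattern
length 7 for any character not in the pattern:

tab = bad_char("example")
print(tab[ord('e')], tab[ord('p')], tab[ord('z')])   # 6 2 7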

The final Boyer-Moore algorithm first builds the two rule tables from the pattern, then
aligns the pattern with the beginning of the text and scans from right to left for every
alignment. If any mismatch happens, it tries both rules, and slides the pattern by the
bigger shift.
1: function Boyer-Moore(T, P )
2: n ← |T |, m ← |P |
3: πb ← Bad-Character(P )
4: πs ← Good-Suffix(P )
5: s←0
6: while s + m ≤ n do
7: i←m
8: while i ≥ 1 ∧ P [i] = T [s + i] do
9: i←i−1
10: if i < 1 then
11: found one solution at s
12: s←s+1 ▷ go on finding the next
13: else
14: s ← s + max(πb [T [s + m]], πs [i])
Here is the example implementation of Boyer-Moore algorithm in Python.
def bm_match(w, p):
    n = len(w)
    m = len(p)
    tab1 = bad_char(p)
    tab2 = good_suffix(p)
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + max(tab1[ord(w[offset + m - 1])], tab2[i])
    return res

The Boyer-Moore algorithm published in the original paper is bound to O(n + m) in the
worst case only if the pattern doesn't appear in the text [86]. Knuth, Morris, and Pratt
proved this fact in 1977 [88]. However, when the pattern appears in the text, as we showed
above, Boyer-Moore performs O(nm) in the worst case.
Richard Bird shows a purely functional realization of the Boyer-Moore algorithm in
chapter 16 of [1]. We skip it in this book.

Exercise 14.2

• Prove that the Boyer-Moore majority vote algorithm is correct.

• Given a list, find the element that occurs most often. Are there any divide and conquer
solutions? Are there any divide and conquer data structures, such as a map, that can be
used?

• How to find the elements that occur more than 1/3 of the time in a list? How to find the
elements that occur more than 1/m of the time in the list?

• If we reject the empty array as a valid sub-array, how to realize the maximum sum of
sub-arrays puzzle?
• Bentley presents a divide and conquer algorithm to find the maximum sum in
O(n log n) time in [4]. The idea is to split the list at the middle point. We can
recursively find the maximum sum in the first half and in the second half; however, we
also need to find the maximum sum across the middle point. The method is to scan from
the middle point to both ends as the following.
1: function Max-Sum(A)
2: if A = ϕ then
3: return 0
4: else if |A| = 1 then
5: return Max(0, A[1])
6: else
7: m ← ⌊|A|/2⌋
8: a ← Max-From(Reverse(A[1...m]))
9: b ← Max-From(A[m + 1...|A|])
10: c ← Max-Sum(A[1...m])
11: d ← Max-Sum(A[m + 1...|A|])
12: return Max(a + b, c, d)

13: function Max-From(A)


14: sum ← 0, m ← 0
15: for i ← 1 to |A| do
16: sum ← sum + A[i]
17: m ← Max(m, sum)
18: return m
It's easy to deduce that the time performance is T (n) = 2T (n/2) + O(n). Implement
this algorithm in your favorite programming language (a sketch follows the exercise list).
• Given an m × n matrix containing positive and negative numbers, find the sub-matrix
with the maximum sum of its elements.
• Given n non-negative integers representing an elevation map where the width of
each bar is 1, compute how much water it is able to trap after raining. Figure 14.25
shows an example. For example, given {0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1}, the result is 6.
• Explain why the KMP algorithm performs in linear time even in the seemingly ‘worst’ case.

Figure 14.25: Shadowed areas are the trapped water.

• Implement the purely functional KMP algorithm by using reversed Pp to avoid the
linear time appending operation.

• Deduce the state of the tree left(right(right(right(T )))) when searching ‘ananym’
in text ‘anal’.
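
For reference, below is a sketch in Python of the divide and conquer Max-Sum described in
the exercise above (the function and variable names are ours):

def max_from(xs):
    # maximum sum of a prefix of xs (the empty prefix counts as 0)
    s, m = 0, 0
    for x in xs:
        s = s + x
        m = max(m, s)
    return m

def max_sum(xs):
    if xs == []:
        return 0
    if len(xs) == 1:
        return max(0, xs[0])
    m = len(xs) // 2
    a = max_from(list(reversed(xs[:m])))   # best sum ending at the middle
    b = max_from(xs[m:])                   # best sum starting after the middle
    return max(a + b, max_sum(xs[:m]), max_sum(xs[m:]))

print(max_sum([3, -13, 19, -12, 1, 19, 18, -16, 15, -15]))   # 45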

14.3 Solution searching


One interesting thing that computer programming can offer is solving puzzles. In the
early phase of classic artificial intelligence, people developed many methods to search for
solutions. Different from sequence searching and string matching, the solution doesn't
obviously exist among a set of candidates; it typically needs to be constructed while trying
various attempts. Some problems are solvable, while others are not. Among the solvable
problems, not all of them have just one unique solution. For example, a maze may have
multiple ways out. People sometimes need to search for the best one.

14.3.1 DFS and BFS


DFS and BFS stand for depth-first search and breadth-first search. They are typically
introduced as graph algorithms in textbooks. Graph is a comprehensive topic which is
hard to cover in this elementary book. In this section, we'll show how to use DFS
and BFS to solve some real puzzles without a formal introduction of the graph concept.

Maze
The maze is a classic and popular puzzle, amazing to both kids and adults. Figure
14.26 shows an example maze. There are also real maze gardens in parks for fun. In the
late 1990s, maze-solving games were quite often held in robot mouse competitions all over
the world.
There are multiple methods to solve the maze puzzle. We'll introduce an effective, yet
not the best one, in this section. There are some well-known sayings about how to find the
way out of a maze, but not all of them are true.
For example, one method states that, wherever you have multiple ways, always turn
right. This doesn't work, as shown in figure 14.27. The obvious solution is to first go
along the top horizontal line, then turn right, and keep going ahead at the ‘T’ section.
However, if we always turn right, we'll loop endlessly around the inner big block.
This example tells us that the decision made when there are multiple choices matters to
the solution. Like in the fairy tale we read in our childhood, we can take some bread crumbs
into the maze. When there are multiple ways, we simply select one and leave a piece of bread

Figure 14.26: A maze

Figure 14.27: Always turning right leads to an endless loop.

crumbs to mark this attempt. If we enter a dead end, we go back to the last place where
we made a decision by back-tracking the bread crumbs; then we can try another way.
At any time, if we find there are already bread crumbs left, it means we have entered
a loop, and we must go back and try a different way. Repeating these try-and-check steps,
we can either find the way out, or conclude that there is no solution. In the latter case, we
back-track to the start point.
One easy way to describe a maze is by an m × n matrix, in which each element is either 0
or 1, indicating whether there is a way at that cell. The maze illustrated in figure 14.27
can be defined by the following matrix.
0 0 0 0 0 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 0 0 0 0 0
1 1 1 1 1 0

Given a start point s = (i, j), and a goal e = (p, q), we need to find all solutions, that
is, the paths from s to e.
There is an obvious recursive exhaustive search method: in order to find all paths
from s to e, we check all the points connected to s; for every such point k, we
recursively find all paths from k to e. This method can be illustrated as the following.

• Trivial case: if the start point s is the same as the target point e, we are done;
• Otherwise, for every point k connected to s, recursively find the paths from k to e;
if e can be reached via k, put the section s-k in front of each path between k and e.

However, we have to leave ‘bread crumbs’ to avoid repeating the same attempts.
Otherwise, in the recursive case, we start from s, find a connected point k, then try to find
paths from k to e. Since s is connected to k as well, in the next recursion we would try to
find paths from s to e again. It turns into the very same original problem, and we are
trapped in infinite recursion.
Our solution is to initialize an empty list, and use it to record all the points we've visited
so far. For every connected point, we look up the list to examine if it has already been
visited. We skip all the visited candidates and only try the new ones. The corresponding
algorithm can be defined like this.

solveM aze(m, s, e) = solve(s, {ϕ}) (14.47)

Where m is the matrix which defines the maze, s is the start point, and e is the end
point. Function solve is defined in the context of solveM aze, so that the maze and the
end point can be accessed. It can be realized recursively like what we described above⁸.

solve(s, P ) = { {{s} ∪ p | p ∈ P }  :  s = e
              { concat({solve(s′ , {{s} ∪ p | p ∈ P }) | s′ ∈ adj(s), ¬visited(s′ )})  :  otherwise
(14.48)

Note that P also serves as an accumulator. Every connected point is recorded in all
the possible paths to the current position. But they are stored in reversed order, that is,
the newly visited point is put at the head of each list, and the starting point is the last
one. This is because the append operation is linear (O(n), where n is the number of
elements stored in a list), while linking to the head takes only constant time. We can output
the result in the correct order by reversing all possible solutions in equation (14.47)⁹:

solveM aze(m, s, e) = map(reverse, solve(s, {ϕ})) (14.49)

We need to define functions adj(p) and visited(p), which find all the points connected
to p, and test if point p has been visited, respectively. Two points are connected if and
only if they are neighbor cells horizontally or vertically in the maze matrix, and both have
zero value.

adj((x, y)) = {(x′ , y ′ ) | (x′ , y ′ ) ∈ {(x − 1, y), (x + 1, y), (x, y − 1), (x, y + 1)},
               1 ≤ x′ ≤ M, 1 ≤ y ′ ≤ N, mx′ y′ = 0}        (14.50)

Where M and N are the width and height of the maze.


Function visited(p) examines if point p has been recorded in any lists in P .

visited(p) = ∃path ∈ P, p ∈ path (14.51)

The following Haskell example code implements this algorithm.


solveMaze m from to = map reverse $ solve from [[]] where
solve p paths | p == to = map (p:) paths
| otherwise = concat [solve p' (map (p:) paths) |
p' ← adjacent p,
not $ visited p' paths]
adjacent (x, y) = [(x', y') |
(x', y') ← [(x-1, y), (x+1, y), (x, y-1), (x, y+1)],
inRange (bounds m) (x', y'),
m ! (x', y') == 0]
visited p paths = any (p `elem`) paths

⁸Function concat flattens a list of lists. For example, concat({{a, b, c}, {x, y, z}}) = {a, b, c, x, y, z}.
Refer to appendix A for detail.
⁹The detailed definition of reverse can be found in appendix A.

For a maze defined as matrix like below example, all the solutions can be given by
this program.
mz = [[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 0]]

maze = listArray ((1,1), (6, 6)) ◦ concat

solveMaze (maze mz) (1,1) (6, 6)

As we mentioned, this is a style of ‘exhaustive search’. It recursively searches all the
connected points as candidates. In a real maze-solving game, a robot mouse competition
for instance, it's enough to find just one route. We can adapt the method to something
closer to what we described at the beginning of this section. The robot mouse always tries
the first connected point, and skips the others until it gets stuck. We need some data
structure to store the ‘bread crumbs’, which help to remember the decisions being made. As
we always attempt to find the way on top of the latest decision, it is last-in, first-out; a
stack can be used to realize it.
At the very beginning, only the starting point s is stored in the stack. We pop it out,
and find, for example, that points a and b are connected to s. We push the two possible
paths {a, s} and {b, s} to the stack. Next we pop {a, s} out, and examine the points
connected to a. Then all the paths with 3 steps are pushed back. We repeat this process.
At any time, each element stored in the stack is a path, from the starting point to the
farthest place reached, in reversed order. This is illustrated in figure 14.28.


Figure 14.28: The stack is initialized with a singleton list of the starting point s. s is
connected with point a and b. Paths {a, s} and {b, s} are pushed back. In some step,
the path ended with point p is popped. p is connected with points i, j, and k. These 3
points are expanded as different options and pushed back to the stack. The candidate
path ended with q won’t be examined unless all the options above fail.

The stack can be realized with a list. The latest option is picked from the head, and
the new candidates are also added at the head. The maze puzzle can be solved by using
such a list of paths:

solveM aze′ (m, s, e) = reverse(solve′ ({{s}})) (14.52)

As we are searching for the first, but not all, of the solutions, map isn't used here. When
the stack is empty, it means that we've tried all the options and failed to find a way out.

There is no solution; otherwise, the top option is popped, expanded with all the adjacent
points which haven't been visited before, and pushed back to the stack. Denote the stack
as S; if it isn't empty, the top element is s1 , and the new stack after the top is popped
is S ′ . s1 is a list of points representing a path P . Denote the first point in this path as
p1 , and the rest as P ′ . The solution can be formalized as the following.

solve′ (S) = { ϕ  :  S = ϕ
            { s1  :  p1 = e
            { solve′ (S ′ )  :  C = {c | c ∈ adj(p1 ), c ∉ P ′ } = ϕ
            { solve′ ({{p} ∪ P | p ∈ C} ∪ S ′ )  :  otherwise, C ≠ ϕ
(14.53)
Where the adj function is defined above. This updated maze solution can be implemented
with the below example Haskell program (the code for the adjacent function is the same as
before and is skipped).
dfsSolve m from to = reverse $ solve [[from]] where
solve [] = []
solve (c@(p:path):cs)
| p == to = c −− stop at the first solution
| otherwise = let os = filter (`notElem` path) (adjacent p) in
if os == []
then solve cs
else solve ((map (:c) os) ++ cs)

It's quite easy to modify this algorithm to find all solutions. When we find a path in
the second clause, instead of returning it immediately, we record it and go on checking
the rest of the memorized options in the stack until it becomes empty. We leave this as
an exercise to the reader.
The same idea can also be realized imperatively. We maintain a stack to store all
possible paths from the starting point. In each iteration, the top option path is popped;
if its farthest position is the end point, a solution is found; otherwise, all the adjacent,
not yet visited points are appended as new paths and pushed back to the stack. This is
repeated till all the candidate paths in the stack are checked.
We use the same notation to represent the stack S. But the paths will be stored as
arrays instead of lists in the imperative settings, as arrays are more efficient. Because of
this, the starting point is the first element in the path array, while the farthest reached
place is the right-most element. We use pn to represent Last(P ) for path P . The imperative
algorithm can be given as below.
1: function Solve-Maze(m, s, e)
2: S←ϕ
3: Push(S, {s})
4: L←ϕ ▷ the result list
5: while S ≠ ϕ do
6: P ← Pop(S)
7: if e = pn then
8: Add(L, P )
9: else
10: for ∀p ∈ Adjacent(m, pn ) do
11: if p ∉ P then
12: Push(S, P ∪ {p})
13: return L
The following example Python program implements this maze solving algorithm.
def solve(m, src, dst):
    stack = [[src]]
    s = []
    while stack != []:
        path = stack.pop()
        if path[-1] == dst:
            s.append(path)
        else:
            for p in adjacent(m, path[-1]):
                if not p in path:
                    stack.append(path + [p])
    return s

def adjacent(m, p):
    (x, y) = p
    ds = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    ps = []
    for (dx, dy) in ds:
        x1 = x + dx
        y1 = y + dy
        if 0 <= x1 and x1 < len(m[0]) and \
           0 <= y1 and y1 < len(m) and m[y1][x1] == 0:
            ps.append((x1, y1))
    return ps

And the same maze example given above can be solved by this program like the
following.
mz = [[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 0]]

solve(mz, (0, 0), (5,5))

It seems that in the worst case, there are 4 options (up, down, left, and right) at each
step; each option is pushed to the stack and eventually examined during backtracking.
Thus the complexity seems bound to O(4^n). The actual time won't be so large, because we
filter out the places which have been visited before. In the worst case, all the reachable
points are visited exactly once. So the time is bound to O(n), where n is the total number
of connected points. As a stack is used to store candidate solutions, the space complexity
is O(n^2).

Eight queens puzzle


The eight queens puzzle is also a famous problem. Although chess has a very long history,
this puzzle was first published in 1848 by Max Bezzel [89]. The queen in chess is
quite powerful. It can attack any other piece in the same row, column or diagonal at
any distance. The puzzle is to find a solution to put 8 queens on the board, so that none
of them attack each other. Figure 14.29 (a) illustrates the places that can be attacked by a
queen and 14.29 (b) shows a solution of the 8 queens puzzle.
It's obvious that the puzzle can be solved by brute-force, which takes P(64, 8) attempts.
This number is about 4 × 10^10. It can be easily improved by observing that no two
queens can be in the same row, and each queen must be put on one column between 1
and 8. Thus we can represent an arrangement as a permutation of {1, 2, 3, 4, 5, 6, 7, 8}.
For instance, the arrangement {6, 2, 7, 1, 3, 5, 8, 4} means: we put the first queen at row 1,
column 6, the second queen at row 2, column 2, ..., and the last queen at row 8, column
4. By this means, we need only examine 8! = 40320 possibilities.
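
To illustrate the permutation idea, here is a brute-force sketch in Python (the names are
ours): it filters the 8! permutations by the diagonal condition, and the 92 surviving
arrangements are exactly the solutions.

from itertools import permutations

def queens_by_permutation():
    # p[i] is the column of the queen in row i + 1; rows and columns are
    # distinct by construction, so only the diagonals need checking.
    return [p for p in permutations(range(1, 9))
            if all(abs(p[i] - p[j]) != j - i
                   for i in range(8) for j in range(i + 1, 8))]

print(len(queens_by_permutation()))   # 92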

(a) A queen piece. (b) An example solution

Figure 14.29: The eight queens puzzle.

We can do better than this. Similar to the maze puzzle, we put the queens one by one,
starting from the first row. For the first queen, there are 8 options: we can put it at any
of the eight columns. For the next queen, we again examine the 8 candidate columns; some
of them are not valid because those positions would be attacked by the first queen. We
repeat this process: for the i-th queen, we examine the 8 columns in row i and find which
of them are safe. If no column is valid, it means all the columns in this row would be
attacked by some queen we've previously arranged, and we have to backtrack, as we did in
the maze puzzle. When all 8 queens are successfully put on the board, we have found a
solution. In order to find all the possible solutions, we record it and go on examining
other candidate columns, performing back-tracking if necessary. This process terminates
when all the columns in the first row have been examined. The below equation starts the
search.
solve({ϕ}, ϕ) (14.54)
In order to manage the candidate attempts, a stack S is used, as in the maze puzzle.
The stack is initialized with one empty element. A list L is used to record all possible
solutions. Denote the top element in the stack as s1 . It is actually an intermediate state
of assignment, a partial permutation of 1 to 8. After popping s1 , the stack becomes S ′ .
The solve function can be defined as the following.

solve(S, L) = { L  :  S = ϕ
              { solve(S ′ , {s1 } ∪ L)  :  |s1 | = 8
              { solve({{i} ∪ s1 | i ∈ [1, 8], i ∉ s1 , saf e(i, s1 )} ∪ S ′ , L)  :  otherwise
(14.55)
If the stack is empty, all the possible candidates have been examined and it's not possible
to backtrack any more; L has accumulated all found solutions and is returned as the
result. Otherwise, if the length of the top element in the stack is 8, a valid solution is
found: we add it to L, and go on finding other solutions. If the length is less than 8,
we need to try to put the next queen. Among all the columns from 1 to 8, we pick those
not already occupied by previous queens (through the i ∉ s1 clause), which are also not
attacked along a diagonal (through the safe predicate). The valid assignments are pushed
to the stack for further searching.
Function safe(x, C) detects if the assignment of a queen in position x will be attacked
by other queens in C along a diagonal. There are 2 possible cases, the 45◦ and 135◦
directions. Since the row of this new queen is y = 1 + |C|, where |C| is the length of C,
the safe function can be defined as the following.
safe(x, C) = ∀(c, r) ∈ zip(reverse(C), {1, 2, ...}), |x − c| ≠ |y − r|        (14.56)

Where zip takes two lists and pairs their elements into a new list. Thus if
C = {ci−1 , ci−2 , ..., c2 , c1 } represents the columns of the first i − 1 queens assigned so far,
the above function checks whether the position (x, y) forms a diagonal line with any of the
pairs {(c1 , 1), (c2 , 2), ..., (ci−1 , i − 1)}.
Translating this algorithm into Haskell gives the below example program.
solve = dfsSolve [[]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| length c == 8 = dfsSolve cs (c:s)
| otherwise = dfsSolve ([(x:c) | x ← [1..8] \\ c,
not $ attack x c] ++ cs) s
attack x cs = let y = 1 + length cs in
any (λ(c, r) → abs(x - c) == abs(y - r)) $
zip (reverse cs) [1..]

Observing that the algorithm is tail recursive, it's easy to transform it into an imperative
realization. Instead of a list, we use an array to represent the queens' assignment. Denote
the stack as S, and the list of solutions as L. The imperative algorithm can be described
as the following.
1: function Solve-Queens
2: S ← {ϕ}
3: L←ϕ ▷ The result list
4: while S ≠ ϕ do
5: A ← Pop(S) ▷ A is an intermediate assignment
6: if |A| = 8 then
7: Add(L, A)
8: else
9: for i ← 1 to 8 do
10: if Valid(i, A) then
11: Push(S, A ∪ {i})
12: return L
The stack is initialized with the empty assignment. The main process repeatedly pops
the top candidate from the stack. If there are still queens left to place, the algorithm
examines possible columns in the next row from 1 to 8. If a column is safe, that is, it won't
be attacked by any previous queens, this column is appended to the assignment and pushed
back to the stack. Different from the functional approach, since an array, not a list, is
used, we needn't reverse the solution assignment any more.
Function Valid checks if column x is safe with the previous queens put in A. It filters
out the columns that have already been occupied, and calculates if any diagonal lines are
formed with the existing queens.
1: function Valid(x, A)
2: y ← 1 + |A|
3: for i ← 1 to |A| do
4: if x = A[i] ∨ |y − i| = |x − A[i]| then
5: return False
6: return True
The following Python example program implements this imperative algorithm.
def solve():
    stack = [[]]
    s = []
    while stack != []:
        a = stack.pop()
        if len(a) == 8:
            s.append(a)
        else:
            for i in range(1, 9):
                if valid(i, a):
                    stack.append(a + [i])
    return s

def valid(x, a):
    y = len(a) + 1
    for i in range(1, y):
        if x == a[i-1] or abs(y - i) == abs(x - a[i-1]):
            return False
    return True
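
As a quick usage check of the solve above, the classic 8 queens puzzle has 92 distinct
solutions:

print(len(solve()))   # 92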

Although there are 8 optional columns for each queen, not all of them are valid and
thus further expanded. Only those columns which haven't been occupied by previous queens
are tried. The algorithm examines only 15720 possibilities, which is far less than
8^8 = 16777216 [89].
It's quite easy to extend the algorithm so that it can solve the n queens puzzle, where
n ≥ 4. However, the time cost increases fast. The backtracking algorithm is just slightly
better than the one permuting the sequence of 1 to 8 (which is bound to O(n!)). Another
extension to the algorithm is based on the fact that the chess board is square, and
symmetric both vertically and horizontally; thus a solution can generate other solutions
by rotation and flipping. These aspects are left as exercises to the reader.

Peg puzzle
I once received a puzzle of the leap frogs. It is said to be homework for 2nd grade students
in China. As illustrated in figure 14.30, there are 6 frogs on 7 stones. Each frog can either
hop to the next stone if it is not occupied, or leap over one frog to another empty stone.
The frogs on the left side can only move to the right, while the ones on the right side can
only move to the left. These rules are described in figure 14.31.

Figure 14.30: The leap frogs puzzle.

The goal of this puzzle is to arrange for the frogs to jump according to the rules, so that
the positions of the 3 frogs on the left are finally exchanged with the ones on the right. If
we denote a frog on the left as ‘A’, one on the right as ‘B’, and the empty stone as ‘O’, the
puzzle is to find a solution to transform ‘AAAOBBB’ into ‘BBBOAAA’.

(a) Jump to the next stone. (b) Jump over to the right. (c) Jump over to the left.

Figure 14.31: Moving rules.



This puzzle is just a special form of the peg puzzles. The number of pegs is not limited
to 6; it can be 8 or any other bigger even number. Figure 14.32 shows some variants.

(a) Solitaire (b) Hop over (c) Draught board

Figure 14.32: Variants of the peg puzzles from [Link]stegmann/[Link]

We can solve this puzzle by programming. The idea is similar to the 8 queens puzzle.
Denote the positions from the left-most stone as 1, 2, ..., 7. In ideal cases, there are 4
options to arrange a move. For example, at the start, the frog on the 3rd stone can hop
right to the empty stone; symmetrically, the frog on the 5th stone can hop left; alternatively,
the frog on the 2nd stone can leap right, while the frog on the 6th stone can leap left.
We can record the state and try one of these 4 options at every step. Of course not all
of them are possible at any time. If we get stuck, we can backtrack and try other options.
As we restrict the left side frogs to only move to the right, and the right side frogs to
only move to the left, the moves are not reversible. There won't be any repetition cases as
we had to deal with in the maze puzzle. However, we still need to record the steps in
order to print them out at the end.
In order to enforce these restrictions, let A, O, B in the representation ‘AAAOBBB’ be −1,
0, and 1 respectively. A state L is a list of elements, each being one of these 3 values.
It starts as {−1, −1, −1, 0, 1, 1, 1}. L[i] accesses the i-th element; its value indicates whether
the i-th stone is empty, occupied by a frog from the left side, or occupied by a frog from
the right side. Denote the position of the vacant stone as p. The 4 moving options can be
stated as below.

• Leap left: p < 6 and L[p + 2] > 0, swap L[p] ↔ L[p + 2];

• Hop left: p < 7 and L[p + 1] > 0, swap L[p] ↔ L[p + 1];

• Leap right: p > 2 and L[p − 2] < 0, swap L[p − 2] ↔ L[p];

• Hop right: p > 1 and L[p − 1] < 0, swap L[p − 1] ↔ L[p].

Four functions leapl (L), hopl (L), leapr (L) and hopr (L) are defined accordingly. If
the state L does not satisfy the corresponding move restriction, the function returns L
unchanged; otherwise, the changed state L′ is returned.
We explicitly maintain a stack S to record the attempts together with the historic
movements. The stack is initialized with a singleton list of the starting state. The solutions
are accumulated in a list M , which is empty at the beginning:
solve({{−1, −1, −1, 0, 1, 1, 1}}, ϕ) (14.57)

As long as the stack isn't empty, we pop one intermediate attempt. If its latest state
equals {1, 1, 1, 0, −1, −1, −1}, a solution is found; we append the series of moves leading to
this state to the result list M . Otherwise, we expand to the next possible states by trying
all four possible moves, and push them back to the stack for further search. Denote the top
element in the stack S as s1 , and the latest state in s1 as L. The algorithm can be defined
as the following.

solve(S, M ) = { M  :  S = ϕ
               { solve(S ′ , {reverse(s1 )} ∪ M )  :  L = {1, 1, 1, 0, −1, −1, −1}
               { solve(P ∪ S ′ , M )  :  otherwise
(14.58)
Where P is the set of possible moves from the latest state L:

P = {L′ | L′ ∈ {leapl (L), hopl (L), leapr (L), hopr (L)}, L′ ≠ L}

Note that the starting state is stored as the last element, while the final state is the
first. That is the reason why we reverse it when adding it to the solution list.
Translating this algorithm to Haskell gives the following example program.
solve = dfsSolve [[[-1, -1, -1, 0, 1, 1, 1]]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| head c == [1, 1, 1, 0, -1, -1, -1] = dfsSolve cs (reverse c:s)
| otherwise = dfsSolve ((map (:c) $ moves $ head c) ++ cs) s

moves s = filter (/=s) [leapLeft s, hopLeft s, leapRight s, hopRight s] where


  leapLeft [] = []
  leapLeft (0:y:1:ys) = 1:y:0:ys
  leapLeft (y:ys) = y:leapLeft ys
  hopLeft [] = []
  hopLeft (0:1:ys) = 1:0:ys
  hopLeft (y:ys) = y:hopLeft ys
  leapRight [] = []
  leapRight (-1:y:0:ys) = 0:y:(-1):ys
  leapRight (y:ys) = y:leapRight ys
  hopRight [] = []
  hopRight (-1:0:ys) = 0:(-1):ys
  hopRight (y:ys) = y:hopRight ys

Running this program finds 2 symmetric solutions, each taking 15 steps. One solution
is listed in the table below.
step -1 -1 -1 0 1 1 1
1 -1 -1 0 -1 1 1 1
2 -1 -1 1 -1 0 1 1
3 -1 -1 1 -1 1 0 1
4 -1 -1 1 0 1 -1 1
5 -1 0 1 -1 1 -1 1
6 0 -1 1 -1 1 -1 1
7 1 -1 0 -1 1 -1 1
8 1 -1 1 -1 0 -1 1
9 1 -1 1 -1 1 -1 0
10 1 -1 1 -1 1 0 -1
11 1 -1 1 0 1 -1 -1
12 1 0 1 -1 1 -1 -1
13 1 1 0 -1 1 -1 -1
14 1 1 1 -1 0 -1 -1
15 1 1 1 0 -1 -1 -1
Observe that the algorithm is in tail recursive manner; it can also be realized imperatively.
The algorithm can be generalized to solve the puzzles of n frogs on each side. We represent
the start state {-1, -1, ..., -1, 0, 1, 1, ..., 1} as s, and the mirrored end state as e.
1: function Solve(s, e)

2: S ← {{s}}
3: M ←ϕ
4: while S ≠ ϕ do
5: s1 ← Pop(S)
6: if s1 [1] = e then
7: Add(M , Reverse(s1 ))
8: else
9: for ∀m ∈ Moves(s1 [1]) do
10: Push(S, {m} ∪ s1 )
11: return M
The possible moves can also be generalized with the procedure Moves to handle an arbitrary
number of frogs. The following Python program implements this solution.
def solve(start, end):
    stack = [[start]]
    s = []
    while stack != []:
        c = stack.pop()
        if c[0] == end:
            s.append(list(reversed(c)))
        else:
            for m in moves(c[0]):
                stack.append([m] + c)
    return s

def moves(s):
    ms = []
    n = len(s)
    p = s.index(0)
    if p < n - 2 and s[p+2] > 0:
        ms.append(swap(s, p, p+2))
    if p < n - 1 and s[p+1] > 0:
        ms.append(swap(s, p, p+1))
    if p > 1 and s[p-2] < 0:
        ms.append(swap(s, p, p-2))
    if p > 0 and s[p-1] < 0:
        ms.append(swap(s, p, p-1))
    return ms

def swap(s, i, j):
    a = s[:]
    (a[i], a[j]) = (a[j], a[i])
    return a

For 3 frogs on each side, we know that it takes 15 steps to exchange them. It's
interesting to examine how many steps are needed as the number of frogs on each side
grows. Our program gives the following result.

number of frogs 1 2 3 4 5 ...
number of steps 3 8 15 24 35 ...

It seems that the numbers of steps are all square numbers minus one. It's natural to
guess that the number of steps for n frogs on one side is (n + 1)^2 − 1. Actually we can
prove it is true.
Compared with the start state, in the final state each frog has moved ahead n + 1 stones
in its own direction; thus the 2n frogs in total move 2n(n + 1) stones. Another important
fact is that each frog on the left has to meet every one on the right exactly once, and a
leap happens at each meeting. Since a frog moves two stones ahead by a leap, and there are
n^2 meetings in total, these meetings account for moving 2n^2 stones ahead. The remaining
moves are not leaps but hops; the number of hops is 2n(n + 1) − 2n^2 = 2n. Summing up the
n^2 leaps and 2n hops, the total number of steps is n^2 + 2n = (n + 1)^2 − 1.
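
A quick check of this formula with the generalized Python solver above (a sketch; it assumes
the solve and moves functions defined in this section):

# Verify that exchanging n frogs per side always takes (n + 1)^2 - 1 moves.
for n in range(1, 5):
    start = [-1] * n + [0] + [1] * n
    end = [1] * n + [0] + [-1] * n
    states = solve(start, end)[0]        # one of the found solutions
    assert len(states) - 1 == (n + 1) ** 2 - 1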

Summary of DFS
Observe the above three puzzles: although they vary in many aspects, their solutions
show quite similar common structures. They all have some starting state. The maze
starts from the entrance point; the 8 queens puzzle starts from the empty board; the
leap frogs start from the state ‘AAAOBBB’. The solution is a kind of search, and at each
attempt there are several possible ways forward. For the maze puzzle, there are four different
directions to try; for the 8 queens puzzle, there are eight columns to choose; for the leap
frogs puzzle, there are four movements of leap or hop. We don't know how far we can go
when making a decision, although the final state is clear. For the maze, it's the exit point;
for the 8 queens puzzle, we are done when all 8 queens have been assigned on the board;
for the leap frogs puzzle, the final state is that all frogs have exchanged sides.
We use a common approach to solve them. We repeatedly select one possible candidate
to try and record what we have achieved; if we get stuck, we backtrack and try other options.
We are sure that by using this strategy, we can either find a solution, or tell that the
problem is unsolvable.
Of course there can be variations: we can stop when one answer is found, or go on
searching for all the solutions.
If we draw a tree rooted at the starting state and expand it so that every branch stands
for a different attempt, our searching process goes deeper and deeper. We won't consider
other options at the same depth unless the search fails and we have to backtrack to an upper
level of the tree. Figure 14.33 illustrates the order in which we search a state tree. The
arrows indicate how we go down and backtrack up; the numbers of the nodes show the order
we visit them.

Figure 14.33: Example of DFS search order.

This kind of search strategy is called ‘DFS’ (depth-first search). We widely use it
unintentionally. Some programming environments, Prolog for instance, adopt DFS as the
default evaluation model. A maze is given by a set of rules, such as:

c(a, b). c(a, e).
c(b, c). c(b, f).
c(e, d). c(e, f).
c(f, c).
c(g, d). c(g, h).
c(h, f).

Where the predicate c(X, Y ) means place X is connected with Y . Note that this is a
directed predicate; we can make Y connected with X as well by either adding a
symmetric rule or creating an undirected predicate. Figure 14.34 shows such a directed
graph. Given two places X and Y , Prolog can tell if they are connected by the following
program.


Figure 14.34: A directed graph.

go(X, X).
go(X, Y) :- c(X, Z), go(Z, Y).

This program says that a place is connected with itself, and that given two different places
X and Y , if X is connected with Z, and Z is connected with Y , then X is connected with
Y . Note that there might be multiple choices for Z. Prolog selects a candidate and goes
on searching further. It only tries other candidates if the recursive search fails; in that
case, Prolog backtracks and tries the alternatives. This is exactly what DFS does.
DFS is quite straightforward when we only need some solution, but don't care whether that
solution takes the fewest steps. For example, the solution it gives may not be the shortest
path out of the maze. We'll see some more puzzles next that demand the solution with the
minimum number of attempts.

The wolf, goat, and cabbage puzzle


This puzzle says that a farmer wants to cross a river with a wolf, a goat, and a bucket
of cabbage. There is a boat, and only the farmer can row it; but the boat is small, and
can only hold one of the wolf, the goat, or the bucket of cabbage together with the farmer
at a time. The farmer has to ferry them one by one to the other side of the river. However,
the wolf would eat the goat, and the goat would eat the cabbage, if the farmer is absent.
The puzzle asks for the fastest solution, so that they all safely get across the river.
The key point to this puzzle is that the wolf does not eat the cabbage. The farmer can
safely take the goat to the other side. But next time, no matter whether he takes the wolf
or the cabbage across, he has to take something back to avoid the conflict. In order to find
the fastest solution, at any time when the farmer has multiple options, we can examine all
of them in parallel, so that these different decisions compete. If we count the number of
times the farmer crosses the river, regardless of direction (so crossing back and forth
counts as 2), we are actually checking the complete set of possibilities after 1 crossing,
2 crossings, 3 crossings, ... When we find a situation in which they have all arrived at
the other bank, we are done. That solution wins the competition and is the fastest one.
The problem is that we can't really examine all the possible solutions in parallel.
Even with a super computer equipped with many CPU cores, the setup is too expensive
for such a simple puzzle.
Figure 14.35: The wolf, goat, cabbage puzzle

Let's consider a lucky draw game. People blindly pick from a box of colored balls.
There is only one black ball; all the others are white. The one who picks the black ball wins
the game; otherwise, he must return the ball to the box and wait for the next chance. In
order to be fair, we set up a rule that no one can try a second time before all the others
have tried. We can line people up in a queue. Every time, the first person picks a ball;
if he does not win, he then stands at the tail of the queue to wait for the second try. This
queue helps to ensure our rule.
We can use much the same idea to solve our puzzle. The two banks of the river can
be represented as two sets A and B. A contains the wolf, the goat, the cabbage and the
farmer, while B is empty. Each time, we take one element together with the farmer from
one set to the other. Neither set may hold conflicting things when the farmer is absent.
The goal is to exchange the contents of A and B in the fewest steps.
We initialize a queue with the state A = {w, g, c, p}, B = ϕ as the only element. As long
as the queue isn't empty, we pick the first element from the head, expand it with all possible
options, and put the new expanded candidates at the tail of the queue. If the first
element at the head is the final goal, that is A = ϕ, B = {w, g, c, p}, we are done. Figure
14.37 illustrates the idea of this search order. Note that as all the possibilities at the
same level are examined, there is no need for back-tracking.
There is a simple way to represent the sets. A four-bit binary number can be used, where
each bit stands for one thing, for example, the wolf w = 1, the goat g = 2, the cabbage
c = 4, and the farmer p = 8. Then 0 stands for the empty set and 15 for a full set. Value
3 means there are a wolf and a goat on the river bank; in this case, the wolf will eat the
goat. Similarly, value 6 stands for the other conflicting case. Every time, we move the
highest bit (which is 8), either alone or together with one of the other bits (4, 2, or 1),
from one number to the other. The possible moves can be defined as below.

mv(A, B) = { {(A − 8 − i, B + 8 + i) | i ∈ {0, 1, 2, 4}, i = 0 ∨ A ∧ i ≠ 0}  :  B < 8
           { {(A + 8 + i, B − 8 − i) | i ∈ {0, 1, 2, 4}, i = 0 ∨ B ∧ i ≠ 0}  :  otherwise
(14.59)
Where ∧ is the bitwise-and operation.
The solution can be given by reusing the queue defined in a previous chapter. Denote
the queue as Q, initialized with the singleton list {(15, 0)}. If Q is not empty,
function DeQ(Q) extracts the head element M ; the updated queue becomes Q′ . M is a
list of pairs, standing for a series of movements between the river banks. The first element
m1 = (A′ , B ′ ) is the latest state. Function EnQ′ (Q, L) is a slightly different enqueue
operation: it pushes all the possible moving sequences in L to the tail of the queue one
by one and returns the updated queue. With these notations, the solution function is

Figure 14.36: A lucky-draw game: the i-th person leaves the head of the queue, picks a ball,
and joins the tail of the queue if he fails to pick the black ball.

Figure 14.37: Start from state 1, check all possible options 2, 3, and 4 for next step; then
all nodes in level 3, ...

defined like below.

solve(Q) = { ϕ  :  Q = ϕ
           { reverse(M )  :  A′ = 0
           { solve(EnQ′ (Q′ , {{m} ∪ M | m ∈ mv(m1 ), valid(m, M )}))  :  otherwise
(14.60)
Where function valid(m, M ) checks if the new moving candidate m = (A′′ , B ′′ ) is valid:
neither A′′ nor B ′′ is 3 or 6, and m hasn't been tried before in M , to avoid repeated
attempts.

valid(m, M ) = A′′ ≠ 3, A′′ ≠ 6, B ′′ ≠ 3, B ′′ ≠ 6, m ∉ M        (14.61)

The following example Haskell program implements this solution. Note that it uses a
plain list to represent the queue for illustration purpose.
import Data.Bits

solve = bfsSolve [[(15, 0)]] where


bfsSolve :: [[(Int, Int)]] → [(Int, Int)]
bfsSolve [] = [] −− no solution
bfsSolve (c:cs) | (fst $ head c) == 0 = reverse c
| otherwise = bfsSolve (cs ++ map (:c)
(filter (`valid` c) $ moves $ head c))
valid (a, b) r = not $ or [ a `elem` [3, 6], b `elem` [3, 6],
(a, b) `elem` r]

moves (a, b) = if b < 8 then trans a b else map swap (trans b a) where
trans x y = [(x - 8 - i, y + 8 + i)
| i ← [0, 1, 2, 4], i == 0 || (x .&. i) /= 0]
swap (x, y) = (y, x)

This algorithm can easily be modified to find all the possible solutions, instead of
stopping after the first one is found. This is left as an exercise to the reader. The
following shows the two best solutions to this puzzle.
Solution 1:
Left river Right
wolf, goat, cabbage, farmer
wolf, cabbage goat, farmer
wolf, cabbage, farmer goat
cabbage wolf, goat, farmer
goat, cabbage, farmer wolf
goat wolf, cabbage, farmer
goat, farmer wolf, cabbage
wolf, goat, cabbage, farmer
Solution 2:
Left river Right
wolf, goat, cabbage, farmer
wolf, cabbage goat, farmer
wolf, cabbage, farmer goat
wolf goat, cabbage, farmer
wolf, goat, farmer cabbage
goat wolf, cabbage, farmer
goat, farmer wolf, cabbage
wolf, goat, cabbage, farmer
This algorithm can also be realized imperatively. Observing that our solution is in tail
recursive manner, we can translate it directly into a loop. We use a list S to hold all the
solutions found. The singleton list {(15, 0)} is pushed to the queue when initializing.
As long as the queue isn't empty, we extract the head C from the queue by calling the DeQ
procedure. We examine whether it reaches the final goal; if not, we expand all the possible
moves and push them to the tail of the queue for further searching.
1: function Solve
2: S←ϕ
3: Q←ϕ
4: EnQ(Q, {(15, 0)})
5: while Q ≠ ϕ do
6: C ← DeQ(Q)
7: if c1 = (0, 15) then
8: Add(S, Reverse(C))
9: else
10: for ∀m ∈ Moves(C) do
11: if Valid(m, C) then
12: EnQ(Q, {m} ∪ C)
13: return S
Where the Moves and Valid procedures are the same as before. The following Python
example program implements this imperative algorithm.

def solve():
    s = []
    queue = [[(0xf, 0)]]
    while queue != []:
        cur = queue.pop(0)
        if cur[0] == (0, 0xf):
            s.append(list(reversed(cur)))
        else:
            for m in moves(cur):
                queue.append([m] + cur)
    return s

def moves(s):
    (a, b) = s[0]
    return valid(s, trans(a, b) if b < 8 else swaps(trans(b, a)))

def valid(s, mv):
    return [(a, b) for (a, b) in mv
            if a not in [3, 6] and b not in [3, 6] and (a, b) not in s]

def trans(a, b):
    masks = [8 | (1 << i) for i in range(4)]
    return [(a ^ mask, b | mask) for mask in masks if a & mask == mask]

def swaps(s):
    return [(b, a) for (a, b) in s]

There is a minor difference between the program and the pseudo code: the function that
generates the candidate moves filters out the invalid cases inside it.
Every time, no matter which direction the farmer rows the boat, there are m options
for him to choose, where m is the number of objects on the river bank the farmer starts
from. m is always less than 4, so the algorithm won't examine more than 4^n cases at step
n. The actual time is far less than this estimation, because we avoid trying the invalid
cases. Our solution examines all the possible moves in the worst case; because we check
the recorded steps to avoid repeated attempts, the algorithm takes about O(n^2) time to
search n possible steps.

Water jugs puzzle


This is a popular puzzle in classic AI with a long history. There are two jugs, one of
9 quarts and the other of 4 quarts. How can we use them to bring up exactly 6 quarts of
water from the river?
There are various versions of this puzzle, in which the volumes of the jugs and the
target volume of water differ. In one story the solver is said to be the young Blaise Pascal,
the French mathematician and scientist; in another it is Siméon Denis Poisson. Later in the
popular Hollywood movie 'Die-Hard 3', the characters played by Bruce Willis
and Samuel L. Jackson were also confronted with this puzzle.
Pòlya gave a nice way to solve this problem backwards in [90].

Figure 14.38: Two jugs with volume of 9 and 4.

Instead of thinking forward from the starting state shown in figure 14.38, Pólya pointed out
that there must be 6 quarts of water in the bigger jug at the final stage. This indicates
the second to last step: we can fill the 9 quart jug, then pour out 3 quarts from it. In order
to achieve this, there should be 1 quart of water left in the smaller jug, as shown in figure
14.39.

Figure 14.39: The last two steps.

It’s easy to see that fill the 9 quarters jug, then pour to the 4 quarters jug twice
can bring 1 quarters of water. As shown in figure 14.40. At this stage, we’ve found
the solution. By reversing our findings, we can give the correct steps to bring exactly 6
quarters of water.
Pòlya’s methodology is general. It’s still hard to solve it without concrete algorithm.
For instance, how to bring up 2 gallons of water from 899 and 1147 gallon jugs?
There are 6 ways to deal with 2 jugs in total. Denote the smaller jug as A, the bigger
jug as B.

Figure 14.40: Fill the bigger jugs, and pour to the smaller one twice.

• Fill jug A from the river;

• Fill jug B from the river;

• Empty jug A;

• Empty jug B;

• Pour water from jug A to B;

• Pour water from jug B to A.

The following sequence shows an example. Note that in this example, we assume that
a < b < 2a.
A B operation
0 0 start
a 0 fill A
0 a pour A into B
a a fill A
2a - b b pour A into B
2a - b 0 empty B
0 2a - b pour A into B
a 2a - b fill A
3a - 2b b pour A into B
... ... ...
No matter which of the above operations are taken, the amount of water in each jug can
be expressed as xa + yb for some integers x and y, where a and b are the volumes of the jugs. All
the amounts of water we can get are linear combinations of a and b. Given two jugs, we can
immediately tell whether a goal g is solvable or not.
For instance, we can’t bring 5 gallons of water with two jugs of volume 4 and 6 gallon.
The number theory ensures that, the 2 water jugs puzzle can be solved if and only if g
can be divided by the greatest common divisor of a and b. Written as:

gcd(a, b)|g (14.62)

Where m|n means n can be divided by m. What's more, if a and b are relatively
prime, i.e. gcd(a, b) = 1, it's possible to bring up any quantity g of water.
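As a quick illustration (a small sketch, not part of the original text), the solvability test of equation (14.62) can be written directly in Python:

from math import gcd

def solvable(a, b, g):
    """Check whether amount g can be measured with jugs of volumes a and b."""
    return g % gcd(a, b) == 0

# solvable(4, 6, 5) is False, while solvable(3, 5, 4) is True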
Although gcd(a, b) enables us to determine whether the puzzle is solvable, it doesn't give us
the detailed pouring sequence. If we can find some integers x and y such that g = xa + yb, we
can arrange a sequence of operations (even if it may not be the best solution) to solve it.

The idea is that, without loss of generality, supposing x > 0 and y < 0, we need to fill jug A x
times, and empty jug B |y| times in total.
Let’s take a = 3, b = 5, and g = 4 for example, since 4 = 3 × 3 − 5, we can arrange a
sequence like the following.
A B operation
0 0 start
3 0 fill A
0 3 pour A into B
3 3 fill A
1 5 pour A into B
1 0 empty B
0 1 pour A into B
3 1 fill A
0 4 pour A into B
In this sequence, we fill A 3 times, and empty B once. The procedure can be
described as follows (a small trace of it is sketched right after this list):
Repeat x times:

1. Fill jug A;

2. Pour jug A into jug B, whenever B is full, empty it.
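As an illustration (a sketch, not part of the original text; the function name and its termination condition are our own assumptions), this procedure can be traced in Python:

def pour_sequence(a, b, x, g):
    """Trace the (A, B) states: fill jug A x times in total, always pouring A
       into B and emptying B whenever it becomes full."""
    p, q = 0, 0
    steps = [(p, q)]
    while q != g:
        if q == b:
            q = 0                      # empty B
        elif p == 0:
            p, x = a, x - 1            # fill A
        else:
            m = min(p, b - q)          # pour A into B
            p, q = p - m, q + m
        steps.append((p, q))
    return steps

# pour_sequence(3, 5, 3, 4) reproduces the 9-state table above, ending at (0, 4)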

So the only problem left is to find x and y. There is a powerful tool in number
theory called the extended Euclid algorithm which can achieve this. Compared to the classic
Euclid GCD algorithm, which only gives the greatest common divisor, the extended
Euclid algorithm gives a pair x, y as well, so that:

(d, x, y) = gcdext (a, b) (14.63)

where d = gcd(a, b) and ax + by = d. Without loss of generality, suppose a < b; there
exist a quotient q and a remainder r such that:

b = aq + r (14.64)

Since d is the common divisor, it can divide both a and b, thus d can divide r as well.
Because r is less than a, we can scale down the problem by finding GCD of a and r:

(d, x′ , y ′ ) = gcdext (r, a) (14.65)

Where d = x′ r + y ′ a according to the definition of the extended Euclid algorithm.


Transform b = aq + r to r = b − aq, substitute r in above equation yields:

d = x′ (b − aq) + y ′ a
(14.66)
= (y ′ − x′ q)a + x′ b

This is the linear combination of a and b, so that we have:


    x = y′ − x′⌊b/a⌋
    y = x′
(14.67)

Note that this is a typical recursive relationship. The edge case happens when a = 0.

gcd(0, b) = b = 0a + 1b (14.68)

Summarizing the above results, the extended Euclid algorithm can be defined as the
following:

gcdext(a, b) =
    (b, 0, 1)                  : a = 0
    (d, y′ − x′⌊b/a⌋, x′)      : otherwise
(14.69)
Where d, x′ , y ′ are defined in equation (14.65).
The 2 water jugs puzzle is almost solved, but there are still two details to be tackled.
First, the extended Euclid algorithm gives the linear combination for the greatest common
divisor d, while the target volume of water g isn't necessarily equal to
d. This can easily be solved by multiplying x and y by m, where m = g/gcd(a, b).
Second, we assume x > 0 to form a procedure that fills jug A x times. However, the
extended Euclid algorithm doesn't ensure x is positive. For instance gcdext(4, 9) =
(1, −2, 1). Whenever we get a negative x, since adding b to x and subtracting a from y
leaves xa + yb unchanged, we can repeat this adjustment till x is greater than zero.
At this stage, we are able to give the complete solution to the 2 water jugs puzzle.
Below is an example Haskell program.
extGcd 0 b = (b, 0, 1)
extGcd a b = let (d, x', y') = extGcd (b `mod` a) a in
(d, y' - x' ∗ (b `div` a), x')

solve a b g | g `mod` d /= 0 = [] −− no solution


| otherwise = solve' (x ∗ g `div` d)
where
(d, x, y) = extGcd a b
solve' x | x < 0 = solve' (x + b)
| otherwise = pour x [(0, 0)]
pour 0 ps = reverse ((0, g):ps)
pour x ps@((a', b'):_) | a' == 0 = pour (x - 1) ((a, b'):ps) −− fill a
| b' == b = pour x ((a', 0):ps) −− empty b
| otherwise = pour x ((max 0 (a' + b' - b),
min (a' + b') b):ps)

Although we can solve the 2 water jugs puzzle with the extended Euclid algorithm, the
solution may not be the best. For instance, suppose we are going to bring up 4 gallons of
water from jugs of 3 and 5 gallons. The extended Euclid algorithm produces the following
sequence:

[(0,0),(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),
(0,4),(3,4),(2,5),(2,0),(0,2),(3,2),(0,5),(3,5),
(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),(0,4)]

It takes 23 steps to achieve the goal, while the best solution only needs 6 steps:

[(0,0),(0,5),(3,2),(0,2),(2,0),(2,5),(3,4)]

Observing the 23 steps, we find that jug B already contains 4 gallons of
water at the 8-th step, but the algorithm ignores this fact and goes on executing the remaining
15 steps. The reason is that the linear combination x and y found with the extended
Euclid algorithm is not the only pair of numbers satisfying g = xa + yb. Among all such pairs,
the smaller |x| + |y| is, the fewer steps are needed. There is an exercise addressing this
problem in this section.
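One possible direction (a sketch, not part of the original text; the function name and the breakpoint search are our own) is to note that every solution has the form (x + k·b/d, y − k·a/d), and |x| + |y| is convex in k, so only a few candidate values of k need checking:

import math

def min_combination(a, b, g):
    """Find integers x, y with x*a + y*b == g minimizing |x| + |y| (a sketch)."""
    def ext_gcd(a, b):
        if a == 0:
            return (b, 0, 1)
        d, x, y = ext_gcd(b % a, a)
        return (d, y - (b // a) * x, x)

    d, x, y = ext_gcd(a, b)
    if g % d != 0:
        return None                    # not solvable
    x, y = x * (g // d), y * (g // d)  # scale to the target volume
    # only the integer neighbours of the two breakpoints can be optimal
    ks = {math.floor(-x * d / b), math.ceil(-x * d / b),
          math.floor(y * d / a), math.ceil(y * d / a)}
    return min(((x + k * (b // d), y - k * (a // d)) for k in ks),
               key=lambda p: abs(p[0]) + abs(p[1]))

# e.g. min_combination(3, 5, 4) returns a pair with |x| + |y| == 4, such as (-2, 2)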
The interesting problem is how to find the best solution. We have two approaches:
one is to find x and y that minimize |x| + |y|; the other is to adopt an idea quite similar to
the wolf-goat-cabbage puzzle. We focus on the latter in this section. Since there are at
most 6 possible options: fill A, fill B, pour A into B, pour B into A, empty A and empty
B, we can try them in parallel, and check which decision leads to the best solution.
We need to record all the states we've reached to avoid any potential repetition. In order to
realize this parallel approach with reasonable resources, a queue can be used to arrange
our attempts. The elements stored in this queue are series of pairs (p, q), where p and q
represent the volumes of water contained in each jug. These pairs record the sequence of
our operations from the beginning to the latest one. We initialize the queue with the singleton
list containing the starting state {(0, 0)}.

solve(a, b, g) = solve′ {{(0, 0)}} (14.70)

Every time, when the queue isn’t empty, we pick a sequence from the head of the
queue. If this sequence ends with a pair contains the target volume g, we find a solution,
we can print this sequence by reversing it; Otherwise, we expand the latest pair by trying
all the possible 6 options, remove any duplicated states, and add them to the tail of the
queue. Denote the queue as Q, the first sequence stored on the head of the queue as S,
the latest pair in S as (p, q), and the rest of pairs as S ′ . After popping the head element,
the queue becomes Q′ . This algorithm can be defined like below:

 ϕ : Q=ϕ
solve′ (Q) = reverse(S) : p = g ∨ q = g (14.71)

solve′ (EnQ′ (Q′ , {{s′ } ∪ S ′ |s′ ∈ try(S)})) : otherwise

Where function EnQ′ pushes a list of sequences to the queue one by one. Function
try(S) tries all 6 possible options to generate new pairs of water volumes:

try(S) = {s′ | s′ ∈ {fillA(p, q), fillB(p, q), pourA(p, q), pourB(p, q),
                     emptyA(p, q), emptyB(p, q)}, s′ ∉ S′}
(14.72)

It’s intuitive to define the 6 options. For fill operations, the result is that the volume of
the filled jug is full; for empty operation, the result volume is empty; for pour operation,
we need test if the jug is big enough to hold all the water.

fillA(p, q) = (a, q)        fillB(p, q) = (p, b)
emptyA(p, q) = (0, q)       emptyB(p, q) = (p, 0)
pourA(p, q) = (max(0, p + q − b), min(p + q, b))
pourB(p, q) = (min(p + q, a), max(0, p + q − a))
(14.73)

The following example Haskell program implements this method:


solve' a b g = bfs [[(0, 0)]] where
bfs [] = []
bfs (c:cs) | fst (head c) == g || snd (head c) == g = reverse c
| otherwise = bfs (cs ++ map (:c) (expand c))
expand ((x, y):ps) = filter (`notElem` ps) $ map (λf → f x y)
[fillA, fillB, pourA, pourB, emptyA, emptyB]
fillA _ y = (a, y)
fillB x _ = (x, b)
emptyA _ y = (0, y)
emptyB x _ = (x, 0)
pourA x y = (max 0 (x + y - b), min (x + y) b)
pourB x y = (min (x + y) a, max 0 (x + y - a))

This method always returns the solution with the fewest steps. It can also be realized in an
imperative approach. Instead of storing the complete sequence of operations in every element of the
queue, we can store the unique states in a global history list, and use links to track the
operation sequence; this saves space.

Figure 14.41: All attempted states are stored in a global list.

The idea is illustrated in figure 14.41. The initial state is (0, 0). Only ‘fill A’ and ‘fill
B’ are possible. They are tried and added to the record list; Next we can try and record
‘fill B’ on top of (3, 0), which yields new state (3, 5). However, when try ‘empty A’ from
state (3, 0), we would return to the start state (0, 0). As this previous state has been
recorded, it is ignored. All the repeated states are in gray color in this figure.
With such settings, we needn’t remember the operation sequence in each element in
the queue explicitly. We can add a ‘parent’ link to each node in figure 14.41, and use it to
back-traverse to the starting point from any state. The following example ANSI C code
shows such a definition.
struct Step {
int p, q;
struct Step∗ parent;
};

struct Step∗ make_step(int p, int q, struct Step∗ parent) {


struct Step∗ s = (struct Step∗) malloc(sizeof(struct Step));
s→p = p;
s→q = q;
s→parent = parent;
return s;
}

Where p, q are volumes of water in the 2 jugs. For any state s, define functions p(s)
and q(s) return these 2 values, the imperative algorithm can be realized based on this
idea as below.
1: function Solve(a, b, g)
2: Q←ϕ
3: Push-and-record(Q, (0, 0))
4: while Q ≠ ϕ do
5: s ← Pop(Q)
6: if p(s) = g ∨ q(s) = g then
7: return s
8: else

9: C ← Expand(s)
10: for ∀c ∈ C do
11: if c ≠ s ∧ ¬ Visited(c) then
12: Push-and-record(Q, c)
13: return NIL
Where Push-and-record not only pushes an element to the queue, but also
records this element as visited, so that we can later check whether an element has been visited
before. This can be implemented with a list. All push operations append new
elements to the tail. For the pop operation, instead of removing the element pointed to by head,
the head pointer only advances to the next one. This list contains historic data which
has to be reset explicitly. The following ANSI C code illustrates this idea.
struct Step ∗steps[1000], ∗∗head, ∗∗tail = steps;

void push(struct Step∗ s) { ∗tail++ = s; }

struct Step∗ pop() { return ∗head++; }

int empty() { return head == tail; }

void reset() {
struct Step ∗∗p;
for (p = steps; p ̸= tail; ++p)
free(∗p);
head = tail = steps;
}

In order to test a state has been visited, we can traverse the list to compare p and q.
int eq(struct Step∗ a, struct Step∗ b) {
return a→p == b→p && a→q == b→q;
}

int visited(struct Step∗ s) {


struct Step ∗∗p;
for (p = steps; p ̸= tail; ++p)
if (eq(∗p, s)) return 1;
return 0;
}

The main program can be implemented as below:


struct Step∗ solve(int a, int b, int g) {
int i;
struct Step ∗cur, ∗cs[6];
reset();
push(make_step(0, 0, NULL));
while (!empty()) {
cur = pop();
if (cur→p == g || cur→q == g)
return cur;
else {
expand(cur, a, b, cs);
for (i = 0; i < 6; ++i)
if(!eq(cur, cs[i]) && !visited(cs[i]))
push(cs[i]);
}
}
return NULL;
}

Where function expand tries all the 6 possible options:



void expand(struct Step∗ s, int a, int b, struct Step∗∗ cs) {


int p = s→p, q = s→q;
cs[0] = make_step(a, q, s); /∗fill A∗/
cs[1] = make_step(p, b, s); /∗fill B∗/
cs[2] = make_step(0, q, s); /∗empty A∗/
cs[3] = make_step(p, 0, s); /∗empty B∗/
cs[4] = make_step(max(0, p + q - b), min(p + q, b), s); /∗pour A∗/
cs[5] = make_step(min(p + q, a), max(0, p + q - a), s); /∗pour B∗/
}

The resulting steps are back-tracked in reverse order; they can be output with a recursive
function:
void print(struct Step∗ s) {
if (s) {
print(s→parent);
printf("%d, %dλn", s→p, s→q);
}
}

Kloski

Kloski is a block sliding puzzle. It appears in many countries. There are different sizes
and layouts. Figure 14.42 illustrates a traditional Kloski game in China.

(a) Initial layout of blocks. (b) Block layout after several movements.

Figure 14.42: ‘Huarong Dao’, the traditional Kloski game in China.

In this puzzle, there are 10 blocks, each labeled with text or an icon. The smallest
block is 1 unit square, the biggest one is 2 × 2 units. Note there is a slot 2 units wide
at the middle-bottom of the board. The biggest block represents a king in
ancient times, while the others are enemies. The goal is to move the biggest block to the
slot, so that the king can escape. This game is named 'Huarong Dao', or 'Huarong
Escape', in China. Figure 14.43 shows a similar Kloski puzzle in Japan. The biggest
block represents a daughter, while the others are her family members. This game is named
'Daughter in the box' in Japan (Japanese name: hakoiri musume).
In this section, we want to find a solution, which can slide blocks from the initial state
to the final state with the minimum movements.
The intuitive idea to model this puzzle is to use a 5 × 4 matrix representing the board.
All pieces are labeled with a number. The following matrix M , for example, shows the
initial state of the puzzle.

Figure 14.43: ‘Daughter in the box’, the Kloski game in Japan.

 
M =
    1 10 10  2
    1 10 10  2
    3  4  4  5
    3  7  8  5
    6  0  0  9

In this matrix, the cells of value i mean the i-th piece covers this cell. The special
value 0 represents a free cell. By using sequence 1, 2, ... to identify pieces, a special
layout can be further simplified as an array L. Each element is a list of cells covered by
the piece indexed with this element. For example, L[4] = {(3, 2), (3, 3)} means the 4-th
piece covers cells at position (3, 2) and (3, 3), where (i, j) means the cell at row i and
column j.
The starting layout can be written as the following Array.

{{(1, 1), (2, 1)}, {(1, 4), (2, 4)}, {(3, 1), (4, 1)}, {(3, 2), (3, 3)}, {(3, 4), (4, 4)},
{(5, 1)}, {(4, 2)}, {(4, 3)}, {(5, 4)}, {(1, 2), (1, 3), (2, 2), (2, 3)}}

When moving the Kloski blocks, we need to examine all 10 blocks, checking whether each
block can move up, down, left, and right. It seems that this approach would lead to
a huge number of possibilities: because each step might have 10 × 4 options, there
would be about 40^n cases by the n-th step.
Actually, there aren't that many options. For example, in the first step, there are
only 4 valid moves: the 6-th piece moves right; the 7-th and 8-th move down; and the
9-th moves left.
All others are invalid moves. Figure 14.44 shows how to test whether a move is possible.
The left example illustrates sliding the block labeled 1 down. There are two cells
covered by this block. The upper 1 moves to the cell previously occupied by this same
block, which is also labeled with 1; the lower 1 moves to a free cell, which is labeled with
0.
The right example, on the other hand, illustrates an invalid slide. In this case, the
upper cell could move to the cell occupied by the same block. However, the lower cell
labeled with 1 can't move to the cell occupied by another block, which is labeled with 2.
In order to test for a valid move, we need to examine all the cells the block will cover. If
they are labeled with 0 or the same number as this block, the move is valid; otherwise
it conflicts with some other block. For a layout L with corresponding matrix M, suppose
we want to move the k-th block by (∆x, ∆y), where |∆x| ≤ 1, |∆y| ≤ 1. The following
equation tells whether the move is valid:

valid(L, k, ∆x, ∆y) :


∀(i, j) ∈ L[k] ⇒ i′ = i + ∆y, j ′ = j + ∆x, (14.74)
(1, 1) ≤ (i′ , j ′ ) ≤ (5, 4), Mi′ j ′ ∈ {k, 0}

Figure 14.44: Left: both the upper and the lower 1 are OK; Right: the upper 1 is OK,
the lower 1 conflicts with 2.

Another important point in solving the Kloski puzzle is how to avoid repeated attempts.
The obvious case is that after a series of slides, we end up with a matrix we
have visited before. However, it is not enough to only avoid identical matrices.
Consider the following two matrices. Although M1 ≠ M2, we should drop M2,
because the two are essentially the same.
   
M1 =
    1 10 10  2
    1 10 10  2
    3  4  4  5
    3  7  8  5
    6  0  0  9

M2 =
    2 10 10  1
    2 10 10  1
    3  4  4  5
    3  7  6  5
    8  0  0  9
This fact tells us that we should compare the layouts, not merely the matrices, to avoid
repetition. Denote the corresponding layouts as L1 and L2 respectively; it's easy to verify
that ||L1|| = ||L2||, where ||L|| is the normalized layout, defined as below:
||L|| = sort({sort(li )|∀li ∈ L}) (14.75)
In other words, a normalized layout is ordered for all its elements, and every element
is also ordered. The ordering can be defined as that (a, b) ≤ (c, d) ⇔ an + b ≤ cn + d,
where n is the width of the matrix.
Observe that the Kloski board is symmetric, thus a layout can be the mirror of
another one. A mirrored layout is also a kind of repetition, which should be avoided. The
following M1 and M2 show such an example.
   
M1 =
    10 10  1  2
    10 10  1  2
     3  5  4  4
     3  5  8  9
     6  7  0  0

M2 =
     3  1 10 10
     3  1 10 10
     4  4  2  5
     7  6  2  5
     0  0  9  8
Note that, the normalized layouts are symmetric to each other. It’s easy to get a
mirrored layout like this:
mirror(L) = {{(i, n − j + 1)|∀(i, j) ∈ l}|∀l ∈ L} (14.76)

We find that the matrix representation is useful for validating moves, while the
layout is handy for modeling moves and avoiding repeated attempts. We can use a similar
approach to solve the Kloski puzzle. We need a queue; every element in the queue contains
two parts: a series of moves and the latest layout produced by those moves. Each move is of
the form (k, (∆y, ∆x)), which means moving the k-th block by ∆y rows and ∆x
columns on the board.
The queue contains the starting layout when initialized. Whenever the queue isn't
empty, we pick the first element from the head and check whether the biggest block is on target,
that is, L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}. If yes, then we are done; otherwise, we try to
move every block with the 4 options left, right, up, and down, and store all the possible,
unique new layouts at the tail of the queue. During this search, we need to record all the
normalized layouts we've ever found to avoid any duplication.
Denote the queue as Q, the historic layouts as H, the first layout at the head of the
queue as L, its corresponding matrix as M, and the moving sequence to this layout as S.
The algorithm can be defined as the following.

solve(Q, H) =
    ϕ              : Q = ϕ
    reverse(S)     : L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}
    solve(Q′, H′)  : otherwise
(14.77)

The first clause says that if the queue is empty, we've tried all the possibilities and
can't find a solution. The second clause finds a solution and returns the moving sequence in
reversed order. These are the two edge cases. Otherwise, the algorithm expands the current
layout, puts all the valid new layouts at the tail of the queue to yield Q′, and updates
the normalized layouts to H′. Then it performs the search recursively.
In order to expand a layout into valid, unique new layouts, we can define a function as
below:

expand(L, H) = {(k, (∆y, ∆x)) | ∀k ∈ {1, 2, ..., 10},
                ∀(∆y, ∆x) ∈ {(0, −1), (0, 1), (−1, 0), (1, 0)},
                valid(L, k, ∆x, ∆y), unique(L′, H)}
(14.78)

Where L′ is the new layout obtained by moving the k-th block by (∆y, ∆x) from L, M′
is the corresponding matrix, and M′′ is the matrix of the mirrored layout of L′. Function
unique is defined like this:

unique(L′, H) = M′ ∉ H ∧ M′′ ∉ H        (14.79)

We’ll next show some example Haskell Kloski programs. As array isn’t mutable in
the purely functional settings, tree based map is used to represent layout 11 . Some type
synonyms are defined as below:
import qualified Data.Map as M
import Data.Array
import Data.List (sort)

type Point = (Integer, Integer)


type Layout = M.Map Integer [Point]
type Move = (Integer, Point)

data Ops = Op Layout [Move]

The main program is almost the same as the solve(Q, H) function defined above.
11 Alternatively, finger tree based sequence shown in previous chapter can be used

solve :: [Ops] → [[[Point]]]→ [Move]


solve [] _ = [] −− no solution
solve (Op x seq : cs) visit
| M.lookup 10 x == Just [(4, 2), (4, 3), (5, 2), (5, 3)] = reverse seq
| otherwise = solve q visit'
where
ops = expand x visit
visit' = map (layout ◦ move x) ops ++ visit
q = cs ++ [Op (move x op) (op:seq) | op ← ops ]

Where function layout gives the normalized form by sorting. move returns the
updated map by sliding the i-th block with (∆y, ∆x).
layout = sort ◦ map sort ◦ M.elems

move x (i, d) = M.update (Just ◦ map (flip shift d)) i x

shift (y, x) (dy, dx) = (y + dy, x + dx)

Function expand gives all the possible new options. It can be directly translated from
expand(L, H).
expand :: Layout → [[[Point]]] → [Move]
expand x visit = [(i, d) | i ←[1..10],
d ← [(0, -1), (0, 1), (-1, 0), (1, 0)],
valid i d, unique i d] where
valid i d = all (λp → let p' = shift p d in
                inRange (bounds board) p' &&
                (M.keys $ M.filter (elem p') x) `elem` [[i], []])
            (maybe [] id $ M.lookup i x)
unique i d = let mv = move x (i, d) in
all (`notElem` visit) (map layout [mv, mirror mv])

Note that we also filter out the mirrored layouts. The mirror function is given as
the following.
mirror = M.map (map (λ (y, x) → (y, 5 - x)))

This program takes several minutes to produce the best solution, which takes 116
steps. The final 3 steps are shown as below:

...

['5', '3', '2', '1']
['5', '3', '2', '1']
['7', '9', '4', '4']
['A', 'A', '6', '0']
['A', 'A', '0', '8']

['5', '3', '2', '1']
['5', '3', '2', '1']
['7', '9', '4', '4']
['A', 'A', '0', '6']
['A', 'A', '0', '8']

['5', '3', '2', '1']
['5', '3', '2', '1']
['7', '9', '4', '4']
['0', 'A', 'A', '6']
['0', 'A', 'A', '8']

total 116 steps

The Kloski solution can also be realized imperatively. Note that solve(Q, H) is
tail-recursive, so it's easy to transform the algorithm into a loop. We can also link each
layout to its parent, so that the moving sequence can be recorded globally. This saves
some space, as the queue needn't store the moving information in every element. When
outputting the result, we only need to back-track from the last layout to the starting one.
Suppose function Link(L′, L) links a new layout L′ to its parent layout L. The
following algorithm takes a starting layout, and searches for the best moving sequence.
1: function Solve(L0 )
2: H ← ||L0 ||
3: Q←ϕ
4: Push(Q, Link(L0 , NIL))
5: while Q ≠ ϕ do
6: L ← Pop(Q)
7: if L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)} then
8: return L
9: else
10: for each L′ ∈ Expand(L, H) do
11: Push(Q, Link(L′ , L))
12: Append(H, ||L′ ||)
13: return NIL ▷ No solution
The following example Python program implements this algorithm:
from collections import deque

class Node:
    def __init__(self, l, p = None):
        self.layout = l
        self.parent = p

def solve(start):
    visit = set([normalize(start)])
    queue = deque([Node(start)])
    while queue:
        cur = queue.popleft()
        layout = cur.layout
        if layout[-1] == [(4, 2), (4, 3), (5, 2), (5, 3)]:
            return cur
        else:
            for brd in expand(layout, visit):
                queue.append(Node(brd, cur))
                visit.add(normalize(brd))
    return None # no solution

Where normalize and expand are implemented as below:


def normalize(layout):
return tuple(sorted([tuple(sorted(r)) for r in layout]))

def expand(layout, visit):


def bound(y, x):
return 1 ≤ y and y ≤ 5 and 1 ≤ x and x ≤ 4
def valid(m, i, y, x):
return m[y - 1][x - 1] in [0, i]
def unique(brd):
(m, n) = (normalize(brd), normalize(mirror(brd)))
return m not in visit and n not in visit
s = []

d = [(0, -1), (0, 1), (-1, 0), (1, 0)]


m = matrix(layout)
for i in range(1, 11):
for (dy, dx) in d:
if all(bound(y + dy, x + dx) and valid(m, i, y + dy, x + dx)
for (y, x) in layout[i - 1]):
brd = move(layout, (i, (dy, dx)))
if unique(brd):
s.append(brd)
return s

Like most programming languages, arrays are indexed from 0 but not 1 in Python.
This has to be handled properly. The rest functions including mirror, matrix, and
move are implemented as the following.
def mirror(layout):
return [[(y, 5 - x) for (y, x) in r] for r in layout]

def matrix(layout):
m = [[0]∗4 for _ in range(5)]
for (i, ps) in zip(range(1, 11), layout):
for (y, x) in ps:
m[y - 1][x - 1] = i
return m

def move(layout, delta):


(i, (dy, dx)) = delta
m = dup(layout)
m[i - 1] = [(y + dy, x + dx) for (y, x) in m[i - 1]]
return m

def dup(layout):
return [r[:] for r in layout]
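For completeness (an assumption of ours, not given in the original listing), the starting layout described earlier in this section can be written as a literal and passed to solve:

# The initial 'Huarong Dao' layout; piece i is start[i - 1],
# and the big 2 x 2 block is piece 10.
start = [[(1, 1), (2, 1)], [(1, 4), (2, 4)], [(3, 1), (4, 1)],
         [(3, 2), (3, 3)], [(3, 4), (4, 4)], [(5, 1)], [(4, 2)],
         [(4, 3)], [(5, 4)], [(1, 2), (1, 3), (2, 2), (2, 3)]]

# solve(start) returns the goal node; following the parent links back to the
# root should recover the 116-step moving sequence mentioned above.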

It’s possible to modify this Kloski algorithm, so that it does not only stop at the first
solution, but also search all the solutions. In such case, the computation time is bound to
the size of a space V , where V holds all the layouts can be transformed from the starting
layout. If all these layouts are stored globally, with a parent field point to the predecessor,
the space requirement of this algorithm is also bound to O(V ).

Summary of BFS
The above three puzzles, the wolf-goat-cabbage puzzle, the water jugs puzzle, and the
Kloski puzzle, show a common solution structure. Similar to the DFS problems, they
all have a starting state and an end state. The wolf-goat-cabbage puzzle starts with
the wolf, the goat, the cabbage, and the farmer all on one side, while the other side is
empty; it ends in a state where they have all moved to the other side. The water jugs puzzle
starts with two empty jugs, and ends with either jug containing a certain volume of water.
The Kloski puzzle starts from one layout and ends at another layout where the biggest block
has been slid to a given position.
All these problems specify a set of rules that transfer one state to another. Different
from the DFS approach, we try all the possible options 'in parallel'. We won't
search further until all the other alternatives in the same step have been examined. This
method ensures that the solution with the minimum number of steps is found before those with
more steps. Reviewing and comparing the two figures we've drawn before shows the difference
between these two approaches. For the latter one, because we expand the search
horizontally, it is called Breadth-first search (BFS for short).
As we can’t perform search really in parallel, BFS realization typically utilizes a queue
to store the search options. The candidate with less steps pops from the head, while the
14.3. SOLUTION SEARCHING 443

(a) Depth First Search (b) Breadth First Search

Figure 14.45: Search orders for DFS and BFS.

new candidate with more steps is pushed to the tail of the queue. Note that the queue
should meet constant time enqueue and dequeue requirement, which we’ve explained in
previous chapter of queue. Strictly speaking, the example functional programs shown
above don’t meet this criteria. They use list to mimic queue, which can only provide
linear time pushing. Readers can replace them with the functional queue we explained
before.
BFS provides a simple method to search for solutions that are optimal in terms of the number
of steps. However, it can't search for more generally optimal solutions. Consider the
directed graph shown in figure 14.46, where the length of each section varies. We can't use
BFS to find the shortest route from one city to another.


Figure 14.46: A weighted directed graph.

Note that the shortest route from city a to city c isn’t the one with the fewest steps
a → b → c. The total length of this route is 22; But the route with more steps a → e →
f → c is the best. The length of it is 20. The coming sections introduce other algorithms
to search for optimal solution.

14.3.2 Search the optimal solution


Searching for the optimal solution is quite important in many aspects. People need the
'best' solution to save time, space, cost, or energy. However, it's not easy to find the
best solution with limited resources. Many optimization problems can only
be solved by brute-force. Nevertheless, for some of them, there exist
special simplified ways to search for the optimal solution.

Greedy algorithm
Huffman coding
Huffman coding is a way to encode information with the shortest code length.
Consider the popular ASCII code, which uses 7 bits to encode characters, digits, and
symbols; it can represent 2^7 = 128 different symbols. With 0/1 bits, we need
at least log2 n bits to distinguish n different symbols. For text with only case-insensitive
English letters, we can define a code table like below.
char code char code
A 00000 N 01101
B 00001 O 01110
C 00010 P 01111
D 00011 Q 10000
E 00100 R 10001
F 00101 S 10010
G 00110 T 10011
H 00111 U 10100
I 01000 V 10101
J 01001 W 10110
K 01010 X 10111
L 01011 Y 11000
M 01100 Z 11001
With this code table, text ‘INTERNATIONAL’ is encoded to 65 bits.

00010101101100100100100011011000000110010001001110101100000011010

Observe that the above code table actually maps the letters 'A' to 'Z' to the numbers 0 to 25.
Every code takes 5 bits, so code zero is written as '00000' rather than '0', for
example. Such a coding method is called fixed-length coding.
Another coding method is variable-length coding, where we can use just one bit '0'
for 'A', two bits '10' for 'C', and 5 bits '11001' for 'Z'. Although this approach can
dramatically shorten the total code length for 'INTERNATIONAL' from 65 bits, it causes
problems when decoding. When processing a sequence of bits like '1101', we don't know
whether it means '1' followed by '101', which stands for 'BF'; or '110' followed by '1', which is
'GB'; or '1101', which is 'N'.
The famous Morse code is variable-length coding system. That the most used letter
‘E’ is encoded as a dot, while ‘Z’ is encoded as two dashes and two dots. Morse code uses
a special pause separator to indicate the termination of a code, so that the above problem
won’t happen. There is another solution to avoid ambiguity. Consider the following code
table.
char code char code
A 110 E 1110
I 101 L 1111
N 01 O 000
R 001 T 100
Text ‘INTERNATIONAL’ is encoded to 38 bits only:

10101100111000101110100101000011101111

When decoding the bits against the above code table, we won't meet any ambiguity.
This is because no symbol's code is the prefix of another one. Such a code
is called a prefix code. (You may wonder why it isn't called a non-prefix code.) By using
a prefix code, we don't need separators at all, so the length of the code can be shortened.
This raises a very interesting problem: can we find a prefix-code table which produces the
shortest code for a given text? This very problem was given to David A. Huffman
in 1951, when he was still a student at MIT [91]. His professor Robert M. Fano told the class
that those who could solve this problem needn't take the final exam. Huffman had almost
given up and started preparing for the final exam when he found the most efficient answer.
The idea is to create the coding table according to the frequency of each symbol
appearing in the text. The more a symbol is used, the shorter its assigned code.
It's not hard to process some text and count the occurrences of each symbol, so
that we have a symbol set where each symbol is augmented with a weight. The weight can be
a number which indicates how frequently the symbol occurs; we can use the number of
occurrences or the probability, for example.
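For instance (a small sketch, not part of the original text, using Python's Counter), the weights of 'INTERNATIONAL' can be counted directly:

from collections import Counter

def weights(text):
    """Count how many times each symbol occurs in the text."""
    return Counter(text.upper())

# weights("INTERNATIONAL") gives N:3, A:2, I:2, T:2, E:1, L:1, O:1, R:1,
# which are exactly the leaf weights used in the trees below.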
Huffman discovered that a binary tree can be used to generate a prefix code. All symbols
are stored in the leaf nodes. The codes are generated by traversing the tree from the root.
When we go left, we add a zero; when we go right, we add a one.
Figure 14.47 illustrates such a binary tree. Taking symbol 'N' for example, starting from
the root, we first go left, then right, and arrive at 'N'. Thus the code for 'N' is '01'. For
symbol 'A', we go right, right, then left, so 'A' is encoded as '110'. Note that this
approach ensures no code is the prefix of another.

13

5 8

2 N, 3 4 4

O, 1 R, 1 T, 2 I, 2 A, 2 2

E, 1 L, 1

Figure 14.47: An encoding tree.

Note that this tree can also be used directly for decoding. When scanning a series of bits,
if the bit is zero, we go left; if the bit is one, we go right. When we arrive at a leaf, we decode
the symbol from that leaf, and we restart from the root of the tree for the coming bits.
Given a list of symbols with weights, we need to build such a binary tree, so that
symbols with greater weight have shorter paths from the root. Huffman developed a bottom-up
solution. At the start, all symbols are put into leaf nodes. Each time, we pick the two
nodes with the smallest weights and merge them into a branch node. The weight of
this branch is the sum of its two children. We repeatedly pick the two smallest weighted
nodes and merge them till there is only one tree left. Figure 14.48 illustrates such a building
process.
We can reuse the binary tree definition to formalize Huffman coding. We augment
the weight information, and the symbols are only stored in leaf nodes. The following C
like definition, shows an example.

2 2 4

E, 1 L, 1 O, 1 R, 1 T, 2 I, 2

(a) 1. (b) 2. (c) 3.


4 5

A, 2 2 2 N, 3

E, 1 L, 1 O, 1 R, 1

(d) 4. (e) 5.
8

4 4

T, 2 I, 2 A, 2 2

E, 1 L, 1

(f) 6.
13

5 8

2 N, 3 4 4

O, 1 R, 1 T, 2 I, 2 A, 2 2

E, 1 L, 1

(g) 7.

Figure 14.48: Steps to build a Huffman tree.



struct Node {
int w;
char c;
struct Node ∗left, ∗right;
};

Some limitation can be added to the definition, as empty tree isn’t allowed. A Huffman
tree is either a leaf, which contains a symbol and its weight; or a branch, which only holds
total weight of all leaves. The following Haskell code, for instance, explicitly specifies these
two cases.
data HTr w a = Leaf w a | Branch w (HTr w a) (HTr w a)

When merging two Huffman trees T1 and T2 into a bigger one, these two trees are set as
its children. We can select either one as the left and the other as the right. The weight
of the resulting tree T is the sum of its two children, so that w = w1 + w2. Define T1 < T2 if
w1 < w2. One possible Huffman tree building algorithm can be realized as the following.
build(A) =
    T1                              : A = {T1}
    build({merge(Ta, Tb)} ∪ A′)     : otherwise
(14.80)

A is a list of trees. It is initialized with leaves for all symbols and their weights. If there
is only one tree in this list, we are done; that tree is the final Huffman tree. Otherwise,
the two smallest trees Ta and Tb are extracted, and the rest of the trees are held in list A′. Ta
and Tb are merged into one bigger tree, which is put back into the tree list for further recursive
building.

(Ta , Tb , A′ ) = extract(A) (14.81)

We can scan the tree list to extract the 2 nodes with the smallest weight. Below
equation shows that when the scan begins, the first 2 elements are compared and initialized
as the two minimum ones. An empty accumulator is passed as the last argument.

extract(A) = extract′ (min(T1 , T2 ), max(T1 , T2 ), {T3 , T4 , ...}, ϕ) (14.82)

For every tree, if its weight is less than the smallest two we've found so far, we update
the result to contain this tree. For any given tree list A, denote the first tree in it as T1,
and the rest of the trees except T1 as A′. The scan process can be defined as the following.

extract′(Ta, Tb, A, B) =
    (Ta, Tb, B)                           : A = ϕ
    extract′(Ta′, Tb′, A′, {Tb} ∪ B)      : T1 < Tb
    extract′(Ta, Tb, A′, {T1} ∪ B)        : otherwise
(14.83)

Where Ta′ = min(T1 , Ta ), Tb′ = max(T1 , Ta ) are the updated two trees with the
smallest weights.
The following Haskell example program implements this Huffman tree building algo-
rithm.
build [x] = x
build xs = build ((merge x y) : xs') where
(x, y, xs') = extract xs

extract (x:y:xs) = min2 (min x y) (max x y) xs [] where


min2 x y [] xs = (x, y, xs)
min2 x y (z:zs) xs | z < y = min2 (min z x) (max z x) zs (y:xs)
| otherwise = min2 x y zs (z:xs)

This building solution can also be realized imperatively. Given an array of Huffman
nodes, we can use the last two cells to hold the nodes with the smallest weights. Then we
scan the rest of the array from right to left. Whenever there is a node with the smaller
weight, this node will be exchanged with the bigger one of the last two. After all nodes
have been examined, we merge the trees in the last two cells, and drop the last cell. This
shrinks the array by one. We repeat this process till there is only one tree left.
1: function Huffman(A)
2: while |A| > 1 do
3: n ← |A|
4: for i ← n − 2 down to 1 do
5: if A[i] < Max(A[n], A[n − 1]) then
6: Exchange A[i] ↔ Max(A[n], A[n − 1])
7: A[n − 1] ← Merge(A[n], A[n − 1])
8: Drop(A[n])
9: return A[1]
The following C++ example program implements this algorithm. Note that this
algorithm doesn't need the last two elements to be ordered.
typedef vector<Node∗> Nodes;

bool lessp(Node∗ a, Node∗ b) { return a→w < b→w; }

Node∗ max(Node∗ a, Node∗ b) { return lessp(a, b) ? b : a; }

void swap(Nodes& ts, int i, int j, int k) {
    // exchange ts[i] with the bigger (by weight) of ts[j] and ts[k]
    swap(ts[i], ts[lessp(ts[j], ts[k]) ? k : j]);
}

Node∗ huffman(Nodes ts) {


int n;
while((n = ts.size()) > 1) {
for (int i = n - 3; i ≥ 0; --i)
if (lessp(ts[i], max(ts[n-1], ts[n-2])))
swap(ts, i, n-1, n-2);
ts[n-2] = merge(ts[n-1], ts[n-2]);
ts.pop_back();
}
return ts.front();
}

The algorithm merges all the leaves, and it needs to scan the list in each iteration. Thus
the performance is quadratic. This algorithm can be improved. Observe that each time,
only the two trees with the smallest weights are merged. This reminds us of the heap data
structure, which ensures fast access to the smallest element. We can put all the leaves
in a heap; for a binary heap, this is typically a linear operation. Then we extract the
minimum element twice, merge the two, and put the bigger tree back into the heap. These are
O(lg n) operations if a binary heap is used. So the total performance is O(n lg n), which is
better than the above algorithm. The next algorithm extracts the smallest node from the heap,
and starts the Huffman tree building.

build(H) = reduce(top(H), pop(H)) (14.84)

This algorithm stops when the heap is empty; Otherwise, it extracts another nodes
from the heap for merging.
reduce(T, H) =
    T                                           : H = ϕ
    build(insert(merge(T, top(H)), pop(H)))     : otherwise
(14.85)

Functions build and reduce are mutually recursive. The following Haskell example
program implements this algorithm by using the heap defined in the previous chapter.
huffman' :: (Num a, Ord a) ⇒ [(b, a)] → HTr a b
huffman' = build' ◦ Heap.fromList ◦ map (λ(c, w) → Leaf w c) where
    -- assumes the heap module exports fromList, findMin, deleteMin, and insert
    build' h = reduce (Heap.findMin h) (Heap.deleteMin h)
    reduce x Heap.E = x
    reduce x h = build' $ Heap.insert (Heap.deleteMin h) (merge x (Heap.findMin h))

The heap solution can also be realized imperatively. The leaves are first transformed
into a heap, so that the one with the minimum weight is on top. As long as there is
more than 1 element in the heap, we extract the two smallest, merge them into a bigger
one, and put it back into the heap. The final tree left in the heap is the resulting Huffman tree.
1: function Huffman’(A)
2: Build-Heap(A)
3: while |A| > 1 do
4: Ta ← Heap-Pop(A)
5: Tb ← Heap-Pop(A)
6: Heap-Push(A, Merge(Ta , Tb ))
7: return Heap-Pop(A)
The following example C++ code implements this heap solution. The heap used here
is provided by the standard library. Because a max-heap, not a min-heap, would be
made by default, a 'greater than' predicate is explicitly passed as argument.
bool greaterp(Node∗ a, Node∗ b) { return b→w < a→w; }

Node∗ pop(Nodes& h) {
    Node∗ m = h.front();
    pop_heap(h.begin(), h.end(), greaterp);
    h.pop_back();
    return m;
}

void push(Node∗ t, Nodes& h) {
    h.push_back(t);
    push_heap(h.begin(), h.end(), greaterp);
}

Node∗ huffman1(Nodes ts) {
    make_heap(ts.begin(), ts.end(), greaterp);
    while (ts.size() > 1) {
        Node∗ t1 = pop(ts);
        Node∗ t2 = pop(ts);
        push(merge(t1, t2), ts);
    }
    return ts.front();
}

When the symbol-weight list is already sorted, there exists a linear time method
to build the Huffman tree. Observe that the Huffman tree building produces a
series of merged trees with weights in ascending order. We can use a queue to manage the
merged trees. Every time, we pick the two trees with the smallest weights from the
queue and the list together, merge them, and push the result to the queue. Eventually all the
trees in the list are processed, and there is only one tree left in the queue: the resulting
Huffman tree. This process starts by passing an empty queue, as below.
build′ (A) = reduce′ (extract′′ (ϕ, A)) (14.86)
Suppose A is in ascending order by weight. At any time, the tree with the smallest
weight is either at the head of the queue or is the first element of the list. Denote the head
of the queue as Ta; after popping it, the queue is Q′. The first element in A is Tb; the rest
of the elements are held in A′. Function extract′′ can be defined like the following.

extract′′(Q, A) =
    (Tb, (Q, A′))    : Q = ϕ
    (Ta, (Q′, A))    : A = ϕ ∨ Ta < Tb
    (Tb, (Q, A′))    : otherwise
(14.87)

Actually, the pair of queue and tree list can be viewed as a special heap. The tree
with the minimum weight is continuously extracted and merged.

reduce′(T, (Q, A)) =
    T                                                    : Q = ϕ ∧ A = ϕ
    reduce′(extract′′(push(Q′′, merge(T, T′)), A′′))     : otherwise
(14.88)

Where (T′, (Q′′, A′′)) = extract′′(Q, A), which means extracting another tree. The
following Haskell example program shows the implementation of this method. Note that
this program explicitly sorts the leaves, which isn't necessary if the leaves are already ordered.
Again, a list, not a real queue, is used here for illustration purposes. Lists aren't good
at pushing new elements; please refer to the chapter about queues for details.
huffman'' :: (Num a, Ord a) ⇒ [(b, a)] → HTr a b
huffman'' = reduce ◦ wrap ◦ sort ◦ map (λ(c, w) → Leaf w c) where
wrap xs = delMin ([], xs)
reduce (x, ([], [])) = x
reduce (x, h) = let (y, (q, xs)) = delMin h in
reduce $ delMin (q ++ [merge x y], xs)
delMin ([], (x:xs)) = (x, ([], xs))
delMin ((q:qs), []) = (q, (qs, []))
delMin ((q:qs), (x:xs)) | q < x = (q, (qs, (x:xs)))
| otherwise = (x, ((q:qs), xs))

This algorithm can also be realized imperatively.


1: function Huffman”(A) ▷ A is ordered by weight
2: Q←ϕ
3: T ← Extract(Q, A)
4: while Q ≠ ϕ ∨ A ≠ ϕ do
5: Push(Q, Merge(T , Extract(Q, A)))
6: T ← Extract(Q, A)
7: return T
Where function Extract(Q, A) extracts the tree with the smallest weight from the
queue and the array of trees. It mutates the queue and array if necessary. Denote the
head of the queue is Ta , and the first element of the array as Tb .
1: function Extract(Q, A)
2: if Q ≠ ϕ ∧ (A = ϕ ∨ Ta < Tb ) then
3: return Pop(Q)
4: else
5: return Detach(A)
Where procedure Detach(A) removes the first element from A and returns it as the
result. In most imperative settings, as detaching the first element is a slow linear
operation for an array, we can store the trees in descending order by weight and remove the
last element instead. This is a fast constant time operation. The below C++ example code shows
this idea.
Node∗ extract(queue<Node∗>& q, Nodes& ts) {
    Node∗ t;
    if (!q.empty() && (ts.empty() || lessp(q.front(), ts.back()))) {
        t = q.front();
        q.pop();
    } else {
        t = ts.back();
        ts.pop_back();
    }
    return t;
}

Node∗ huffman2(Nodes ts) {
    queue<Node∗> q;
    sort(ts.begin(), ts.end(), greaterp);
    Node∗ t = extract(q, ts);
    while (!q.empty() || !ts.empty()) {
        q.push(merge(t, extract(q, ts)));
        t = extract(q, ts);
    }
    return t;
}

Note that the sorting isn’t necessary if the trees have already been ordered. It can be
a linear time reversing in case the trees are in ascending order by weight.
There are three different Huffman man tree building methods explained. Although
they follow the same approach developed by Huffman, the result trees varies. Figure 14.49
shows the three different Huffman trees built with these methods.

13

13

5 8
5 8

A, 2 N, 3 4 4
2 N, 3 4 4

2 T, 2 2 I, 2
O, 1 R, 1 T, 2 I, 2 A, 2 2

L, 1 E, 1 O, 1 R, 1 E, 1 L, 1

(a) Created by scan method. (b) Created by heap method.


13

5 8

2 N, 3 4 4

O, 1 R, 1 A, 2 I, 2 T, 2 2

E, 1 L, 1

(c) Linear time building for sorted list.

Figure 14.49: Variation of Huffman trees for the same symbol list.

Although these three trees are not identical, they are all able to generate the most
efficient code. The formal proof is skipped here; detailed information can be found
in [91] and Section 16.3 of [4].
Huffman tree building is the core idea of Huffman coding. Many things can be
easily achieved with the Huffman tree. For example, the code table can be generated by
traversing the tree. We start from the root with the empty prefix p. For any branch, we
append a zero to the prefix if we turn left, and append a one if we turn right. When a leaf node
is reached, the symbol represented by this node and the prefix are put into the code table.
Denote the symbol of a leaf node as c, and the children of tree T as Tl and Tr respectively.
The code table association list can be built with code(T, ϕ), which is defined as below.
code(T, p) =
    {(c, p)}                                  : leaf(T)
    code(Tl, p ∪ {0}) ∪ code(Tr, p ∪ {1})     : otherwise
(14.89)

Where function leaf (T ) tests if tree T is a leaf or a branch node. The following Haskell
example program generates a map as the code table according to this algorithm.
code tr = Map.fromList $ traverse [] tr where
traverse bits (Leaf _ c) = [(c, bits)]
traverse bits (Branch _ l r) = (traverse (bits ++ [0]) l) ++
(traverse (bits ++ [1]) r)

The imperative code table generating algorithm is left as an exercise. The encoding
process scans the text and looks up the code table to output the bit sequence. The
realization is skipped here; a small sketch is given below.
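The following is a sketch of the encoding process (not part of the original text; it assumes the code table is a dictionary mapping each symbol to its list of bits, analogous to what the code function above produces):

def encode(table, text):
    """Concatenate the prefix code of every symbol in the text."""
    return [b for c in text for b in table[c]]

# With the prefix-code table given earlier (e.g. 'N' -> [0, 1], 'A' -> [1, 1, 0]),
# encode(table, "INTERNATIONAL") should yield the 38-bit sequence shown above.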
The decoding process is realized by walking down the Huffman tree according to the bit
sequence. We start from the root; whenever a zero is received, we turn left, otherwise,
when a one is received, we turn right. When a leaf node is reached, the symbol represented by
this leaf is output, and we restart the lookup from the root. The decoding process
ends when all the bits are consumed. Denote the bit sequence as B = {b1, b2, ...}, with all bits
except the first held in B′; the definition below realizes the decoding algorithm.

decode(T, B) =
    {c}                            : B = ϕ ∧ leaf(T)
    {c} ∪ decode(root(T), B)       : leaf(T)
    decode(Tl, B′)                 : b1 = 0
    decode(Tr, B′)                 : otherwise
(14.90)

Where root(T ) returns the root of the Huffman tree. The following Haskell example
code implements this algorithm.
decode tr cs = find tr cs where
find (Leaf _ c) [] = [c]
find (Leaf _ c) bs = c : find tr bs
find (Branch _ l r) (b:bs) = find (if b == 0 then l else r) bs

Note that this is an on-line decoding algorithm with linear time performance; it consumes
one bit at a time. This can be clearly seen from the imperative realization below,
where the index keeps increasing by one.
1: function Decode(T, B)
2: W ←ϕ
3: n ← |B|, i ← 1
4: while i < n do
5: R←T
6: while ¬ Leaf(R) do
7: if B[i] = 0 then
8: R ← Left(R)
9: else
10: R ← Right(R)
11: i←i+1
12: W ← W ∪ Symbol(R)
13: return W
This imperative algorithm can be implemented as the following example C++ pro-
gram.

string decode(Node∗ root, const char∗ bits) {


string w;
while (∗bits) {
Node∗ t = root;
while (!isleaf(t))
t = '0' == ∗bits++ ? t→left : t→right;
w += t→c;
}
return w;
}

Huffman coding, especially the Huffman tree building, shows an interesting strategy.
Each time, there are multiple options for merging. Among the trees in the list, the Huffman
method always selects the two trees with the smallest weights. This is the best choice at that
merge stage, and this series of locally best options generates a globally optimal prefix
code.
It's not always the case that the locally optimal choice also leads to the globally optimal
solution; in most cases, it doesn't. Huffman coding is a special one. We call the strategy
of always choosing the locally best option the greedy strategy.
The greedy method works for many problems. However, it's not easy to tell whether the greedy
method can be applied to get the globally optimal solution. The generic formal proof is
still an active research area. Section 16.4 in [4] provides a good treatment of the matroid
tool, which covers many problems to which the greedy algorithm can be applied.

Change-making problem
We often exchange money when visiting other countries. People tend to use credit cards
more often nowadays, because it's quite convenient to buy things without
worrying about change. If we exchanged some money at the bank, there is often
some foreign money left by the end of the trip. Some people like to change it into coins
for collection. Can we find a solution which changes a given amount of money with
the least number of coins?
Let’s use USA coin system for example. There are 5 different coins: 1 cent, 5 cent,
25 cent, 50 cent, and 1 dollar. A dollar is equal to 100 cents. Using the greedy method
introduced above, we can always pick the largest coin which is not greater than the
remaining amount of money to be changed. Denote list C = {1, 5, 25, 50, 100}, which
stands for the value of coins. For any given money X, the change coins can be generated
as below.

 ϕ : X=0
change(X, C) = otherwise,
 {cm } ∪ change(X − cm , C) :
cm = max({c ∈ C, c ≤ X})
(14.91)
If C is in descending order, cm can be found as the first coin not greater than X. If we
want to change 1.42 dollars, this function produces the coin list {100, 25, 5, 5, 5, 1, 1}. The
output coin list can easily be transformed into pairs {(100, 1), (25, 1), (5, 3), (1, 2)}:
we need one dollar, a quarter, three coins of 5 cents, and 2 coins of 1 cent to make
the change. The following Haskell example program outputs the result as such.
solve x = assoc ◦ change x where
change 0 _ = []
change x cs = let c = head $ filter ( ≤ x) cs in c : change (x - c) cs

assoc = (map (λcs → (head cs, length cs))) ◦ group

As mentioned above, this program assumes the coins are in descending order, for
instance like below.

solve 142 [100, 50, 25, 5, 1]

This algorithm is tail recursive; it can be transformed into an imperative loop.

1: function Change(X, C)
2: R←ϕ
3: while X ≠ 0 do
4: cm = max({c ∈ C, c ≤ X})
5: R ← {cm } ∪ R
6: X ← X − cm
7: return R
The following example Python program implements this imperative version and man-
ages the result with a dictionary.
def change(x, coins):
cs = {}
while x ̸= 0:
m = max([c for c in coins if c ≤ x])
cs[m] = 1 + cs.get(m, 0)
x = x - m
return cs

For a coin system like the USA's, the greedy approach finds the optimal solution: the
number of coins is minimal. Fortunately, our greedy method works for the coin systems of most
countries. But it is not always true. For example, suppose a country has coins of value 1, 3, and
4 units. The best change for value 6 is to use two coins of 3 units; however, the greedy
method gives a result of three coins: one coin of 4 and two coins of 1, which isn't optimal.

Summary of greedy method

As shown in the change-making problem, the greedy method doesn't always give the best
result. In order to find the optimal solution, we need dynamic programming, which will
be introduced in the next section.
However, the result is often good enough in practice. Let's take the word-wrap problem
for example. In modern software editors and browsers, text spans multiple lines if the
content is too long to be held in one. With word-wrap supported, the user needn't insert
hard line breaks. Although dynamic programming can wrap with the minimum number
of lines, it's overkill. On the contrary, a greedy algorithm can wrap with a number of lines
close to the optimal result, with a quite effective realization as below. Here it wraps text T,
so as not to exceed line width W, with space s between each word.
1: L ← W
2: for w ∈ T do
3: if |w| + s > L then
4: Insert line break
5: L ← W − |w|
6: else
7: L ← L − |w| − s
For each word w in the text, it uses a greedy strategy to put as many words on a line
as possible without exceeding the line width. Many word processors use a similar algorithm
to do word wrapping; a small sketch follows.
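A Python sketch of this greedy wrapping (not part of the original text; the function name and parameters are our own):

def word_wrap(words, width, s=1):
    """Greedy word wrap following the pseudo code above: put as many words
       on a line as possible without exceeding the line width."""
    lines, line, left = [], [], width
    for w in words:
        if line and len(w) + s > left:   # w doesn't fit, break the line
            lines.append(" ".join(line))
            line, left = [], width
        line.append(w)
        left -= len(w) + s
    lines.append(" ".join(line))
    return "\n".join(lines)

# print(word_wrap("the quick brown fox jumps over the lazy dog".split(), 12))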
In many cases, the strictly optimal result, not an approximate one, is necessary.
Dynamic programming can help to solve such problems.

Dynamic programming
In the change-making problem, we mentioned that the greedy method can't always give the
optimal solution. For an arbitrary coin system, is there any way to find the best change?
Suppose we have found the best solution which makes an amount of money X. The coins
needed are contained in Cm. We can partition these coins into two collections, C1 and
C2, which make amounts X1 and X2 respectively. We'll prove that C1 is the optimal
solution for X1, and C2 is the optimal solution for X2.
Proof. For X1, suppose there exists another solution C1′ which uses fewer coins than C1.
Then the changing solution C1′ ∪ C2 uses fewer coins to make X than Cm. This conflicts with
the fact that Cm is the optimal solution for X. Similarly, we can prove C2 is the optimal
solution for X2.
Note that the reverse is not true. If we arbitrarily select a value Y <
X and divide the original problem into finding the optimal solutions for the sub-problems Y and
X − Y, combining the two optimal solutions doesn't necessarily yield an optimal solution for
X. Consider this example. There are coins with values 1, 2, and 4. The optimal solution
for making value 6 is to use 2 coins, of values 2 and 4. However, if we divide 6 = 3 + 3,
since each 3 can be made with the optimal solution 3 = 1 + 2, the combined solution contains
4 coins (1 + 1 + 2 + 2).
If an optimization problem can be divided into several optimal sub-problems, we say it has
optimal substructure. We see that the change-making problem has optimal substructure,
but the division has to be done based on the coins, not on an arbitrary value.
The optimal substructure can be expressed recursively as the following.
change(X) = { ϕ                                               : X = 0
            { least({{c} ∪ change(X − c) | c ∈ C, c ≤ X})     : otherwise        (14.92)

For any coin system C, the change for zero is empty; otherwise, we check every candidate coin c that is not greater than X, and recursively find the best solution for X − c. We pick the coin collection which contains the fewest coins as the result.
The below Haskell example program implements this top-down recursive solution.
import Data.Function (on)
import Data.List (minimumBy)

change _ 0 = []
change cs x = minimumBy (compare `on` length)
[c:change cs (x - c) | c ← cs, c ≤ x]

Although this program outputs the correct answer [2, 4] when evaluating change [1, 2, 4] 6, it performs very badly when changing 1.42 dollars with the USA coin system. It failed to find the answer within 15 minutes on a computer with a 2.7GHz CPU and 8GB memory.
The reason it's slow is that there is a lot of duplicated computation in the top-down recursive solution. When it computes change(142), it needs to examine change(141), change(137), change(117), change(92), and change(42). While change(141) next reduces to smaller values by deducting 1, 5, 25, 50 and 100 cents, it will eventually meet the values 137, 117, 92, and 42 again. The search space explodes with the power of 5.
This is quite similar to computing Fibonacci numbers in a top-down recursive way.

Fn = { 1             : n = 1 ∨ n = 2
     { Fn−1 + Fn−2   : otherwise                                                  (14.93)

When we calculate F8, for example, we recursively calculate F7 and F6. But when we calculate F7, we need to calculate F6 again, and F5, ... As shown in the expanded forms below, the calculation doubles every time, and the same values are calculated again and again.

F8 = F7 + F6
= F6 + F5 + F5 + F4
= F5 + F4 + F4 + F3 + F4 + F3 + F3 + F2
= ...

In order to avoid duplicated computation, a table F can be maintained when calculating the Fibonacci numbers. The first two elements are filled with 1; all the others are left blank. During the top-down recursive calculation, when Fk is needed, we first look up the k-th cell of this table; if it isn't blank, we use that value directly. Otherwise we compute it. Whenever a value is calculated, we store it in the corresponding cell for future look-ups.
1: F ← {1, 1, N IL, N IL, ...}
2: function Fibonacci(n)
3: if n > 2 ∧ F [n] = N IL then
4: F [n] ← Fibonacci(n − 1) + Fibonacci(n − 2)
5: return F [n]
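The memoized calculation can be sketched in a few lines of Python (the function name and the dictionary used as the table are assumptions made for illustration):

memo = {1: 1, 2: 1}

def fib(n):
    # Top-down: look up the table first, compute and record only when missing.
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)
    return memo[n]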
By using a similar idea, we can develop a new top-down change-making solution. We use a table T to maintain the best changes; it is initialized with all empty coin lists. During the top-down recursive computation, we look up this table for smaller change values. Whenever an intermediate value is calculated, it is stored in the table.
1: T ← {ϕ, ϕ, ...}
2: function Change(X)
3: if X > 0 ∧ T [X] = ϕ then
4: for c ∈ C do
5: if c ≤ X then
6: Cm ← {c}∪ Change(X − c)
7: if T [X] = ϕ ∨ |Cm | < |T [X]| then
8: T [X] ← Cm
9: return T [X]
The solution to change 0 money is definitely empty ϕ; otherwise, we look up T[X] to retrieve the solution for changing X money. If it is empty, we need to calculate it recursively. We examine every coin c in the coin system C which is not greater than X; the sub-problem is to make change for X − c. The candidate with the fewest coins, plus one coin of c, is finally stored in T[X] as the result.
The following example Python program implements this algorithm; it takes about 8000 ms to give the answer for changing 1.42 dollars in the US coin system.
tab = [[] for _ in range(1000)]

def change(x, cs):


if x > 0 and tab[x] == []:
for s in [[c] + change(x - c, cs) for c in cs if c ≤ x]:
if tab[x] == [] or len(s) < len(tab[x]):
tab[x] = s
return tab[x]

Another way to calculate Fibonacci numbers is to compute them in the order F1, F2, F3, ..., Fn. This is quite natural when people write down the Fibonacci series.
1: function Fibo(n)
2: F = {1, 1, N IL, N IL, ...}
3: for i ← 3 to n do
4: F [i] ← F [i − 1] + F [i − 2]
5: return F [n]
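The bottom-up table filling translates directly to Python (again, the function name is an assumption made for illustration):

def fibo(n):
    # Fill the table in the order F1, F2, ..., Fn.
    f = [None, 1, 1] + [None] * max(0, n - 2)
    for i in range(3, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]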

We can use a quite similar idea to solve the change-making problem. Starting from zero money, which can be changed with an empty list of coins, we next try to figure out how to change money of value 1. In the US coin system, for example, a cent can be used; the next values of 2, 3, and 4 can be changed with two, three, and four coins of 1 cent. At this stage, the solution table looks like below.
0 1 2 3 4
ϕ {1} {1, 1} {1, 1, 1} {1, 1, 1, 1}
The interesting case happens when changing value 5. There are two options: use another coin of 1 cent, which needs 5 coins in total; or use 1 coin of 5 cents, which uses fewer coins than the former. So the solution table can be extended to this.
0 1 2 3 4 5
ϕ {1} {1, 1} {1, 1, 1} {1, 1, 1, 1} {5}
For the next value, 6, there are two types of coins, 1 cent and 5 cents, not greater than this value; we need to examine both of them.
• If we choose the 1 cent coin, we next need to make change for 5. Since we already know that the best solution for 5 is {5}, which needs only one coin of 5 cents, looking up the solution table gives one candidate solution for 6 as {5, 1};
• The other option is to choose the 5 cent coin; we next need to make change for 1. Looking up the solution table we've filled so far, the optimal sub-solution for 1 is {1}. Thus we get another candidate solution for 6 as {1, 5};
It happens that both options yield a solution of two coins; we can select either of them as the best solution. Generally speaking, the candidate with the fewest coins is selected as the solution and filled into the table.
At any iteration, when we are trying to make change for value i ≤ X, we examine all the coin types. For any coin c not greater than i, we look up the solution table to fetch the sub-solution T[i − c]. The number of coins in this sub-solution, plus the one coin of c, is the total number of coins needed by this candidate. The candidate with the fewest coins is then selected and written to the solution table.
The following algorithm realizes this bottom-up idea.
1: function Change(X)
2: T ← {ϕ, ϕ, ...}
3: for i ← 1 to X do
4: for c ∈ C, c ≤ i do
5: if T [i] = ϕ ∨ 1 + |T [i − c]| < |T [i]| then
6: T [i] ← {c} ∪ T [i − c]
7: return T [X]
This algorithm can be directly translated to imperative programs, like Python for
example.
def changemk(x, cs):
s = [[] for _ in range(x+1)]
for i in range(1, x+1):
for c in cs:
if c ≤ i and (s[i] == [] or 1 + len(s[i-c]) < len(s[i])):
s[i] = [c] + s[i-c]
return s[x]

Observing the solution table, it's easy to find that much duplicated content is stored.
6 7 8 9 10 ...
{1, 5} {1, 1, 5} {1, 1, 1, 5} {1, 1, 1, 1, 5} {5, 5} ...

This is because the optimal sub-solutions are completely copied and saved in the parent solution. In order to use less space, we can record only the 'delta' part relative to the optimal sub-solution. For the change-making problem, it means that we only need to record the coin selected for value i.
1: function Change’(X)
2: T ← {0, ∞, ∞, ...}
3: S ← {N IL, N IL, ...}
4: for i ← 1 to X do
5: for c ∈ C, c ≤ i do
6: if 1 + T [i − c] < T [i] then
7: T [i] ← 1 + T [i − c]
8: S[i] ← c
9: while X > 0 do
10: Print(S[X])
11: X ← X − S[X]
Instead of recording the complete list of coins, this new algorithm uses two tables T and S. T holds the minimum number of coins needed to change values 0, 1, 2, ...; while S holds the first coin selected in the optimal solution. For the complete coin list to change money X, the first coin is thus S[X], and the optimal sub-solution is to change money X′ = X − S[X]. We then look up S[X′] for the next coin. The coins for the sub-solutions are repeatedly looked up like this till the beginning of the table.
The below Python example program implements this algorithm.
def chgmk(x, cs):
cnt = [0] + [x+1] ∗ x
s = [0]
for i in range(1, x+1):
coin = 0
for c in cs:
if c ≤ i and 1 + cnt[i-c] < cnt[i]:
cnt[i] = 1 + cnt[i-c]
coin = c
s.append(coin)
r = []
while x > 0:
r.append(s[x])
x = x - s[x]
return r

This change-making solution loops n times for a given amount of money n. It examines at most the full coin system in each iteration. The time is bound to Θ(nk), where k is the number of coin types. The last algorithm uses O(n) extra space to record the sub-optimal solutions in tables T and S.
In purely functional settings, there is no way to mutate the solution table and look it up in constant time. One alternative is to use a finger tree as we mentioned in the previous chapter.¹² We store pairs of the minimum number of coins and the coin that leads to that optimal sub-solution.
The solution table, which is a finger tree, is initialized as T = {(0, 0)}, meaning that changing 0 money needs no coin. We fold on the list {1, 2, ..., X}, starting from this table, with a binary function change(T, i). The folding builds the solution table, and we can construct the coin list from this table with the function make(X, T).

makeChange(X) = make(X, fold(change, {(0, 0)}, {1, 2, ..., X}))                   (14.94)


¹² Some purely functional programming environments, Haskell for instance, provide built-in arrays; other almost purely functional ones, such as ML, provide mutable arrays.



In the function change(T, i), all the coins not greater than i are examined to select the one leading to the best result. The fewest number of coins and the selected coin form a pair. This pair is appended to the finger tree, so that a new solution table is returned.

change(T, i) = insert(T, fold(sel, (∞, 0), {c | c ∈ C, c ≤ i}))                   (14.95)

Again, folding is used to select the candidate with the minimum number of coins. This folding starts with the initial value (∞, 0), over all valid coins. The function sel((n, c), c′) accepts two arguments: one is a pair of the coin count and the coin, which is the best solution so far; the other is a candidate coin. It examines whether this candidate makes a better solution.
sel((n, c), c′) = { (1 + n′, c′)   : 1 + n′ < n, where (n′, _) = T[i − c′]
                  { (n, c)         : otherwise                                    (14.96)

After the solution table is built, the coins needed can be generated from it.
make(X, T) = { ϕ                        : X = 0
             { {c} ∪ make(X − c, T)     : otherwise, where (n, c) = T[X]          (14.97)

The following example Haskell program uses Data.Sequence, the finger tree based sequence library, to implement the change making solution.
import Data.Sequence (Seq, singleton, index, (|>))

changemk x cs = makeChange x $ foldl change (singleton (0, 0)) [1..x] where
  change tab i = let sel c = min (1 + fst (index tab (i - c)), c)
                 in tab |> (foldr sel ((x + 1), 0) $ filter (≤ i) cs)
makeChange 0 _ = []
makeChange x tab = let c = snd $ index tab x in c : makeChange (x - c) tab

It's necessary to memoize the optimal solutions to sub-problems, no matter whether the top-down or the bottom-up approach is used. This is because a sub-problem is used many times when computing the overall optimal solution. This property is called overlapping sub-problems.

Properties of dynamic programming


Dynamic programming was originally named by Richard Bellman in the 1940s. It is a powerful tool to search for optimal solutions to problems that have the following two properties.

• Optimal sub structure. The problem can be broken down into smaller problems,
and the optimal solution can be constructed efficiently from solutions of these sub
problems;

• Overlapping sub problems. The problem can be broken down into sub problems
which are reused several times in finding the overall solution.

The change-making problem, as we've explained, has both optimal substructure and overlapping sub-problems.

Longest common subsequence problem


The longest common subsequence problem is different from the longest common substring problem; we've shown how to solve the latter in the chapter about suffix trees. A longest common subsequence needn't be a consecutive part of the original sequence.

Figure 14.50: The longest common subsequence

For example, the longest common substring of the texts “Mississippi” and “Missunderstanding” is “Miss”, while their longest common subsequence is “Misssi”. This is shown in figure 14.50.
If we rotate the figure vertically and consider the two texts as two pieces of source code, it turns into a 'diff' result between them. Most modern version control tools need to calculate the differences among versions; the longest common subsequence problem plays a very important role there.
If either of the two strings X and Y is empty, the longest common subsequence LCS(X, Y) is definitely empty. Otherwise, denote X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym}. If the first elements x1 and y1 are the same, we recursively find the longest common subsequence of X′ = {x2, x3, ..., xn} and Y′ = {y2, y3, ..., ym}; the final result LCS(X, Y) is constructed by concatenating x1 with LCS(X′, Y′). Otherwise, if x1 ≠ y1, we recursively find LCS(X, Y′) and LCS(X′, Y), and pick the longer one as the final result. Summarizing these cases gives the definition below.

LCS(X, Y) = { ϕ                                 : X = ϕ ∨ Y = ϕ
            { {x1} ∪ LCS(X′, Y′)                : x1 = y1
            { longer(LCS(X, Y′), LCS(X′, Y))    : otherwise                       (14.98)

Note that this definition clearly shows the optimal substructure: the longest common subsequence problem can be broken down into smaller problems, and each sub-problem is at least one element shorter than the original one.
It's also clear that there are overlapping sub-problems: the longest common subsequences of the substrings are used multiple times when finding the overall optimal solution.
The existence of these two properties, optimal substructure and overlapping sub-problems, indicates that dynamic programming can be used to solve this problem.
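To make the overlap concrete, here is a direct, unmemoized Python sketch of equation (14.98); it recomputes the same sub-problems again and again and is exponential in the worst case (the function name is an assumption made for illustration):

def lcs_naive(xs, ys):
    # Direct translation of the recursive definition; no table, exponential time.
    if not xs or not ys:
        return []
    if xs[0] == ys[0]:
        return [xs[0]] + lcs_naive(xs[1:], ys[1:])
    a, b = lcs_naive(xs, ys[1:]), lcs_naive(xs[1:], ys)
    return a if len(a) >= len(b) else b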
A 2-dimension table can be used to record the solutions to the sub-problems. The
rows and columns represent the substrings of X and Y respectively.

a n t e n n a
1 2 3 4 5 6 7
b 1
a 2
n 3
a 4
n 5
a 6
This table shows an example of finding the longest common subsequence of the strings “antenna” and “banana”, whose lengths are 7 and 6. The bottom-right corner of the table is looked up first. Since it's empty, we compare the 7th element of “antenna” and the 6th of “banana”; they are both 'a', thus we next recursively look up the cell at row 5, column 6. It's still empty, and we repeat this until we either reach a trivial case where one substring becomes empty, or the cell being looked up has been filled before. Similar to the change-making problem, whenever the optimal solution to a sub-problem is found, it is recorded in the cell for later reuse. Note that this process runs in the reverse order compared with the recursive equation given above: we start from the rightmost element of each string.
Considering that the longest common subsequence of any string with an empty string is still empty, we can extend the solution table so that the first row and column hold empty strings.
a n t e n n a
ϕ ϕ ϕ ϕ ϕ ϕ ϕ
b ϕ
a ϕ
n ϕ
a ϕ
n ϕ
a ϕ
Below algorithm realizes the top-down recursive dynamic programming solution with
such a table.
1: T ← NIL
2: function LCS(X, Y )
3: m ← |X|, n ← |Y |
4: m′ ← m + 1, n′ ← n + 1
5: if T = NIL then
6: T ← {{ϕ, ϕ, ..., ϕ}, {ϕ, NIL, NIL, ...}, ...} ▷ m′ × n′
7: if X ≠ ϕ ∧ Y ≠ ϕ ∧ T[m′][n′] = NIL then
8: if X[m] = Y [n] then
9: T [m′ ][n′ ] ← Append(LCS(X[1..m − 1], Y [1..n − 1]), X[m])
10: else
11: T [m′ ][n′ ] ← Longer(LCS(X, Y [1..n − 1]), LCS(X[1..m − 1], Y ))
12: return T [m′ ][n′ ]
The table is first initialized with the first row and column filled with empty strings; the rest are all NIL values. Unless either string is empty or the cell content isn't NIL, the last elements of the two strings are compared, and the longest common subsequence is computed recursively on the shortened strings. The following Python example program implements this algorithm.
this algorithm.
tab = None   # the solution table, created on the first call

def lcs(xs, ys):
m = len(xs)
n = len(ys)
global tab
if tab is None:
tab = [[""]∗(n+1)] + [[""] + [None]∗n for _ in xrange(m)]
if m ̸= 0 and n ̸= 0 and tab[m][n] is None:
if xs[-1] == ys[-1]:
tab[m][n] = lcs(xs[:-1], ys[:-1]) + xs[-1]
else:
(a, b) = (lcs(xs, ys[:-1]), lcs(xs[:-1], ys))
tab[m][n] = a if len(b) < len(a) else b
return tab[m][n]

The longest common subsequence can also be found in a bottom-up manner, as we did with the change-making problem. Besides that, instead of recording whole sequences in the table, we can store only the lengths of the longest common subsequences, and later construct the subsequence from this table and the two strings. This time, the table is initialized with all values set to 0.
1: function LCS(X, Y )
2: m ← |X|, n ← |Y |
3: T ← {{0, 0, ...}, {0, 0, ...}, ...} ▷ (m + 1) × (n + 1)
4: for i ← 1 to m do
5: for j ← 1 to n do
6: if X[i] = Y [j] then
7: T [i + 1][j + 1] ← T [i][j] + 1
8: else
9: T [i + 1][j + 1] ← Max(T [i][j + 1], T [i + 1][j])
10: return Get(T, X, Y, m, n)

11: function Get(T, X, Y, i, j)


12: if i = 0 ∨ j = 0 then
13: return ϕ
14: else if X[i] = Y [j] then
15: return Append(Get(T, X, Y, i − 1, j − 1), X[i])
16: else if T [i − 1][j] > T [i][j − 1] then
17: return Get(T, X, Y, i − 1, j)
18: else
19: return Get(T, X, Y, i, j − 1)
In the bottom-up approach, we start from the cell at the second row and the second column, which corresponds to the first elements of both X and Y. If they are the same, the length of the longest common subsequence so far is 1; this is obtained by increasing the length of the empty sequence, stored in the top-left cell, by one. Otherwise, we pick the maximum of the upper cell and the left cell. The table is filled repeatedly in this manner.
After that, a back-track is performed to construct the longest common subsequence. This time we start from the bottom-right corner of the table. If the last elements of X and Y are the same, we put this element as the last one of the result, and go on looking up the cell along the diagonal; otherwise, we compare the values in the left cell and the upper cell, and go on looking up the cell with the bigger value.
The following example Python program implements this algorithm.
def lcs(xs, ys):
m = len(xs)
n = len(ys)
c = [[0]∗(n+1) for _ in xrange(m+1)]
for i in xrange(1, m+1):
for j in xrange(1, n+1):
if xs[i-1] == ys[j-1]:
c[i][j] = c[i-1][j-1] + 1
else:
c[i][j] = max(c[i-1][j], c[i][j-1])

return get(c, xs, ys, m, n)

def get(c, xs, ys, i, j):


if i==0 or j==0:
return []
elif xs[i-1] == ys[j-1]:
return get(c, xs, ys, i-1, j-1) + [xs[i-1]]
elif c[i-1][j] > c[i][j-1]:
return get(c, xs, ys, i-1, j)
else:
return get(c, xs, ys, i, j-1)

The bottom-up dynamic programming solution can also be defined in a purely functional way. A finger tree is used as the table. The first row is filled with n + 1 zero values. The table is built by folding on sequence X; then the longest common subsequence is constructed from the table.

LCS(X, Y) = construct(fold(f, {{0, 0, ..., 0}}, zip({1, 2, ...}, X)))             (14.99)

Note that, since the table needs to be looked up by index, X is zipped with the natural numbers. Function f creates a new row of this table by folding on sequence Y, and records the lengths of the longest common subsequences for all the cases so far.

f(T, (i, x)) = insert(T, fold(longest, {0}, zip({1, 2, ...}, Y)))                 (14.100)

Function longest takes the intermediate row being filled, and a pair of an index and an element of Y. It checks whether this element is the same as the corresponding one in X, then fills the new cell with the length of the longest common subsequence.
longest(R, (j, y)) = { insert(R, 1 + T[i − 1][j − 1])             : x = y
                     { insert(R, max(T[i − 1][j], T[i][j − 1]))   : otherwise     (14.101)

After the table is built, the longest common subsequence can be constructed recursively by looking it up. For efficiency, we pass the reversed sequences, written reverse(X) and reverse(Y), together with their lengths m and n.

construct(T) = get((reverse(X), m), (reverse(Y), n))                              (14.102)

If the sequences are not empty, denote their first elements (i.e. the last elements of X and Y) as x and y, and the remaining parts as reverse(X′) and reverse(Y′) respectively. The function get can be defined as the following.

get((reverse(X), i), (reverse(Y), j)) =
  { ϕ                                                     : reverse(X) = ϕ ∧ reverse(Y) = ϕ
  { get((reverse(X′), i − 1), (reverse(Y′), j − 1)) ∪ {x} : x = y
  { get((reverse(X′), i − 1), (reverse(Y), j))            : T[i − 1][j] > T[i][j − 1]
  { get((reverse(X), i), (reverse(Y′), j − 1))            : otherwise
                                                                                  (14.103)

The below Haskell example program implements this solution.
import Data.Sequence (singleton, fromList, index, (|>))

lcs' xs ys = construct $ foldl f (singleton $ fromList $ replicate (n+1) 0)
(zip [1..] xs) where
(m, n) = (length xs, length ys)
f tab (i, x) = tab |> (foldl longer (singleton 0) (zip [1..] ys)) where
    longer r (j, y) = r |> if x == y
                           then 1 + (tab `index` (i-1) `index` (j-1))
                           else max (tab `index` (i-1) `index` j) (r `index` (j-1))
construct tab = get (reverse xs, m) (reverse ys, n) where
get ([], 0) ([], 0) = []
get ((x:xs), i) ((y:ys), j)
| x == y = get (xs, i-1) (ys, j-1) ++ [x]
| (tab `index` (i-1) `index` j) > (tab `index` i `index` (j-1)) =
get (xs, i-1) ((y:ys), j)
| otherwise = get ((x:xs), i) (ys, j-1)

Subset sum problem


Dynamic programming is not limited to optimization problems; it can also solve some more general searching problems. The subset sum problem is such an example. Given a set of integers, is there a non-empty subset that sums to zero? For example, there are two subsets of {11, 64, −82, −68, 86, 55, −88, −21, 51} that sum to zero: one is {64, −82, 55, −88, 51}, the other is {64, −82, −68, 86}.
Of course summing to zero is a special case; sometimes people want to find a subset whose sum is a given value s. Here we are going to develop a method to find all the candidate subsets.
There is an obvious brute-force exhaustive search solution: every element can either be picked or not, so there are 2^n options in total for a set of n elements. For every selection, we need to check whether it sums to s, which is a linear operation. The overall complexity is bound to O(n·2^n). This exponential algorithm takes a huge amount of time if the set is big.
There is a recursive solution to the subset sum problem. If the set is empty, there is definitely no solution. Otherwise, let the set be X = {x1, x2, ...}. If x1 = s, then the subset {x1} is a solution, and we next search the rest X′ = {x2, x3, ...} for subsets that sum to s. Otherwise, if x1 ≠ s, there are two possibilities: we search X′ for subsets that sum to s, and for subsets that sum to s − x1; for any subset summing to s − x1, adding x1 to it forms a new solution. The following equation defines this algorithm.

solve(X, s) = { ϕ                                                   : X = ϕ
              { {{x1}} ∪ solve(X′, s)                               : x1 = s
              { solve(X′, s) ∪ {{x1} ∪ S | S ∈ solve(X′, s − x1)}   : otherwise
                                                                                  (14.104)
There are clear substructures in this definition, although they are not 'optimal' in any sense, since we are not optimizing anything here. There are also overlapping sub-problems. This indicates that the problem can be solved with dynamic programming, using a table to memoize the solutions to sub-problems.
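A direct Python translation of equation (14.104) shows the recursive structure (it is exponential without memoization; the function name is an assumption made for illustration):

def solve_rec(xs, s):
    # Return all subsets of xs that sum to s, following the recursion above.
    if not xs:
        return []
    x, rest = xs[0], xs[1:]
    if x == s:
        return [[x]] + solve_rec(rest, s)
    return solve_rec(rest, s) + [[x] + ys for ys in solve_rec(rest, s - x)]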
Instead of developing a solution that outputs all the subsets directly, let's first consider how to answer the existence question: output 'yes' if there exists some subset that sums to s, and 'no' otherwise.
One fact is that the upper and lower limits of all possible sums can be calculated in one scan. If the given sum s doesn't fall in this range, there is obviously no solution.
sl = Σ{x ∈ X, x < 0}
su = Σ{x ∈ X, x > 0}                                                              (14.105)
Otherwise, if sl ≤ s ≤ su, since the values are all integers, we can use a table with su − sl + 1 columns, where each column represents a possible value in this range, from sl to su. The value of a cell is either true or false, representing whether there exists a subset summing to this value. All cells are initialized as false. Starting from the first element x1 of X, the set {x1} definitely sums to x1, so the cell representing this value in the first row is filled as true.

sl sl + 1 ... x1 ... su
x1 F F ... T ... F
With the next element x2, there are three kinds of possible sums. Similar to the first row, {x2} sums to x2. All possible sums in the previous row can also be achieved without x2, so the cell below the one for x1 should also be filled as true. By adding x2 to all the sums achieved so far, we get some new values; the cell representing x1 + x2 should be true.
sl sl + 1 ... x1 ... x2 ... x1 + x2 ... su
x1 F F ... T ... F ... F ... F
x2 F F ... T ... T ... T ... F
Generally speaking, when filling the i-th row, all the sums constructed with {x1, x2, ..., xi−1} so far can also be achieved without xi, so the cells that were previously true remain true in the new row. The cell representing the value xi should also be true, since the singleton set {xi} sums to it. And by adding xi to all the previously constructed sums we get new results; the cells representing these new sums are also filled as true.
When all the elements are processed like this, a table with |X| rows is built. Looking up the cell representing s in the last row tells whether some subset sums to this value. As mentioned above, there is no solution if s < sl or su < s; we skip handling this case for the sake of brevity.
1: function Subset-Sum(X, s)

2: sl ← ∑{x ∈ X, x < 0}
3: su ← {x ∈ X, x > 0}
4: n ← |X|
5: T ← {{F alse, F alse, ...}, {F alse, F alse, ...}, ...} ▷ n × (su − sl + 1)
6: for i ← 1 to n do
7: for j ← sl to su do
8: if X[i] = j then
9: T [i][j] ← T rue
10: if i > 1 then
11: T [i][j] ← T [i][j] ∨ T [i − 1][j]
12: j ′ ← j − X[i]
13: if sl ≤ j ′ ≤ su then
14: T [i][j] ← T [i][j] ∨ T [i − 1][j ′ ]
15: return T [n][s]
Note that the index of the columns doesn't range from 1 to su − sl + 1, but maps directly from sl to su. Because most programming environments don't support negative indices, this can be handled by accessing T[i][j − sl]. The following example Python program instead utilizes Python's negative indexing.
def solve(xs, s):
low = sum([x for x in xs if x < 0])
up = sum([x for x in xs if x > 0])
tab = [[False]∗(up-low+1) for _ in xs]
for i in xrange(0, len(xs)):
for j in xrange(low, up+1):
tab[i][j] = (xs[i] == j)
j1 = j - xs[i]
tab[i][j] = (tab[i][j] or tab[i-1][j] or
             (low ≤ j1 and j1 ≤ up and tab[i-1][j1]))
return tab[-1][s]

Note that this program doesn't use different branches for i = 0 and i = 1, 2, ..., n − 1. This is because when i = 0, the row index i − 1 = −1 refers to the last row of the table, which is still all false at that point. This simplifies the logic one more step.
With this table built, it's easy to construct all the subsets that sum to s. The method is to look up the cell representing s in the last row. If the last element xn = s, then {xn} is definitely a candidate. We then look up the previous row for s, and recursively construct all the subsets of {x1, x2, x3, ..., xn−1} that sum to s. Finally, we look up the second-to-last row for the cell representing s − xn; for every subset summing to this value, we add the element xn to construct a new subset, which sums to s.
1: function Get(X, s, T, n)
2: S←ϕ
3: if X[n] = s then
4: S ← S ∪ {X[n]}
5: if n > 1 then
6: if T [n − 1][s] then
7: S ← S∪ Get(X, s, T, n − 1)
8: if T [n − 1][s − X[n]] then
9: S ← S ∪ {{X[n]} ∪ S ′ |S ′ ∈ Get(X, s − X[n], T, n − 1) }
10: return S
The following Python example program translates this algorithm.
def get(xs, s, tab, n):
r = []
if xs[n] == s:
r.append([xs[n]])
if n > 0:
if tab[n-1][s]:
r = r + get(xs, s, tab, n-1)
if tab[n-1][s - xs[n]]:
r = r + [[xs[n]] + ys for ys in get(xs, s - xs[n], tab, n-1)]
return r

This dynamic programming solution to the subset sum problem loops O(n(su − sl + 1)) times to build the table, and then uses recursion to construct the final solution from this table. The space it uses is also bound to O(n(su − sl + 1)).
Instead of using a table with n rows, a vector can be used alternatively. For every cell representing a possible sum, the list of subsets achieving it is stored. This vector is initialized to contain all empty sets. For every element of X, we update the vector, so that it records all the sums that can be built so far. When all the elements have been considered, the cell corresponding to s contains the final result.
1: function Subset-Sum(X, s)

2: sl ← ∑{x ∈ X, x < 0}
3: su ← {x ∈ X, x > 0}
4: T ← {ϕ, ϕ, ...} ▷ su − sl + 1
5: for x ∈ X do
6: T ′ ← Duplicate(T )
7: for j ← sl to su do
8: j′ ← j − x
9: if x = j then
10: T ′ [j] ← T ′ [j] ∪ {x}
11: if sl ≤ j′ ≤ su ∧ T[j′] ≠ ϕ then
12: T ′ [j] ← T ′ [j] ∪ {{x} ∪ S|S ∈ T [j ′ ]}
13: T ← T′
14: return T [s]
The corresponding Python example program is given as below.
def subsetsum(xs, s):
    low = sum([x for x in xs if x < 0])
    up = sum([x for x in xs if x > 0])
    tab = [[] for _ in xrange(low, up+1)]
    for x in xs:
        tab1 = [ss[:] for ss in tab]   # copy each cell, so that tab stays unchanged
        for j in xrange(low, up+1):
            if x == j:
                tab1[j].append([x])
            j1 = j - x
            if low ≤ j1 and j1 ≤ up and tab[j1] ̸= []:
                tab1[j] = tab1[j] + [[x] + ys for ys in tab[j1]]
        tab = tab1
    return tab[s]

This imperative algorithm shows a clear structure: the solution table is built by looping over every element. This can be realized in a purely functional way by folding. A finger tree can be used to represent the vector spanning from sl to su. It is initialized with all empty values as in the following equation.

subsetsum(X, s) = fold(build, {ϕ, ϕ, ...}, X)[s]                                  (14.106)

After folding, the solution table is built; the answer is looked up at cell s.¹³
For every element x ∈ X, the function build folds over the list {sl, sl + 1, ..., su}. For every value j, it checks whether j equals x and, if so, appends the singleton set {x} to the j-th cell. Note that the cells are indexed from sl, not from 0. If the cell corresponding to j − x is not empty, the candidate solutions stored there are duplicated, and the element x is added to each of them.

build(T, x) = fold(f, T, {sl, sl + 1, ..., su})                                   (14.107)


f(T, j) = { T′ with cell j set to T′[j] ∪ {{x} ∪ Y | Y ∈ T[j′]}  : sl ≤ j′ ≤ su ∧ T[j′] ≠ ϕ, where j′ = j − x
          { T′                                                    : otherwise
                                                                                  (14.108)
Here the adjustment is applied to T′, which is itself an adjustment of T as shown below.

T′ = { T with cell j set to {{x}} ∪ T[j]   : x = j
     { T                                   : otherwise                            (14.109)

Note that the first clause in both equations (14.108) and (14.109) returns a new table with the corresponding cell updated to the given value.
The following Haskell example program implements this algorithm.
import Data.Sequence (fromList, index, adjust)

subsetsum xs s = foldl build (fromList [[] | _ ← [l..u]]) xs `idx` s where
l = sum $ filter (< 0) xs
u = sum $ filter (> 0) xs
idx t i = index t (i - l)
build tab x = foldl (λt j → let j' = j - x in
adjustIf (l ≤ j' && j' ≤ u && tab `idx` j' /= [])
(++ [(x:ys) | ys ← tab `idx` j']) j
(adjustIf (x == j) ([x]:) j t)) tab [l..u]
adjustIf pred f i seq = if pred then adjust f (i - l) seq else seq

Some materials, like [16], provide common structures to abstract dynamic programming, so that problems can be solved with a generic framework by customizing the precondition, the comparison of candidate solutions, and the method for merging sub-solutions. However, the variety of problems makes things complex in practice; it's important to study the properties of each problem carefully.
¹³ Again, here we skip the error handling for the case that s < sl or s > su. There is no solution if s is out of range.

Exercise 14.3

• Realize a maze solver by using the stack approach, which can find all the possible
paths.

• There are 92 distinct solutions for the 8 queens puzzle. For any one solution, rotating
it 90◦ , 180◦ , 270◦ gives solutions too. Also flipping it vertically and horizontally also
generate solutions. Some solutions are symmetric, so that rotation or flip gives the
same one. There are 12 unique solutions in this sense. Modify the program to find
the 12 unique solutions. Improve the program, so that the 92 distinct solutions can
be found with fewer search.

• Make the 8 queens puzzle solution generic so that it can solve n queens puzzle.

• Make the functional solution to the leap frogs puzzle generic, so that it can solve n
frogs case.

• Modify the wolf, goat, and cabbage puzzle algorithm, so that it can find all possible
solutions.

• Give the complete algorithm definition to solve the 2 water jugs puzzle with extended
Euclid algorithm.

• In fact, we don't need the exact linear combination of x and y. After we know the puzzle is solvable by testing with the GCD, we can blindly execute the process: fill A, pour A into B, and whenever B is full, empty it, till one jug holds the expected volume. Realize this solution. Can it find a faster solution than the original version?

• Compare to the extended Euclid method, the BFS approach is a kind of brute-
force searching. Improve the extended Euclid approach by finding the best linear
combination which minimize |x| + |y|.

• John Horton Conway introduced the sliding tile puzzle. Figure 14.51 shows a simplified version. There are 8 cells, 7 of them occupied by pieces labeled from 1 to 7. Each piece can slide to the free cell if they are connected; a line between cells means there is a connection. The goal is to reverse the pieces from 1, 2, 3, 4, 5, 6, 7 to 7, 6, 5, 4, 3, 2, 1 by sliding. Develop a program to solve this puzzle.

Figure 14.51: Conway sliding puzzle

• Realize the imperative Huffman code table generating algorithm.



• One option to realize the bottom-up solution for the longest common subsequence
problem is to record the direction in the table. Thus, instead of storing the length
information, three values like ’N’, for north, ’W’ for west, and ’NW’ for northwest
are used to indicate how to construct the final result. We start from the bottom-
right corner of the table, if the cell value is ’NW’, we go along the diagonal by
moving to the cell in the upper-left; if it’s ’N’, we move vertically to the upper
row; and move horizontally if it’s ’W’. Implement this approach in your favorite
programming language.

• Given a list of non-negative integers, find the maximum sum composed by numbers
that none of them are adjacent.
• Levenshtein edit distance is defined as the cost of converting one string s to another string t. It is widely used in spell-checking, OCR correction, etc. Three operations are allowed in Levenshtein edit distance: insert a character, delete a character, and substitute a character. Each operation mutates one character at a time. The following example shows how to convert the string “kitten” to “sitting”; the Levenshtein edit distance is 3 in this case.
1. kitten → sitten (substitution of 's' for 'k');
2. sitten → sittin (substitution of 'i' for 'e');
3. sittin → sitting (insertion of 'g' at the end).
Develop a program to calculate the Levenshtein edit distance of two strings with dynamic programming.

14.4 Short summary


This chapter introduces elementary methods of searching. Some of them instruct the computer to scan for interesting information among the data; they often keep some state that is updated during the scan, which can be considered a special case of the information reusing approach. Another commonly used strategy is divide and conquer, where the scale of the search domain keeps decreasing till the result becomes obvious. This chapter also explains methods to search for solutions within a problem domain. The solutions typically are not the elements being searched; they can be a series of decisions or some arrangement of operations. If there are multiple solutions, people often want the optimal one. For some special cases, there exist simplified approaches such as the greedy methods; and dynamic programming can be used for a wider range of problems when they show optimal substructure.
Bibliography

[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. ISBN:0262032937. The MIT Press. 2001
[3] M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan, ”Time bounds for selec-
tion,” J. Comput. System Sci. 7 (1973) 448-461.
[4] Jon Bentley. “Programming pearls, Second Edition”. Addison-Wesley Professional;
1999. ISBN-13: 978-0201657883
[5] Richard Bird. “Pearls of functional algorithm design”. Chapter 3. Cambridge Uni-
versity Press. 2010. ISBN, 1139490605, 9781139490603
[6] Edsger W. Dijkstra. “The saddleback search”. EWD-934. 1985.
[Link]
[7] Robert Boyer, and Strother Moore. “MJRTY - A Fast Majority Vote Algorithm”.
Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning
Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991, pp. 105-117.
[8] Cormode, Graham; S. Muthukrishnan (2004). “An Improved Data Stream Summary: The Count-Min Sketch and its Applications”. J. Algorithms 55: 29–38.
[9] Knuth Donald, Morris James H., jr, Pratt Vaughan. “Fast pattern matching in strings”. SIAM Journal on Computing 6 (2): 323–350. 1977.
[10] Robert Boyer, Strother Moore. “A Fast String Searching Algorithm”. Comm. ACM (New York, NY, USA: Association for Computing Machinery) 20 (10): 762–772. 1977.
[11] R. N. Horspool. “Practical fast searching in strings”. Software - Practice & Experience 10 (6): 501–506. 1980.
[12] Wikipedia. “Boyer-Moore string search algorithm”.
[Link]
[13] Wikipedia. “Eight queens puzzle”. [Link]
[14] George Pólya. “How to solve it: A new aspect of mathematical method”. Princeton
University Press(April 25, 2004). ISBN-13: 978-0691119663
[15] Wikipedia. “David A. Huffman”. [Link]
[16] Fethi Rabhi, Guy Lapalme “Algorithms: a functional programming approach”. Sec-
ond edition. Addison-Wesley.

Appendix A

Imperative delete for red-black tree

We need to handle more cases for the imperative delete than for insert. To resume balance after cutting a node off from the red-black tree, we perform rotations and re-coloring. When deleting a black node, rule 5 will be violated, because the number of black nodes along the path through that node reduces by one. We introduce 'doubly-black' to keep the number of black nodes unchanged. The below example program adds 'doubly black' to the color definition:
data Color {RED, BLACK, DOUBLY_BLACK}

When deleting a node, we re-use the binary search tree delete in the first step, then further fix the balance if the node is black.
1: function Delete(T, x)
2: p ← Parent(x)
3: q ← NIL
4: if Left(x) = NIL then
5: q ← Right(x)
6: Replace(x, Right(x)) ▷ replace x with its right sub-tree
7: else if Right(x) = NIL then
8: q ← Left(x)
9: Replace(x, Left(x)) ▷ replace x with its left sub-tree
10: else
11: y ← Min(Right(x))
12: p ← Parent(y)
13: q ← Right(y)
14: Key(x) ← Key(y)
15: copy data from y to x
16: Replace(y, Right(y)) ▷ replace y with its right sub-tree
17: x←y
18: if Color(x) = BLACK then
19: T ← Delete-Fix(T , Make-Black(p, q), q = NIL?)
20: release x
21: return T
Delete takes the root T and the node x to be deleted as parameters. x can be located through a lookup. If x has an empty sub-tree, we cut x off, then replace it with the other sub-tree q. Otherwise, we locate the minimum node y in the right sub-tree of x, then replace x with y and cut y off. If x is black, we call Make-Black(p, q) to maintain the blackness before further fixing.
1: function Make-Black(p, q)
2: if p = NIL and q = NIL then
3: return NIL ▷ The tree was singleton
4: else if q = NIL then
5: n ← Doubly Black NIL
6: Parent(n) ← p
7: return n
8: else
9: return Blacken(q)
If both p and q are empty, we are deleting the only leaf from a singleton tree. The
result is empty. If the parent p is not empty, but q is, we are deleting a black leaf. We use
NIL to replace that black leaf. As NIL is already black, we change it to ’doubly black’ NIL
to maintain the blackness. Otherwise, if neither p nor q is empty, we call Blacken(q).
If q is red, it changes to black; if q is already black, it changes to doubly black. As the
next step, we need eliminate the doubly blackness through tree rotations and re-coloring.
There are three different cases ([4], pp292). The doubly black node can be NIL or not in
all the cases.
Case 1. The sibling of the doubly black node is black, and it has a red sub-tree. We can rotate the tree to fix the doubly black. There are 4 sub-cases, all of which can be transformed to a uniform structure as shown in figure A.1.

Figure A.1: The doubly black node has a black sibling, and a red nephew. It can be fixed
with a rotation.

1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then ▷ x is doubly black NIL
4: n←x
5: if x = NIL then ▷ Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do ▷ x is doubly black, but not the root
8: if Sibling(x) ≠ NIL then ▷ The sibling is not empty
9: s ← Sibling(x)

10: ...
11: if s is black and Left(s) is red then
12: if x = Left(Parent(x)) then ▷ x is the left
13: set x, Parent(x), and Left(s) all black
14: T ← Rotate-Right(T , s)
15: T ← Rotate-Left(T , Parent(x))
16: else ▷ x is the right
17: set x, Parent(x), s, and Left(s) all black
18: T ← Rotate-Right(T , Parent(x))
19: else if s is black and Right(s) is red then
20: if x = Left(Parent(x)) then ▷ x is the left
21: set x, Parent(x), s, and Right(s) all black
22: T ← Rotate-Left(T , Parent(x))
23: else ▷ x is the right
24: set x, Parent(x), and Right(s) all black
25: T ← Rotate-Left(T , s)
26: T ← Rotate-Right(T , Parent(x))
27: ...
Case 2. The sibling of the doubly black is red. We can rotate the tree to change the
doubly black node to black. As shown in figure A.2, change a or c to black. We can add
this fixing to the previous implementation.

Figure A.2: The sibling of the doubly black is red

1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then ▷ x is doubly black NIL
4: n←x
5: if x = NIL then ▷ Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) ≠ NIL then
9: s ← Sibling(x)
10: if s is red then ▷ The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then ▷ x is the left

14: T ← Rotate-Left(T, Parent(x))


15: else ▷ x is the right
16: T ← Rotate-Right(T, Parent(x))
17: else if s is black and Left(s) is red then
18: ...
Case 3. The sibling of the doubly black node and its two sub-trees are all black. In this case, we re-color the sibling to red, change the doubly black node back to black, then move the doubly blackness up to the parent. As shown in figure A.3, there are two symmetric sub-cases.

Figure A.3: move the blackness up

The sibling of the doubly black node isn't empty in any of the above 3 cases. Otherwise, we change the doubly black node back to black and move the blackness up. When reaching the root, we force the root to be black to complete the fixing. It also terminates if the doubly blackness is eliminated after re-coloring midway. At last, if the doubly black node passed in is empty, we turn it back to a normal NIL.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then ▷ x is a doubly black NIL
4: n←x
5: if x = NIL then ▷ Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) ≠ NIL then ▷ The sibling is not empty
9: s ← Sibling(x)
10: if s is red then ▷ The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then ▷ x is the left
14: T ← Rotate-Left(T, Parent(x))
15: else ▷ x is the right
16: T ← Rotate-Right(T, Parent(x))
17: else if s is black and Left(s) is red then
18: if x = Left(Parent(x)) then ▷ x is the left
19: set x, Parent(x), and Left(s) all black
20: T ← Rotate-Right(T , s)

21: T ← Rotate-Left(T , Parent(x))


22: else ▷ x is the right
23: set x, Parent(x), s, and Left(s) all black
24: T ← Rotate-Right(T , Parent(x))
25: else if s is black and Right(s) is red then
26: if x = Left(Parent(x)) then ▷ x is the left
27: set x, Parent(x), s, and Right(s) all black
28: T ← Rotate-Left(T , Parent(x))
29: else ▷ x is the right
30: set x, Parent(x), and Right(s) all black
31: T ← Rotate-Left(T , s)
32: T ← Rotate-Right(T , Parent(x))
33: else if s, Left(s), and Right(s) are all black then
34: set x black
35: set s red
36: Blacken(Parent(x))
37: x ← Parent(x)
38: else ▷ move the blackness up
39: set x black
40: Blacken(Parent(x))
41: x ← Parent(x)
42: set T black
43: if n ≠ NIL then
44: replace n with NIL
45: return T
When fixing, we pass in the root T , the node x (can be doubly black), and a flag f .
The flag is true if x is doubly black NIL. We record it with n, and replace n with the
normal NIL after fixing.
Below is an example program that implements delete:
Node del(Node t, Node x) {
    if x == null then return t
    var parent = x.parent;
    Node db = null;    //doubly black

    if x.left == null {
        db = x.right
        x.replaceWith(db)
    } else if x.right == null {
        db = x.left
        x.replaceWith(db)
    } else {
        var y = min(x.right)
        parent = y.parent
        db = y.right
        x.key = y.key
        y.replaceWith(db)
        x = y
    }
    if x.color == Color.BLACK {
        t = deleteFix(t, makeBlack(parent, db), db == null);
    }
    remove(x)
    return t
}

Where makeBlack checks if the node changes to doubly black, and handles the special
case of doubly black NIL.
Node makeBlack(Node parent, Node x) {
if parent == null and x == null then return null
return if x == null
then replace(parent, x, Node(0, Color.DOUBLY_BLACK))
else blacken(x)
}

The function replace(parent, x, y) replaces the child of the parent, which is


x, with y.
Node replace(Node parent, Node x, Node y) {
    if parent == null {
        if y ̸= null then y.parent = null
    } else if parent.left == x {
        parent.setLeft(y)
    } else {
        parent.setRight(y)
    }
    if x ̸= null then x.parent = null
    return y
}

The function blacken(node) changes the red node to black, and the black node to
doubly black:
Node blacken(Node x) {
    x.color = if isRed(x) then Color.BLACK else Color.DOUBLY_BLACK
return x
}

Below example program implements the fixing:


Node deleteFix(Node t, Node db, Bool isDBEmpty) {
    var dbEmpty = if isDBEmpty then db else null
    if db == null then return null    // delete the root
    while (db ̸= t and db.color == Color.DOUBLY_BLACK) {
        var s = db.sibling()
        var p = db.parent
        if (s ̸= null) {
            if isRed(s) {
                // the sibling is red
                p.color = Color.RED
                s.color = Color.BLACK
                t = if db == p.left then leftRotate(t, p)
                    else rightRotate(t, p)
            } else if isBlack(s) and isRed(s.left) {
                // the sibling is black, and one sub-tree is red
                if db == p.left {
                    db.color = Color.BLACK
                    p.color = Color.BLACK
                    s.left.color = Color.BLACK
                    t = rightRotate(t, s)
                    t = leftRotate(t, p)
                } else {
                    db.color = Color.BLACK
                    p.color = Color.BLACK
                    s.color = Color.BLACK
                    s.left.color = Color.BLACK
                    t = rightRotate(t, p)
                }
            } else if isBlack(s) and isRed(s.right) {
                if (db == p.left) {
                    db.color = Color.BLACK
                    p.color = Color.BLACK
                    s.color = Color.BLACK
                    s.right.color = Color.BLACK
                    t = leftRotate(t, p)
                } else {
                    db.color = Color.BLACK
                    p.color = Color.BLACK
                    s.right.color = Color.BLACK
                    t = leftRotate(t, s)
                    t = rightRotate(t, p)
                }
            } else if isBlack(s) and isBlack(s.left) and
                      isBlack(s.right) {
                // the sibling and both sub-trees are black.
                // move blackness up
                db.color = Color.BLACK
                s.color = Color.RED
                blacken(p)
                db = p
            }
        } else { // no sibling, move blackness up
            db.color = Color.BLACK
            blacken(p)
            db = p
        }
    }
    t.color = Color.BLACK
    if (dbEmpty ̸= null) { // change the doubly black nil to nil
        dbEmpty.replaceWith(null)
        delete dbEmpty
    }
    return t
}

Where isBlack(x) tests whether a node is black; the NIL node is also black.
Bool isBlack(Node x) = (x == null or x.color == Color.BLACK)

Bool isRed(Node x) = (x ̸= null and x.color == Color.RED)

Before returning the final result, we check the doubly black NIL, and call the replaceWith
function defined in Node.
data Node<T> {
//...
void replaceWith(Node y) = replace(parent, this, y)
}

The program terminates when it reaches the root or the doubly blackness is eliminated. As we keep the red-black tree balanced, the delete algorithm is bound to O(lg n) time for a tree of n nodes.

Exercise A.1
1. Write a program to test if a tree satisfies the 5 red-black tree rules. Use this
program to verify the red-black tree delete implementation.
Appendix B

AVL tree - proofs and the delete algorithm

B.1 Height increment


When inserting an element, the height increment can be deduced into 4 cases:

∆H = |T′| − |T|
   = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
   = max(|r′|, |l′|) − max(|r|, |l|)
   = { δ ≥ 0, δ′ ≥ 0 : ∆r
     { δ ≤ 0, δ′ ≥ 0 : δ + ∆r
     { δ ≥ 0, δ′ ≤ 0 : ∆l − δ
     { otherwise     : ∆l                                                          (B.1)

Proof. When inserting, the height cannot increase on both the left and the right side. We can explain the 4 cases from the balance factor definition, which is the height difference between the right and left sub-trees:

1. If δ ≥ 0 and δ ′ ≥ 0, it means the height of the right sub-tree is not less than the
left sub-tree before and after insertion. In this case, the height increment is only
‘contributed’ from the right, which is ∆r.
2. If δ ≤ 0, it means the height of the left sub-tree is not less than that of the right before the insert. Since δ′ ≥ 0 after the insert, we know the height of the right sub-tree increases, while the left side stays the same (|l′| = |l|). The height increment is:

∆H = max(|r′|, |l′|) − max(|r|, |l|)      {δ ≤ 0 and δ′ ≥ 0}
   = |r′| − |l|                           {|l| = |l′|}
   = |r| + ∆r − |l|
   = δ + ∆r

3. If δ ≥ 0 and δ ′ ≤ 0, similar to the above case, we have the following:

∆H = max(|r′|, |l′|) − max(|r|, |l|)      {δ ≥ 0 and δ′ ≤ 0}
   = |l′| − |r|
   = |l| + ∆l − |r|
   = ∆l − δ


4. Otherwise, δ and δ ′ are not bigger than zero. It means the height of the left sub-tree
is not less than the right. The height increment is only ‘contributed’ from the left,
which is ∆l.

B.2 Balance adjustment after insert


The balance factors are ±2 in the 4 cases shown in figure B.1. After fixing, δ(y) resumes
to 0. The height of left and right sub-trees are equal.

Figure B.1: Fix 4 cases to the same structure

The four cases are left-left, right-right, right-left, and left-right. Let the balance factors before fixing be δ(x), δ(y), and δ(z); after fixing, they change to δ′(x), δ′(y), and δ′(z) respectively. We next prove that δ′(y) = 0 after fixing for all 4 cases, and give the results of δ′(x) and δ′(z).

Proof. We break into 4 cases:


Left-left
The sub-tree x keeps unchanged, hence δ ′ (x) = δ(x). As δ(y) = −1 and δ(z) = −2,
we have:

δ(y) = |c| − |x| = −1 ⇒ |c| = |x| − 1
δ(z) = |d| − |y| = −2 ⇒ |d| = |y| − 2                                              (B.2)

After fixing:

δ′(z) = |d| − |c|                         {from (B.2)}
     = |y| − 2 − (|x| − 1)
     = |y| − |x| − 1                      {x is a sub-tree of y ⇒ |y| − |x| = 1}
     = 0                                                                           (B.3)

For δ ′ (y), we have the following:

δ′(y) = |z| − |x|
     = 1 + max(|c|, |d|) − |x|            {by (B.3), |c| = |d|}
     = 1 + |c| − |x|                      {by (B.2)}
     = 1 + |x| − 1 − |x|
     = 0                                                                           (B.4)

Summarize the above, the balance factors change to the following in left-left case:

δ ′ (x) = δ(x)
δ ′ (y) = 0 (B.5)
δ ′ (z) = 0

Right-right
The right-right case is symmetric to left-left:

δ ′ (x) = 0
δ ′ (y) = 0 (B.6)
δ ′ (z) = δ(z)

Right-left
Consider δ ′ (x), after fixing, it is:

δ ′ (x) = |b| − |a| (B.7)

Before fixing, the height of z can be obtained as:

|z| = 1 + max(|y|, |d|)                   {δ(z) = −1 ⇒ |y| > |d|}
    = 1 + |y|
    = 2 + max(|b|, |c|)                                                            (B.8)

Since δ(x) = 2, we have:

δ(x) = 2 ⇒ |z| − |a| = 2                  {by (B.8)}
       ⇒ 2 + max(|b|, |c|) − |a| = 2
       ⇒ max(|b|, |c|) − |a| = 0                                                   (B.9)

If δ(y) = |c| − |b| = 1, then:

max(|b|, |c|) = |c| = |b| + 1 (B.10)

Substituting this into (B.9) gives:

|b| + 1 − |a| = 0 ⇒ |b| − |a| = −1        {by (B.7)}
              ⇒ δ′(x) = −1                                                         (B.11)

If δ(y) ≠ 1, then max(|b|, |c|) = |b|. Substituting this into (B.9) gives:

|b| − |a| = 0 ⇒ δ′(x) = 0                 {by (B.7)}                               (B.12)

Summarizing the 2 cases, we obtain the result of δ′(x) in terms of δ(y) as the following:

δ′(x) = { δ(y) = 1  : −1
        { otherwise : 0                                                            (B.13)

For δ′(z), from the definition it is:

δ′(z) = |d| − |c|                         {δ(z) = |d| − |y| = −1}
     = |y| − 1 − |c|                      {|y| = 1 + max(|b|, |c|)}
     = max(|b|, |c|) − |c|                                                         (B.14)
If δ(y) = |c| − |b| = −1, then max(|b|, |c|) = |b| = |c| + 1. Substituting this into (B.14), we have δ′(z) = 1. If δ(y) ≠ −1, then max(|b|, |c|) = |c|, and we have δ′(z) = 0. Combining these two cases, we obtain the result of δ′(z) in terms of δ(y) as below:

δ′(z) = { δ(y) = −1 : 1
        { otherwise : 0                                                            (B.15)

Finally, for δ′(y), we deduce it as below:

δ′(y) = |z| − |x|
     = max(|c|, |d|) − max(|a|, |b|)                                               (B.16)
There are three cases:
1. If δ(y) = 0, then |b| = |c|. According to (B.13) and (B.15), we have δ ′ (x) = 0 ⇒
|a| = |b|, and δ ′ (z) = 0 ⇒ |c| = |d|. These lead to δ ′ (y) = 0.
2. If δ(y) = 1, from (B.15), we have δ ′ (z) = 0 ⇒ |c| = |d|.
δ ′ (y) = max(|c|, |d|) − max(|a|, |b|) {|c| = |d|}
= |c| − max(|a|, |b|) {from (B.13): δ ′ (x) = −1 ⇒ |b| − |a| = −1}
= |c| − (|b| + 1) {δ(y) = 1 ⇒ |c| − |b| = 1}
= 0

3. If δ(y) = −1, from (B.13), we have δ ′ (x) = 0 ⇒ |a| = |b|.


δ ′ (y) = max(|c|, |d|) − max(|a|, |b|) {|a| = |b|}
= max(|c|, |d|) − |b| {from (B.15): |d| − |c| = 1}
= |c| + 1 − |b| {δ(y) = −1 ⇒ |c| − |b| = −1}
= 0

All three cases lead to the same result δ′(y) = 0. Summarizing all the above, we get the updated balance factors after fixing as below:

δ′(x) = { δ(y) = 1  : −1
        { otherwise : 0
δ′(y) = 0
δ′(z) = { δ(y) = −1 : 1
        { otherwise : 0                                                            (B.17)

Left-right
The left-right case is symmetric to the right-left case. With a similar method, we can obtain the new balance factors, which are identical to (B.17).

B.3 Delete algorithm


Deletion may reduce the height of the sub-tree. If the balance factor exceeds the range
of [−1, 1], then we need fixing.

B.3.1 Functional delete


When delete, we re-use the binary search tree delete in the first step, then check the
balance factors and perform fixing. The result is a pair (T ′ , ∆H), where T ′ is the new
tree and ∆H is the height decrement. We define delete as below:

delete = fst ◦ del (B.18)

where del(T, k) does the actual work to delete element k from T :

del ∅ k = (∅, 0)
del (l, k′, r, δ) k =
  { k < k′ : tree (del l k) k′ (r, 0) δ
  { k > k′ : tree (l, 0) k′ (del r k) δ
  { k = k′ : { l = ∅ : (r, −1)
             { r = ∅ : (l, −1)
             { else  : tree (l, 0) k″ (del r k″) δ, where k″ = min(r)
                                                                                   (B.19)

If the tree is empty, the result is (∅, 0); otherwise, let the tree be T = (l, k′, r, δ). We compare k and k′, then look up and delete recursively. When k = k′, we have located the node to be deleted. If either of its sub-trees is empty, we cut the node off and replace it with the other sub-tree; otherwise, we use the minimum element k″ of the right sub-tree to replace k′, and cut k″ off. We re-use the tree function and the ∆H result. In addition to the insert cases, there are two cases that violate the AVL rule and need fixing. As shown in figure B.2, both can be fixed by a tree rotation. We define them with pattern matching:

δ(y) = −2 δ(x)′ = δ(x) + 1


y x
δ(x) = 0 δ(y)′ = −1
x c =⇒ a y

a b b c
(a) Fix case A
δ(x) = 2 δ(y)′ = δ(y) − 1
x y

δ(y) = 0 δ(x) = 1
a y =⇒ x c

b c a b
(b) Fix case B

Figure B.2: delete fix

...
balance ((a, x, b, δ(x)), y, c, −2) ∆H = ((a, x, (b, y, c, −1), δ(x) + 1), ∆H)
balance (a, x, (b, y, c, δ(y)), 2) ∆H  = (((a, x, b, 1), y, c, δ(y) − 1), ∆H)
...                                                                                (B.20)

Below is the example program:



delete t x = fst $ del t x where


del Empty _ = (Empty, 0)
del (Br l k r d) x
| x < k = node (del l x) k (r, 0) d
| x > k = node (l, 0) k (del r x) d
| isEmpty l = (r, -1)
| isEmpty r = (l, -1)
| otherwise = node (l, 0) k' (del r k') d where k' = min r

Where min and isEmpty are defined as below:


min (Br Empty x _ _) = x
min (Br l _ _ _) = min l

isEmpty Empty = True


isEmpty _ = False

With these two additional cases, there are in total 7 cases in the balance implementation:
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), dH) =
(Br (Br a x b dx) y (Br c z d 0) 0, dH-1)
balance (Br a x (Br b y (Br c z d dz) 1) 2, dH) =
(Br (Br a x b 0) y (Br c z d dz) 0, dH-1)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), dH) =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, dH) =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
−− Delete specific
balance (Br (Br a x b dx) y c (-2), dH) =
(Br a x (Br b y c (-1)) (dx+1), dH)
balance (Br a x (Br b y c dy) 2, dH) =
(Br (Br a x b 1) y c (dy-1), dH)
balance (t, d) = (t, d)

B.3.2 Imperative delete


The imperative delete uses tree rotations for fixing. In the first step, we re-use the binary
search tree algorithm to delete node x from tree T; in the second step, we check the
balance factors and perform rotations.
1: function Delete(T, x)
2:     if x = NIL then
3:         return T
4:     p ← Parent(x)
5:     if Left(x) = NIL then
6:         y ← Right(x)
7:         replace x with y
8:     else if Right(x) = NIL then
9:         y ← Left(x)
10:        replace x with y
11:    else
12:        z ← Min(Right(x))
13:        copy data from z to x
14:        p ← Parent(z)
15:        y ← Right(z)
16:        replace z with y
17:    return AVL-Delete-Fix(T, p, y)
When deleting node x, we record its parent in p. If either sub-tree is empty, we cut
off x and replace it with the other sub-tree. Otherwise, if neither sub-tree is empty, we
locate the minimum element z of the right sub-tree, copy data from z to x, then cut z off.
Finally, we call AVL-Delete-Fix with the root T, the parent p, and the replacement
node y. Let the balance factor of p be δ(p); it changes to δ(p)′ after the delete. There
are three cases:

1. |δ(p)| = 0, |δ(p)′| = 1. Although a sub-tree height decreases, the parent still satisfies
the AVL rule and its own height does not change. The algorithm terminates as the tree is
still balanced;

2. |δ(p)| = 1, |δ(p)′| = 0. Before the delete, the height difference between the two
sub-trees was 1; after the delete, the higher sub-tree shrinks by 1, and both sub-trees
now have the same height. As a result, the height of the parent also decreases by 1. We
need to continue the bottom-up update along the parent references towards the root;

3. |δ(p)| = 1, |δ(p)′| = 2. After the delete, the tree violates the AVL height rule; we need
to rotate the tree to fix it.

For case 3, the implementation is similar to the insert fixing, but we need to add the two
sub-cases shown in figure B.2. They occur when the child on the higher side has a balance
factor of 0, which can happen after a delete but never right after an insert.
1: function AVL-Delete-Fix(T, p, x)
2:     while p ≠ NIL do
3:         l ← Left(p), r ← Right(p)
4:         δ ← δ(p), δ′ ← δ
5:         if x = l then
6:             δ′ ← δ′ + 1
7:         else
8:             δ′ ← δ′ − 1
9:         if p is leaf then ▷ l = r = NIL
10:            δ′ ← 0
11:        δ(p) ← δ′
12:        if |δ| = 1 ∧ |δ′| = 0 then
13:            x ← p
14:            p ← Parent(x)
15:        else if |δ| = 0 ∧ |δ′| = 1 then
16:            return T
17:        else if |δ| = 1 ∧ |δ′| = 2 then
18:            if δ′ = 2 then
19:                if δ(r) = 1 then ▷ Right-right
20:                    δ(p) ← 0
21:                    δ(r) ← 0
22:                    T ← Left-Rotate(T, p)
23:                    p ← r
24:                else if δ(r) = −1 then ▷ Right-left
25:                    y ← Left(r), δy ← δ(y)
26:                    if δy = 1 then
27:                        δ(p) ← −1
28:                    else
29:                        δ(p) ← 0
30:                    δ(y) ← 0
31:                    if δy = −1 then
32:                        δ(r) ← 1
33:                    else
34:                        δ(r) ← 0
35:                    T ← Right-Rotate(T, r)
36:                    T ← Left-Rotate(T, p)
37:                    p ← y
38:                else ▷ Delete specific right-right
39:                    δ(p) ← 1
40:                    δ(r) ← δ(r) − 1
41:                    T ← Left-Rotate(T, p)
42:                    break ▷ No further height change
43:            else if δ′ = −2 then
44:                if δ(l) = −1 then ▷ Left-left
45:                    δ(p) ← 0
46:                    δ(l) ← 0
47:                    T ← Right-Rotate(T, p)
48:                    p ← l
49:                else if δ(l) = 1 then ▷ Left-right
50:                    y ← Right(l), δy ← δ(y)
51:                    if δy = −1 then
52:                        δ(p) ← 1
53:                    else
54:                        δ(p) ← 0
55:                    δ(y) ← 0
56:                    if δy = 1 then
57:                        δ(l) ← −1
58:                    else
59:                        δ(l) ← 0
60:                    T ← Left-Rotate(T, l)
61:                    T ← Right-Rotate(T, p)
62:                    p ← y
63:                else ▷ Delete specific left-left
64:                    δ(p) ← −1
65:                    δ(l) ← δ(l) + 1
66:                    T ← Right-Rotate(T, p)
67:                    break ▷ No further height change
68:            ▷ Height decreases, go on bottom-up updating
69:            x ← p
70:            p ← Parent(x)
71:    if p = NIL then ▷ Delete the root
72:        return x
73:    return T

Exercise B.1
1. Compare the imperative tree fixing for insert and delete; there are many similarities.
Develop a common fix function for both insert and delete.

B.4 Example program


The main delete program:
Node del(Node t, Node x) {
    if x == null then return t
    Node y
    var parent = x.parent
    if x.left == null {
        y = replaceWith(x, x.right)
    } else if x.right == null {
        y = replaceWith(x, x.left)
    } else {
        y = min(x.right)
        x.key = y.key
        parent = y.parent
        x = y
        y = replaceWith(y, y.right)
    }
    t = deleteFix(t, parent, y)
    release(x)
    return t
}

Where replaceWith is defined in the chapter of red-black tree. release(x) releases the
memory of a node. Function deleteFix is implemented as below:
Node deleteFix(Node t, Node parent, Node x) {
    int d1, d2, dy
    Node p, l, r
    while parent ≠ null {
        d2 = d1 = parent.delta
        d2 = d2 + if x == parent.left then 1 else -1
        if isLeaf(parent) then d2 = 0
        parent.delta = d2
        p = parent
        l = parent.left
        r = parent.right
        if abs(d1) == 1 and abs(d2) == 0 {
            x = parent
            parent = x.parent
        } else if abs(d1) == 0 and abs(d2) == 1 {
            return t
        } else if abs(d1) == 1 and abs(d2) == 2 {
            if d2 == 2 {
                if r.delta == 1 { // right-right
                    parent.delta = 0
                    r.delta = 0
                    parent = r
                    t = leftRotate(t, p)
                } else if r.delta == -1 { // right-left
                    dy = r.left.delta
                    parent.delta = if dy == 1 then -1 else 0
                    r.left.delta = 0
                    r.delta = if dy == -1 then 1 else 0
                    parent = r.left
                    t = rightRotate(t, r)
                    t = leftRotate(t, p)
                } else { // delete specific right-right
                    parent.delta = 1
                    r.delta = r.delta - 1
                    t = leftRotate(t, p)
                    break // no further height change
                }
            } else if d2 == -2 {
                if l.delta == -1 { // left-left
                    parent.delta = 0
                    l.delta = 0
                    parent = l
                    t = rightRotate(t, p)
                } else if l.delta == 1 { // left-right
                    dy = l.right.delta
                    l.delta = if dy == 1 then -1 else 0
                    l.right.delta = 0
                    parent.delta = if dy == -1 then 1 else 0
                    parent = l.right
                    t = leftRotate(t, l)
                    t = rightRotate(t, p)
                } else { // delete specific left-left
                    parent.delta = -1
                    l.delta = l.delta + 1
                    t = rightRotate(t, p)
                    break // no further height change
                }
            }
            // height decreases, go on bottom-up update
            x = parent
            parent = x.parent
        }
    }
    if parent == null then return x // delete the root
    return t
}
GNU Free Documentation License

Version 1.3, 3 November 2008


Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.

<[Link]

Everyone is permitted to copy and distribute verbatim copies of this license document,
but changing it is not allowed.

Preamble
The purpose of this License is to make a manual, textbook, or other functional and
useful document “free” in the sense of freedom: to assure everyone the effective freedom
to copy and redistribute it, with or without modifying it, either commercially or noncom-
mercially. Secondarily, this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for modifications made by
others.
This License is a kind of “copyleft”, which means that derivative works of the document
must themselves be free in the same sense. It complements the GNU General Public
License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, be-
cause free software needs free documentation: a free program should come with manuals
providing the same freedoms that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License principally for
works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS


This License applies to any manual or other work, in any medium, that contains a
notice placed by the copyright holder saying it can be distributed under the terms of this
License. Such a notice grants a world-wide, royalty-free license, unlimited in duration,
to use that work under the conditions stated herein. The “Document”, below, refers
to any such manual or work. Any member of the public is a licensee, and is addressed
as “you”. You accept the license if you copy, modify or distribute the work in a way
requiring permission under copyright law.
A “Modified Version” of the Document means any work containing the Document
or a portion of it, either copied verbatim, or with modifications and/or translated into
another language.
A “Secondary Section” is a named appendix or a front-matter section of the Doc-
ument that deals exclusively with the relationship of the publishers or authors of the
Document to the Document’s overall subject (or to related matters) and contains nothing
that could fall directly within that overall subject. (Thus, if the Document is in part a

textbook of mathematics, a Secondary Section may not explain any mathematics.) The
relationship could be a matter of historical connection with the subject or with related
matters, or of legal, commercial, philosophical, ethical or political position regarding
them.
The “Invariant Sections” are certain Secondary Sections whose titles are designated,
as being those of Invariant Sections, in the notice that says that the Document is released
under this License. If a section does not fit the above definition of Secondary then it is
not allowed to be designated as Invariant. The Document may contain zero Invariant
Sections. If the Document does not identify any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover
Texts or Back-Cover Texts, in the notice that says that the Document is released under
this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, represented
in a format whose specification is available to the general public, that is suitable for re-
vising the document straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely available drawing editor,
and that is suitable for input to text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an otherwise Transparent
file format whose markup, or absence of markup, has been arranged to thwart or dis-
courage subsequent modification by readers is not Transparent. An image format is not
Transparent if used for any substantial amount of text. A copy that is not “Transparent”
is called “Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without
markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly
available DTD, and standard-conforming simple HTML, PostScript or PDF designed
for human modification. Examples of transparent image formats include PNG, XCF
and JPG. Opaque formats include proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or processing
tools are not generally available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following
pages as are needed to hold, legibly, the material this License requires to appear in the
title page. For works in formats which do not have any title page as such, “Title Page”
means the text near the most prominent appearance of the work’s title, preceding the
beginning of the body of the text.
The “publisher” means any person or entity that distributes copies of the Document
to the public.
A section “Entitled XYZ” means a named subunit of the Document whose title
either is precisely XYZ or contains XYZ in parentheses following text that translates
XYZ in another language. (Here XYZ stands for a specific section name mentioned below,
such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To
“Preserve the Title” of such a section when you modify the Document means that it
remains a section “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that
this License applies to the Document. These Warranty Disclaimers are considered to be
included by reference in this License, but only as regards disclaiming warranties: any
other implication that these Warranty Disclaimers may have is void and has no effect on
the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license notice
saying this License applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You may not use technical
measures to obstruct or control the reading or further copying of the copies you make
or distribute. However, you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may
publicly display copies.

3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers)
of the Document, numbering more than 100, and the Document’s license notice requires
Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the
back cover. Both covers must also clearly and legibly identify you as the publisher of
these copies. The front cover must present the full title with all words of the title equally
prominent and visible. You may add other material on the covers in addition. Copying
with changes limited to the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put
the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest
onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100,
you must either include a machine-readable Transparent copy along with each Opaque
copy, or state in or with each Opaque copy a computer-network location from which
the general network-using public has access to download using public-standard network
protocols a complete Transparent copy of the Document, free of added material. If you use
the latter option, you must take reasonably prudent steps, when you begin distribution of
Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible
at the stated location until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well
before redistributing any large number of copies, to give them a chance to provide you
with an updated version of the Document.

4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions
of sections 2 and 3 above, provided that you release the Modified Version under precisely
this License, with the Modified Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to whoever possesses a copy of it.
In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be
listed in the History section of the Document). You may use the same title as a
previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five
of the principal authors of the Document (all of its principal authors, if it has fewer
than five), unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modified Version, as the
publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent to the other
copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item
stating at least the title, year, new authors, and publisher of the Modified Version as
given on the Title Page. If there is no section Entitled “History” in the Document,
create one stating the title, year, authors, and publisher of the Document as given
on its Title Page, then add an item describing the Modified Version as stated in the
previous sentence.

J. Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in
the Document for previous versions it was based on. These may be placed in the
“History” section. You may omit a network location for a work that was published
at least four years before the Document itself, or if the original publisher of the
version it refers to gives permission.

K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title


of the section, and preserve in the section all the substance and tone of each of the
contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section
titles.

M. Delete any section Entitled “Endorsements”. Such a section may not be included in
the Modified Version.

N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in


title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify
as Secondary Sections and contain no material copied from the Document, you may at
your option designate some or all of these sections as invariant. To do this, add their
titles to the list of Invariant Sections in the Modified Version’s license notice. These titles
must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it contains nothing but
endorsements of your Modified Version by various parties—for example, statements of
peer review or that the text has been approved by an organization as the authoritative
definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified
Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added
by (or through arrangements made by) any one entity. If the Document already includes
a cover text for the same cover, previously added by you or by arrangement made by the
same entity you are acting on behalf of, you may not add another; but you may replace
the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission
to use their names for publicity for or to assert or imply endorsement of any Modified
Version.

5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that you in-
clude in the combination all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your combined work in its license
notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical
Invariant Sections may be replaced with a single copy. If there are multiple Invariant
Sections with the same name but different contents, make the title of each such section
unique by adding at the end of it, in parentheses, the name of the original author or
publisher of that section if known, or else a unique number. Make the same adjustment
to the section titles in the list of Invariant Sections in the license notice of the combined
work.
In the combination, you must combine any sections Entitled “History” in the various
original documents, forming one section Entitled “History”; likewise combine any sections
Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete
all sections Entitled “Endorsements”.

6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released
under this License, and replace the individual copies of this License in the various docu-
ments with a single copy that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individ-
ually under this License, provided you insert a copy of this License into the extracted
document, and follow this License in all other respects regarding verbatim copying of
that document.

7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent
documents or works, in or on a volume of a storage or distribution medium, is called an
“aggregate” if the copyright resulting from the compilation is not used to limit the legal
rights of the compilation’s users beyond what the individual works permit. When the
Document is included in an aggregate, this License does not apply to the other works in
the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Docu-
ment, then if the Document is less than one half of the entire aggregate, the Document’s
Cover Texts may be placed on covers that bracket the Document within the aggregate, or
the electronic equivalent of covers if the Document is in electronic form. Otherwise they
must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of
the Document under the terms of section 4. Replacing Invariant Sections with translations
requires special permission from their copyright holders, but you may include translations
of some or all Invariant Sections in addition to the original versions of these Invariant
Sections. You may include a translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include the original
English version of this License and the original versions of those notices and disclaimers.
In case of a disagreement between the translation and the original version of this License
or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “His-
tory”, the requirement (section 4) to Preserve its Title (section 1) will typically require
changing the actual title.

9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly
provided under this License. Any attempt otherwise to copy, modify, sublicense, or dis-
tribute it is void, and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license from a particular
copyright holder is reinstated (a) provisionally, unless and until the copyright holder
explicitly and finally terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to 60 days after the
cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if
the copyright holder notifies you of the violation by some reasonable means, this is the
first time you have received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after your receipt of the
notice.
Termination of your rights under this section does not terminate the licenses of parties
who have received copies or rights from you under this License. If your rights have been
terminated and not permanently reinstated, receipt of a copy of some or all of the same
material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE


The Free Software Foundation may publish new, revised versions of the GNU Free
Documentation License from time to time. Such new versions will be similar in spirit to
the present version, but may differ in detail to address new problems or concerns. See
[Link]
Each version of the License is given a distinguishing version number. If the Document
specifies that a particular numbered version of this License “or any later version” applies
to it, you have the option of following the terms and conditions either of that specified
version or of any later version that has been published (not as a draft) by the Free Software
Foundation. If the Document does not specify a version number of this License, you may
choose any version ever published (not as a draft) by the Free Software Foundation. If
the Document specifies that a proxy can decide which future versions of this License can
be used, that proxy’s public statement of acceptance of a version permanently authorizes


you to choose that version for the Document.

11. RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide
Web server that publishes copyrightable works and also provides prominent facilities for
anybody to edit those works. A public wiki that anybody can edit is an example of such a
server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means
any set of copyrightable works thus published on the MMC site.
“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license pub-
lished by Creative Commons Corporation, a not-for-profit corporation with a principal
place of business in San Francisco, California, as well as future copyleft versions of that
license published by that same organization.
“Incorporate” means to publish or republish a Document, in whole or in part, as part
of another Document.
An MMC is “eligible for relicensing” if it is licensed under this License, and if all
works that were first published under this License somewhere other than this MMC, and
subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or
invariant sections, and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site under
CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is
eligible for relicensing.

ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of the License in
the document and put the following copyright and license notices just after the title page:

Copyright © YEAR YOUR NAME. Permission is granted to copy, distribute


and/or modify this document under the terms of the GNU Free Documenta-
tion License, Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-
Cover Texts. A copy of the license is included in the section entitled “GNU
Free Documentation License”.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the
“with … Texts.” line with this:

with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover
Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the
three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend re-
leasing these examples in parallel under your choice of free software license, such as the
GNU General Public License, to permit their use in free software.
Index

8 queens puzzle, 416 Binomial Heap


Linking, 207
Auto completion, 117 Binomial heap, 203
AVL tree, 91 definition, 204
balance, 94 insertion, 209
definition, 91 pop, 214
imperative insert, 96 Binomial tree, 203
insert, 93 merge, 211
verification, 95 Boyer-Moore majority number, 386
Boyer-Moore algorithm, 402
B-tree, 127 Breadth-first search, 442
delete, 135
insert, 129 Change-making problem, 453
search, 148 Cock-tail sort, 188
split, 130 Curried Form, 31
BFS, 442 Currying, 31
Binary heap, 153
Depth-first search, 411
build heap, 157
DFS, 411
decrease key, 162
Dynamic programming, 455
heap push, 164
Heapify, 154 Fibonacci Heap, 217
insertion, 164 decrease key, 228
merge, 169 delete min, 220
pop, 161 insert, 218
top, 161 merge, 219
top-k, 162 pop, 220
Binary Random Access List Finger Tree
Definition, 268 Imperative splitting, 312
Insertion, 269 Finger tree
Random access, 272 Append to tail, 297
Remove from head, 271 Concatenate, 300
Binary search, 371 Definition, 287
binary search tree, 55 Ill-formed tree, 293
data layout, 56 Imperative random access, 310
delete, 63 Insert to head, 290
insertion, 56 Random access, 304, 310
looking up, 60 Remove from head, 292
min/max, 61 Remove from tail, 299
random build, 66 Size augmentation, 304
search, 60 splitting, 308
succ/pred, 61 fold, 44
traverse, 59
binary tree, 55 Grady algorithm, 444


Heap sort, 164 fold from left, 46


Huffman coding, 444 fold from right, 44
foldl, 46
Implicit binary heap, 153 foldr, 44
in-order traverse, 59 for each, 37
Insertion sort get at, 22
binary search, 71 group, 42
binary search tree, 72 head, 22
linked-list setting, 71 index, 22
insertion sort, 69 infix, 50
insertion, 70 init, 23
Integer Patricia, 104 insert, 27
Integer prefix tree, 104 insert at, 27
Integer tree last, 23
insert, 105 length, 22
lookup, 109 lookup, 49
Integer trie, 101 map, 35, 36
insert, 102 matching, 50
look up, 104 maximum, 33
Kloski puzzle, 436 minimum, 33
KMP, 391 mutate, 25
Knuth-Morris-Pratt algorithm, 391 prefix, 50
product, 30
LCS, 459 reverse, 39
left child, right sibling, 206 Reverse index, 24
Leftist heap, 167 rindex, 24
heap sort, 170 set at, 26
insertion, 169 span, 41
merge, 168 split at, 40
pop, 169 suffix, 50
rank, 167 sum, 30
S-value, 167 tail, 22
top, 169 take, 40
List take while, 41
append, 25 Transform, 34
break, 41 unzip, 51
concat, 29 zip, 51
concats, 48 Longest common subsequence problem, 459
cons, 22
Construction, 22 Maximum sum problem, 390
definition, 21 Maze problem, 411
delete, 28 Merge Sort, 341
delete at, 28 Basic version, 341
drop, 40 Bottom-up merge sort, 360
drop while, 41 In-place merge sort, 348
elem, 48 In-place working area, 349
empty, 22 Linked-list merge sort, 353
empty testing, 22 Merge, 342
existence testing, 48 Naive in-place merge, 348
Extract sub-list, 40 Nature merge sort, 355
filter, 49 Performance analysis, 344
find, 49 Work area allocation, 345

minimum free number, 11 Radix tree, 101


MTF, 315 range traverse, 63
Red-black tree
Paired-array list Imperative delete, 473
Definition, 280 red-black tree, 75, 78
Insertion and appending, 281 delete, 82
Random access, 282 imperative insertion, 86
Removing and balancing, 282 insert, 80
Pairing heap, 232 red-black properties, 78
decrease key, 235 reduce, 46
definition, 233
delete, 238 Saddelback search, 375
delete min, 235 Selection algorithm, 368
find min, 233 selection sort, 181
insert, 233 minimum finding, 183
pop, 235 parameterize the comparator, 186
top, 233 tail-recursive call minimum finding, 184
Parallel merge sort, 362 Sequence
Parallel quick sort, 362 Binary random access list, 268
Patricia, 113 Concatenate-able list, 284
Peg puzzle, 419 finger tree, 287
post-order traverse, 59 Imperative binary random access list,
pre-order traverse, 59 277
Prefix tree, 113 numeric representation for binary ran-
insert, 113 dom access list, 275
look up, 116 Paired-array list, 280
Skew heap, 170
Queue insertion, 171
Balance Queue, 253 merge, 171
Circular buffer, 247 pop, 171
Incremental concatenate, 256 top, 171
Incremental reverse, 255 Splay heap, 172
Lazy real-time queue, 261 insertion, 177
Paired-array queue, 251 merge, 177
Paired-list queue, 249 pop, 177
Real-time Queue, 254 splaying, 172
Singly linked-list, 243 top, 177
Quick Sort Subset sum problem, 464
2-way partition, 331
3-way partition, 333 T9, 119
Accmulated partition, 325 Tail call, 31
Accumulated quick sort, 325 Tail recursion, 31
Average case analysis, 327 Tail recursive call, 31
Basic version, 320 The wolf, goat, and cabbage puzzle, 424
Engineering improvement, 330 Tournament knock out
Handle duplicated elements, 330 explicit infinity, 196
Insertion sort fall-back, 340 tree reconstruction, 60
One pass functional partition, 324 tree rotation, 76
Performance analysis, 326 Trie, 110
Strict weak ordering, 321 insert, 110
Quick sort, 319 look up, 112
partition, 321 Tournament knock out, 192

Water jugs puzzle, 429


word counter, 55
