Algorithms and
Data Structures
Markus Bläser
Saarland University
Draft—February 3, 2011 and forever
Algorithm 1 Binary-search
Input: Sorted array a[1..n], a[1] < a[2] < · · · < a[n], element x
Output: m if there is an 1 ≤ m ≤ n with a[m] = x, −1 otherwise
1: ` := 0; r := n + 1;
2: while ` + 1 < r do /* 0 ≤ ` < r ≤ n + 1 and a[`] < x < a[r] */
3: m := b(` + r)/2c;
4: if a[m] = x then
5: return m;
6: if a[m] < x then
7: ` := m
8: else
9: r := m;
10: return −1;
If we start with this, then before the first iteration of the while loop, our
invariant is certainly true. Now we show that if the invariant is true before
the current iteration, then it will also be true after the iteration (that is,
before the next iteration). Note that ` ≤ m ≤ r and in lines 7 and 9, either
` or r is set to m. Thus 0 ≤ ` ≤ r ≤ n + 1 holds after the iteration. If ` = r,
then this would mean that ` + 1 = r before the iteration. But in this case,
the loop would not have been executed. Thus the first part of the invariant
is true. For the second part, note that it is certainly true in the beginning,
since by our convention, a[0] = −∞ and a[n + 1] = +∞.1 Now assume that
the invariant was true before the current iteration. If a[m] = x, then we stop
and there is no next iteration. If a[m] ≠ x, then in lines 6 to 9, ` and r are
chosen in such a way that the invariant holds at the end of the iteration.
If the while loop is left, then either a[m] = x or ` + 1 = r. In the first
case, the algorithm returns m. In the second case, the algorithm returns −1.
But we know that a[`] < x < a[r] from our invariant. But since there are no
further array elements between a[`] and a[r], x is not in the array. Thus the
algorithm is correct.
The algorithm also terminates: In each iteration either ` is increased or
r is reduced. Hence, the algorithm will terminate.
Theorem 1.1 Binary-search always terminates and returns the correct out-
put.
versa, the recursive one is often easier to analyse, but might be slower in
practice.
operations like +, −, ·, /, and rem, where the latter two are integer division
and remainder. := denotes the assignment operator. Pseudocode is very
intuitive. For instance, you should be able to transfer the pseudocode of
binary search into real code of the programming language of your choice
without any problems.
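For concreteness, here is one possible Python transcription of Algorithm 1. It is only a sketch; we use zero-based indexing, so the sentinels a[0] and a[n + 1] from the invariant become the positions −1 and n:

def binary_search(a, x):
    """Return an index m with a[m] == x, or -1 if x is not in the sorted list a."""
    left, right = -1, len(a)          # sentinels: a[left] = -infinity, a[right] = +infinity
    while left + 1 < right:
        m = (left + right) // 2
        if a[m] == x:
            return m
        if a[m] < x:
            left = m
        else:
            right = m
    return -1

For example, binary_search([2, 3, 5, 7, 11], 7) returns 3, and binary_search([2, 3, 5, 7, 11], 4) returns −1.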
When we want to analyze the running time of an algorithm written in
pseudocode, we have to charge costs for different operations. Since we want
to abstract from different programming languages and architectures, there
is no point in discussing whether an addition is more costly than a multipli-
cation, so all statements in pseudocode take one unit of time.2
There is one exception: We have to bound the values that we can store
in integer variables. Otherwise we could create huge numbers by repeated
squaring: We start with 2, multiply it with itself, we get 4, multiply it
with itself and get 16 and so on. With n multiplications, we can create the
number 2^{2^n}, which has an exponential number of bits. We could even have
efficient algorithms for factoring integers, a problem that is supposed to be
hard; modern cryptographic systems are based on this assumption. We can
even have efficient algorithms for so-called NP-hard problems, which you will
encounter soon in the lecture “Grundzüge der Theoretischen Informatik”, see
the first assignment. (And even this is not the end of the story. . . ) So
we will require that our integer variables can only store values that are
polynomially large in the size of the data that we want to process. In the
binary search algorithm, n is the size of the input (or n + 1 if you also want
to count x.) Our integers can hold values up to some polynomial in n (of
our choice), say, n^2. This is enough to address all the elements of the array,
so this is a reasonable model. If we want to have larger integers, we have
to simulate them by an array of small integers (like bignum packages do
this in reality). In our binary search example, the values of the variables `,
m, and r are always bounded by n + 1. The values of the a[i] and x are
part of the input. If they were large, then n would not be an appropriate
measure for the size of the input but rather the sum of the bit lengths of the
array elements and x. But the array elements and x play a special role in
binary search, since we only access them via comparisons. So their actual
size does not matter at all; we could even assume that they are real numbers
or something else, as long as there is a total order on them.
The underlying architecture is the so-called Word RAM, RAM stands
for random access machine, the prefix Word indicates that we can store
numbers that are polynomially large in the input size in the cells. This is
a mathematical abstraction of the von Neumann architecture, a pioneering
computer architecture that still influences modern architectures. But since
2
This oversimplification turned out to be very fruitful, since we now can concentrate
on finding better algorithms and not trying to avoid multiplications because they are
so costly. Nevertheless, if we want to implement an algorithm given in pseudocode
in a concrete programming language, we have to think about such details.
Proof. We only prove the first statement, the second is proven in the
same way. If f ∈ O(g), then there is a constant c and an n0 such that
f (n) ≤ cg(n) for all n ≥ n0 . But this means that g(n) ≥ (1/c)f (n) for all
n ≥ n0 . Hence g ∈ Ω(f ). This argument is reversible, which shows the “vice
versa”-part (replace c by 1/c).
Exercise 1.1 Prove that Θ(f ) = {g : N → R≥0 | there are c, c′ > 0 and n0 ∈ N
such that c′ f (n) ≤ g(n) ≤ cf (n) for all n ≥ n0 }.
Syntactic sugar 1.6 Although O(f ) and all the other classes are sets of
functions, we will often treat them like a function. “Algorithm A has running
time O(n^2)” means that the running time of A is a function in O(n^2). Many
books also write g = O(f ) instead of g ∈ O(f ) and O(g) = O(f ) instead of
O(g) ⊆ O(f ). I will try to avoid this.
Lemma 1.7 Let λ and ρ be the value of ` and r before an iteration of the
loop, and λ′ and ρ′ after the iteration.4 Then
ρ′ − λ′ − 1 ≤ b(ρ − λ − 1)/2c.
ρ′ − λ′ − 1 ≤ max{ρ − µ − 1, µ − λ − 1}
≤ max{ρ − ((ρ + λ)/2 − 1/2) − 1, (ρ + λ)/2 − λ − 1}
= max{(ρ − λ − 1)/2, (ρ − λ)/2 − 1}
= (ρ − λ − 1)/2.
Since ρ′ − λ′ − 1 is a natural number, we can even round down the right hand
side.
1. dn/2e = b(n + 1)/2c,
2. n/2 − 1/2 ≤ bn/2c ≤ n/2.
Remark 1.11 The same is true (with the same proof ), if the values of
f (0), f (1), . . . , f (i) are given. In this case gλ (m) ≤ m − 1 has to hold only
for m > i.
Remark 1.12 Again with the same proof, one can show that if fˆ satisfies
the inequalities fˆ(n) ≤ fˆ(g1 (n))+· · ·+ fˆ(g` (n))+h(n) if n > 0 and fˆ(0) ≤ c,
then fˆ(n) ≤ f (n) for all n where f is the unique solution of the equation.
Lemma 1.13 The function log is the unique solution of the equation
f (n) = f (bn/2c) + 1 if n > 0,
f (n) = 0 if n = 0.
Proof. By Lemma 1.10, the equation has a unique solution, since bn/2c < n
for n > 0. The function log is monotonically increasing. It is constant on
the sets {2^ν , . . . , 2^{ν+1} − 1} for all ν and log(2^{ν+1}) = log(2^{ν+1} − 1) + 1.
We claim that if n ∈ {2^ν , . . . , 2^{ν+1} − 1}, then bn/2c ∈ {2^{ν−1} , . . . , 2^ν − 1}.
Since n 7→ bn/2c is a monotonically increasing function, it is sufficient to check
this for the borders:
Therefore,
log(bn/2c) + 1 = log(n)
for all n > 0.
Corollary 1.14 Let c ∈ R≥0 . The function c · log n is the unique solution
of
f (n) = f (bn/2c) + c if n > 0,
f (n) = 0 if n = 0.
Proof. By dividing the equations by c, we see that (1/c) · f = log.
Now we look at the recursive version of binary search. Let t : N → N
denote an upper bound for the number of comparisons5 made by binary
search. t fulfills
t(0) = 0
t(n) ≤ t(bn/2c) + 1.
The second inequality follows from Lemma 1.7. Thus by Lemma 1.13, t(n) ≤
log(n) for all n.
Disclaimer
Algorithms and data structures are a wide field. There are way
more interesting and fascinating algorithms and data structures than
would ever fit into this course. There are even way more important
and basic algorithms and data structures than would fit into this
course. Therefore, I recommend that you study related material in
text books to get a broader understanding. The books by Cormen,
Leiserson, Rivest, and Stein and by Mehlhorn and Sanders are
excellent, but there are many more.
5
For the ease of presentation, we count the “a[m] = x” and “a[m] < x” comparisons
as one comparison. Think of it as having a three-valued comparison that returns the
result “smaller”, “equal”, or “larger”.
2.1 Selection-sort
To work properly, binary search requires that the array is sorted. How fast
can we sort an array? A simple algorithm works as follows: Look for the
minimum element of the array, put it at the front. Then sort the rest of
the array in the same way. Look at the minimum element of the remaining
array and put it at the front, and so on. In this way, after m iterations, the
first m elements are sorted and contain the m smallest elements of the array.
Algorithm 3 shows one implementation of this. It is written recursively, but
one can easily replace the recursion by a for loop (this does not look as nice,
but usually yields faster code).
The correctness of the algorithm is easily shown by induction.
Proof. By induction on n.
Induction base: If n ≤ 1, then there is nothing to sort and selection sort
does nothing.
Induction step: By the induction hypothesis, invoking Selection-sort(a[2..n])
sorts the subarray a[2..n]. Since a[1] is the smallest among the elements in
the array, the complete array a[1..n] is sorted, too.
Algorithm 3 Selection-sort
Input: array a[1..n]
Output: afterwards, a[1] ≤ a[2] ≤ · · · ≤ a[n] holds
1: if n > 1 then
2: i := Select-minimum(a[1..n])
3: h := a[1]; a[1] := a[i]; a[i] := h;
4: Selection-sort(a[2..n]);
the proof formally. Later, we will often just state the invariant and give a
hint how to prove it, assuming that you can do the proof on your own.
Algorithm 4 Select-minimum
Input: array a[1..n]
Output: index i such that a[i] = min1≤j≤n a[j]
1: i = 1
2: for j = 2, . . . , n do /* i ≤ j − 1 ∧ a[i] is the min of a[1..j − 1] */
3: if a[j] < a[i] then
4: i := j
5: return i;
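To illustrate, a direct Python transcription of Select-minimum and Selection-sort might look as follows; zero-based indexing is used and the recursion is replaced by the for loop mentioned above (the function names are our choice):

def select_minimum(a, start=0):
    """Return the index of a minimum element of a[start..]."""
    i = start
    for j in range(start + 1, len(a)):
        if a[j] < a[i]:
            i = j
    return i

def selection_sort(a):
    """Sort the list a in place by repeatedly selecting the minimum of the rest."""
    for k in range(len(a) - 1):
        i = select_minimum(a, k)
        a[k], a[i] = a[i], a[k]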
We can determine c(n) by the so-called iteration method, an “on foot” method,
which accidentally turns out to be elegant here:
c(n) = n − 1 + c(n − 1)
= n − 1 + n − 2 + c(n − 2)
= n − 1 + n − 2 + n − 3 + · · · + 1 + c(1)
= Σ_{i=1}^{n−1} i
= n(n − 1)/2 = O(n^2).
Thus the overall running time of selection sort is O(n^2).
Algorithm 5 Merge
Input: array a[1..n], integer t such that a[1..t] and a[t + 1..n] are sorted
Output: afterwards, a[1..n] is sorted
i := 1; j := t + 1
for k = 1, . . . , n do
/* b[1..k − 1] is sorted and all elements in b[1..k − 1] are
smaller than the elements in a[i..t] and a[j..n] */
if j = n + 1 or (i ≤ t and a[i] < a[j]) then
b[k] := a[i];
i := i + 1;
else
b[k] := a[j];
j := j + 1
a[1..n] := b[1..n];
Algorithm 6 Merge-sort
Input: array a[1..n]
Output: afterwards, a[1] ≤ a[2] ≤ · · · ≤ a[n] holds
if n > 1 then
m := bn/2c;
Merge-sort(a[1..m]);
Merge-sort(a[m + 1..n]);
Merge(a[1..n], m);
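A possible Python sketch of Merge and Merge-sort; it works on the half-open range a[lo..hi) and builds the auxiliary list b as in the pseudocode (the function names and the slice assignment at the end are our choice):

def merge(a, lo, mid, hi):
    """Merge the sorted subarrays a[lo:mid] and a[mid:hi] into a[lo:hi]."""
    b = []
    i, j = lo, mid
    for _ in range(hi - lo):
        if j == hi or (i < mid and a[i] < a[j]):
            b.append(a[i]); i += 1
        else:
            b.append(a[j]); j += 1
    a[lo:hi] = b

def merge_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place."""
    if hi is None:
        hi = len(a)
    if hi - lo > 1:
        mid = (lo + hi) // 2
        merge_sort(a, lo, mid)
        merge_sort(a, mid, hi)
        merge(a, lo, mid, hi)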
What can we say about the running time? Let us estimate the number
of comparisons c(n) that merge sort does on arrays of length n in the worst
case. We have
c(n) ≤ 0 if n ≤ 1,
c(n) ≤ c(bn/2c) + c(dn/2e) + n − 1 if n > 1,
since we split the array into two halves and need an additional n − 1 com-
parisons for merging. Since c is certainly monotonically increasing, we can
replace the second inequality by
c(n) ≤ 2c(dn/2e) + n − 1.
Let us divide the inequality by n − 1 and set ĉ(n) = c(n)/(n − 1) for n ≥ 2. We get
ĉ(n) = c(n)/(n − 1) ≤ 2 · c(dn/2e)/(n − 1) + 1
≤ 2 · c(dn/2e)/(2(dn/2e − 1)) + 1
= ĉ(dn/2e) + 1.
We “almost” know ĉ, since it “almost” fulfills the same recursive inequality
as log. In particular, ĉ(n) ≤ log(n − 1) for n ≥ 2, because the function
n 7→ log(n − 1) fulfills the same recurrence:
log(n − 1) = log(b(n − 1)/2c) + 1 = log(dn/2e − 1) + 1, since b(n − 1)/2c = dn/2e − 1.
Therefore, merge sort makes c(n) = (n − 1)ĉ(n) ≤ (n − 1) log(n − 1) comparisons
in the worst case and the overall running time is O(n log n).
3.1.1 Heaps
A (binary) heap A can be viewed as an ordered binary tree all levels of which
are completely filled except for the bottom one. Each node of the tree stores
one element of our array. We have three basic functions
So far, there is nothing special about heaps. The crucial thing is that
they satisfy the heap property:
Heap property
A[Parent(i)] ≥ A[i] for all i except the root
Figure 3.1: A heap, stored in the array A = [26, 23, 16, 19, 16, 11, 12, 15, 13, 10].
The large numbers in the circles are the elements stored in the heap. The small
numbers next to the circles are the corresponding positions in the array.
That is, the value of every node is at least as large as the values of its two children.
Using induction, we get that for every node i in the heap, all nodes that are
below i have a value that is at most A[i]. Warning! Do not confuse
our heaps with the heap in a JAVA environment.
Heaps can be efficiently implemented: Assume we want to store n ele-
ments in an array A. We just store them in an array A[1..n]. We have
a variable heap-size which stores the number of elements in the heap.1 The
nodes are just the indices {1, . . . , heap-size}. The array may contain more
elements (with larger index), since later on, we will build a heap step by step
starting with a heap of size 1.
The value of the root is stored in A[1]. It has two children, we store their
value in A[2..3]. These two children have altogether four children. We can
store their values in A[4..7]. In general, the ith “layer” of the tree is stored in
A[2^i ..2^{i+1} − 1]. (The root is the 0th layer.) The last layer might be shorter
and is stored in A[2^h ..heap-size]. Here h = log(heap-size) − 1 is the height
of the tree, the number of edges on a longest path that goes from the root
to a leaf. (Because of the structure of heaps, every path from the root to a
leaf has length either h − 1 or h.)
The methods Parent, Left, and Right can be easily implemented as fol-
lows:
It is easy to verify that the functions Left and Right map the set {2^i , . . . , 2^{i+1} −
1} (level i) into the set {2^{i+1} , . . . , 2^{i+2} − 1} (level i + 1). Left is nothing else
1
Yes, I know, never ever have a global variable floating around somewhere. But for
pseudocode, this is okay. We just want to understand how heaps work. I am pretty sure
that with your knowledge from the “Programmierung 1+2” lectures, you can do a very
nice state-of-the-art implementation.
Algorithm 7 Parent(i)
1: return bi/2c
Algorithm 8 Left(i)
1: return 2i
but a left shift and adding the least significant bit 0, Right does the same but
adds a 1. In particular these implementations are very efficient and short,
so in reality, you could declare them as inline. Since we use the Left, Right,
and Parent procedures only inside other procedures, we do not check whether
the result is still a valid node, i.e., whether it is in {1, . . . , heap-size}; this has
to be done by the calling procedures.
Note that we check in lines 4 and 6 whether the children are indeed present.
If h = i, then the heap property is also satisfied by the subtree with root i
and we do not have to do anything. Otherwise, we exchange A[i] with A[h].
Now A[i] contains the largest element of the subtree with root i. The subtree
whose root is not h is not affected at all and still satisfies the heap property.
The other subtree with root h almost fulfills the heap property: The two
subtrees Left(h) and Right(h) fulfill the heap property but A[h] might not
be the largest element. But this is precisely the input situation of Heapify,
so we can apply recursion.
What’s the running time of Heapify? Since every call results in at most
one recursive call, the number of recursive calls is bounded by the height of
the subtree with root i. In each call, only a constant number of operations
are executed. Thus, the total running time is O(log n).
Algorithm 9 Right(i)
1: return 2i + 1
Algorithm 10 Heapify
Input: heap A, index i such that the heap property is fulfilled for the sub-
heaps with roots Left(i) and Right(i)
Output: afterwards, the heap property is also fulfilled at i
1: ` := Left(i)
2: r := Right(i)
3: h := i
4: if ` ≤ heap-size and A[`] > A[i] then
5: h := `
6: if r ≤ heap-size and A[r] > A[h] then
7: h := r
8: if h 6= i then
9: swap(A[i], A[h])
10: Heapify(A, h)
Algorithm 11 Build-heap
Input: array A[1..n]
Output: afterwards, A satisfies the heap property
1: heap-size := n
2: for i = bn/2c, . . . , 1 do
3: Heapify(A, i)
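As an illustration, Heapify and Build-heap can be sketched in Python as follows; we pass heap_size explicitly instead of using a global variable, and with zero-based indexing the children of node i sit at positions 2i + 1 and 2i + 2:

def heapify(A, i, heap_size):
    """Sift A[i] down until the subtree rooted at i satisfies the heap property.
    Assumes the subtrees rooted at the children of i already satisfy it."""
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and A[left] > A[largest]:
        largest = left
    if right < heap_size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

def build_heap(A):
    """Turn the array A into a heap, working bottom-up over all inner nodes."""
    for i in range(len(A) // 2 - 1, -1, -1):
        heapify(A, i, len(A))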
Exercise 3.2 Show that in general, a heap with n nodes has at most dn/2^{h+1} e
subtrees of height h.
Compared to merge sort, instead of just cutting the array into two halves,
sort recursively, and then merge, quick sort divides the array into two parts
in a more clever way such that no merging is needed after the recursive calls.
The function Partition does the following. It chooses a pivot element p,
that is, any element of A[`..r]. We simply choose p := A[`].2 Then we
rearrange A in such a way that A[`..q − 1] only contains elements ≤ p, A[q]
contains p, and A[q + 1..r] contains only elements ≥ p. Finally, we return q.
We now simply need to sort A[`..q − 1] and A[q + 1..r] recursively and are
done.
The function Partition can easily be implemented in time O(n). With a little
care, we can even do it in place.
In the first line, the pivot element is chosen. We take the first element.
Then we move the pivot temporarily to the end. Then we run through the
array and compare whether A[i] ≤ p. If this is the case, then A[i] has to be
moved to the left. So we swap it with A[s]. The index s stores the current
border between the “lower” part and the “upper” part. Finally, we just put
the pivot again in the middle.
2
Choosing a pivot element in practice is somewhere between art and magic, since it
shall be chosen in such a way that the two parts are roughly balanced. Taking the first
element is often a bad choice in practice, in particular, if the array is partly sorted.
Algorithm 14 Partition
Input: array A[1..n], indices ` ≤ r
Output: afterwards, there is an index q such that all elements in A[`..q − 1]
are ≤ A[q] and the ones in A[q + 1..r] are ≥ A[q]. q is returned.
1: p := A[`]
2: swap(A[`], A[r])
3: s := `
4: for i := `, . . . , r − 1 do
5: if A[i] ≤ p then
6: swap(A[i], A[s])
7: s := s + 1
8: swap(A[s], A[r])
9: return s;
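A Python sketch of Partition together with the quick sort routine built on top of it; zero-based indices and the names are our choice, and the pivot is the first element of the range, as above:

def partition(A, lo, hi):
    """Partition A[lo..hi] around the pivot A[lo]; return the pivot's final index."""
    p = A[lo]
    A[lo], A[hi] = A[hi], A[lo]       # move the pivot temporarily to the end
    s = lo                            # border between the "lower" and the "upper" part
    for i in range(lo, hi):
        if A[i] <= p:
            A[i], A[s] = A[s], A[i]
            s += 1
    A[s], A[hi] = A[hi], A[s]         # put the pivot back into the middle
    return s

def quick_sort(A, lo=0, hi=None):
    """Sort A[lo..hi] in place."""
    if hi is None:
        hi = len(A) - 1
    if lo < hi:
        q = partition(A, lo, hi)
        quick_sort(A, lo, q - 1)
        quick_sort(A, q + 1, hi)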
Exercise 4.1 1. Show that the usual relation < on N is a total order.
in one or two sentences, the left child of a node v is denoted by v< and the
right child is denoted by v> . With every node v of the graph, we associate
a relation R(v), which is defined inductively. If v is the root of the tree,
then R(v) = ∅, that is, we do not know anything about the relation between
the elements in X. Assume that we have defined R(v) for some node v and
that v is labeled with the comparison x?y. Then R(v< ) is the transitive
closure of R(v) ∪ {(x, y)}. In the same way, R(v> ) is the transitive closure of
R(v)∪{(y, x)}. So R(v) stores all the information about the relation between
the variables in X that the comparison tree has gathered so far (including
what it can deduce by transitivity). We always assume that the comparison
tree does nothing stupid, that is, it will never compare an x with itself nor
does it compare an x with a y with (x, y) ∈ R(v) or (y, x) ∈ R(v).
Let π be a permutation in Sn . For a relation R on X, the relation Rπ is
defined by
(xi , xj ) ∈ Rπ : ⇐⇒ (xπ(i) , xπ(j) ) ∈ R
for all i, j, that is, the relation is the same up to changing the roles of the
x1 , . . . , x n .
What does it mean that a comparison tree computes a particular relation
R on X? We have our variables X = {x1 , . . . , xn }. There is a hidden total
order, that is, there is a permutation π ∈ Sn such that xπ(1) < xπ(2) < · · · <
xπ(n) but the comparison tree does not know this permutation. Now we
“run” the comparison tree in the obvious way. At every node v, two elements
xi , xj ∈ X are compared. If xi < xj according to the order given by π, we
go to v< , otherwise we go to v> . Eventually, we get to a leaf `. We require
that R ⊆ R(`)π , that is, the comparison tree finds the relation R between
the elements up to the renaming given by π.
In the case of sorting, the relation computed at every leaf is simply a total
order on X. That is, the relation computed by the tree is S = {(xi , xj ) | i <
j}.
In the case of minimum computation, the comparison tree shall be able
to tell at every leaf which one of the xj is the minimum. That is, the relation
computed by the tree is M = {(x1 , xi ) | 2 ≤ i ≤ n}. This means that x1 is
minimal and no other element is.
Every algorithm that we studied so far was comparison-based, that is,
we only accessed the elements of the array by comparisons. Therefore, ev-
ery algorithm yields a comparison tree by just running the algorithm and
whenever two elements are compared, we add a node to the comparison tree
and branch on the two possible outcomes. (If the algorithm does something
stupid and compares two elements where the outcome is already known, then
we do not add a node and there is no need to branch.) The number of com-
parisons that our algorithm does in the worst case is nothing else than the
height of the constructed comparison tree.
Figure 4.1: A comparison tree which finds the minimum of three elements.
The root compares x1 ?x2 ; its children compare x1 ?x3 and x2 ?x3 . The left
child of a node v is the child v< , the right child is the node v> . (Node
names are not drawn.) Each node that is not a leaf is labeled with a comparison.
Below each leaf, there is the relation that is computed at the leaf: the four
leaves compute {(x1 , x2 ), (x1 , x3 )}, {(x1 , x2 ), (x3 , x1 ), (x3 , x2 )}, {(x2 , x1 ), (x2 , x3 )},
and {(x2 , x1 ), (x3 , x2 ), (x3 , x1 )}. At two leaves, the tree learns the complete order
between the three elements, at the other two, it only knows that one is smaller
than the other two, but it does not know the relation between the other two. But
this is fine, since the tree shall only find the minimum.
Comparison trees themselves are not a good machine model for sorting, since
they are non-uniform: For most comparison trees, there is no compact way
to represent them! But we will use comparison trees only to show lower
bounds. And since every sorting algorithm in pseudocode gives a comparison
tree, this also yields lower bounds for the number of comparisons made by
algorithms in pseudocode.
4.1.1 Sorting
Theorem 4.1 Every comparison tree that sorts the elements in X = {x1 , . . . , xn }
has at least n! leaves.
log(n!) ≥ n log2 n − n log2 e + (1/2) log2 (2πn) + (log2 e)/(12n + 1) ≥ n log n − n(log2 e + 1).
Merge sort uses (n − 1) log(n − 1) ≤ n log n comparisons. So up to a linear
lower order term, merge sort uses the minimum number of comparisons.
Let us first make the proof a little more complicated. In this way, the
method becomes applicable to a wider range of problems. Consider a com-
parison tree T with node set V . A function w : V → R is a potential function
if w(v) ≤ w(v< ) + w(v> ) for all nodes v that are not a leaf.
Proof. We prove the following more general statement: For every node
v, the height of the subtree with root v is ≥ log2 (w(v)/B). The proof is by
induction on the height of the subtree.
Induction base: If v is a leaf, then w(v) ≤ B, hence w(v)/B ≤ 1 and
log2 (w(v)/B) ≤ 0.
Induction step: Let v be any node that is not a leaf. Since w(v) ≤ w(v< ) +
w(v> ), we have w(v)/2 ≤ max{w(v< ), w(v> )}. W.l.o.g. let w(v< ) ≥ w(v> ).
By the induction hypothesis, we know that the height of the subtree with
root v< is at least log2 (w(v< )/B). The height of the subtree with root v therefore
is at least
log2 (w(v< )/B) + 1 = log2 (2 · w(v< )/B) ≥ log2 (w(v)/B).
For a node v, let m(v) be the number of minima of R(v). Assume that at a
node v, R(v) has i minima. Then both R(v< ) and R(v> )
have at least i − 1 minima, since a comparison can kill at most one minimum.
Because 2^i = 2^{i−1} + 2^{i−1} , the function v 7→ 2^{m(v)} is a potential function. At the root r, we have
m(r) = n, since every element is a minimum. At a leaf v, m(v) = 1, since
our tree finds the minimum, so we can take B = 2. Therefore, the height is ≥ log2 (2^n /2) = n − 1.
Theorem 4.3 Every comparison tree that finds the minimum has height
≥ n − 1.
Algorithm 15 Select
Input: array a[1..n], integer i
Output: the ith largest element in a
1: If n ≤ 60, then find the ith largest element by sorting.
2: Divide the n elements into bn/5c groups of 5 elements. At most 4 elements
remain.
3: Find the median of each of the bn/5c groups by sorting.
4: Recursively call Select to find the median m of the bn/5c medians.
(Once we found m, we forget about the groups.)
5: Use the procedure Partition on a with m as the pivot element.
6: Let q be the position of m in the array.
7: If q = i, then return m.
8: If i < q, then call Select(a[1..q − 1], i). Else call Select(a[q + 1..n], i − q).
The algorithm Select first groups the elements into bn/5c groups of five
elements. From each of the groups it computes the median for instance just
by sorting. (With Mergesort this takes 12 comparisons; there is also an
(optimal) comparison tree for selecting the median with height 6). We use
Select recursively to find the median m of the bn/5c medians and use m as
a pivot. m is a good pivot, because it yields a rather balanced partition.
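The following Python sketch mirrors the structure of Select. For brevity it sorts the small groups with the built-in sort and partitions by building new lists instead of calling Partition in place, and it treats i as the rank, i.e. the position the element would have in the sorted array (as in the recursion of Algorithm 15); it illustrates the idea rather than the exact pseudocode:

def select(a, i):
    """Return the element of rank i in a (1-based), i.e. the element at
    position i if a were sorted in increasing order."""
    if len(a) <= 60:
        return sorted(a)[i - 1]
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a) - len(a) % 5, 5)]
    medians = [g[2] for g in groups]              # median of each group of five
    m = select(medians, (len(medians) + 1) // 2)  # median of the medians
    lower = [x for x in a if x < m]
    upper = [x for x in a if x > m]
    equal = len(a) - len(lower) - len(upper)      # our own handling of duplicates
    if i <= len(lower):
        return select(lower, i)
    if i <= len(lower) + equal:
        return m
    return select(upper, i - len(lower) - equal)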
Lemma 4.4 q − 1 ≥ (3/10)n − 3 and n − q ≥ (3/10)n − 3.
By selecting the median of medians, we found a good pivot for the Partition
procedure. Since the size of each partition is at least (3/10)n − 3, it can
be, on the other hand, at most (7/10)n + 2 (we can subtract one more for the
median of medians).
Let t(n) be the number of comparisons made by select on arrays of length
n. We have
t(n) = t(bn/5c) + t((7/10)n + 2) + 12 · bn/5c + n, (4.1)
if n > 60, since we have one recursive call with an array of length bn/5c for
finding the median of medians and another recursive call on the partition
that contains the true median. We need 12 · bn/5c comparisons to find the
medians of the groups.
then t(n) ≤ c · n where c = max{d/(1 − (`1 + · · · + `` + `/N )), e}.
We can use Lemma 4.5 to solve (4.1). We can bound (7/10)n + 2 from above
by (11/15)n for n > 60. Since 1/5 + 11/15 + 2/60 = 29/30 < 1, we get that t(n) ≤ c · n with
c = 132.
Remark 4.6 The actual value for c is rather coarse. We can get the term
`/N as small as we want by making N larger. In the same way, we can bound
(7/10)n + 2 by h · n for an h that is arbitrarily close to 7/10.
5.1 Stacks
Stacks provide two operations. We can insert an element and we can remove
it. These two operations are usually called Push and Pop. Stacks use the
last-in-first-out principle. They work like stacks in real life: When you want
to put something on the stack, you can only put it on the top. And when
you want to remove something from the stack, you can only take the element
on the top.
To implement a stack, all you need is an array S[1..N ] and a variable
top. This variable stores the index of the last element put on the stack. In
the beginning, we set top to 0.
The algorithm IsEmpty checks whether there are elements on the stack.
Algorithm 16 IsEmpty
Input: stack S
Output: 1 if S is empty, 0 otherwise
1: if top = 0 then
2: return 1
3: else
4: return 0
Push pushes an element x on the stack. It first increases top by one and
then stores x in the new position. It also checks whether the stack is full. (If
for some reason, we know that there will be no stack overflow, one can also
dismiss this check for efficiency reasons.)
Pop removes an element from the stack and returns it. It also checks
whether there is an element to pop.
All the operations supported by the stack take O(1) time. Often, a Peek
procedure is provided in addition. Peek returns the top element without
removing it. Peek can be simulated by popping the top element and pushing
it back.
Algorithm 17 Push
Input: stack S, element x
Output: adds x to S
1: if top ≥ N then
2: “error”
3: else
4: top = top + 1;
5: S[top] = x;
Algorithm 18 Pop
Input: stack S
Output: returns and removes the top element of S
1: if IsEmpty(S) then
2: “error”
3: else
4: top := top − 1
5: return S[top+1 ]
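In Python, the array-based stack can be sketched as a small class; raising an exception plays the role of “error” in the pseudocode, and the names are our choice:

class Stack:
    """A stack with capacity N, backed by a fixed-size list."""
    def __init__(self, N):
        self.S = [None] * N
        self.top = -1                      # -1 plays the role of top = 0 in the pseudocode

    def is_empty(self):
        return self.top == -1

    def push(self, x):
        if self.top + 1 >= len(self.S):
            raise OverflowError("stack overflow")
        self.top += 1
        self.S[self.top] = x

    def pop(self):
        if self.is_empty():
            raise IndexError("stack underflow")
        self.top -= 1
        return self.S[self.top + 1]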
5.2 Queues
Queues do the same as stacks: We can store elements and retrieve them.
However, the storing principle is first-in-first-out. Queues work like queues
in real life, say, in the mensa. If a student arrives, he or she goes to the end
of the queue. Whenever a meal is handed out, the first student in the queue
leaves happily (?) with his or her meal.
To implement a queue, we need an array Q[1..N ]. We have two variables,
head and tail . The variable head stores the index of the element that is the
first in the queue, tail the index of the position after the queue where the next
arriving element will be put. In the beginning, head and tail will be set to
1.
The queue will be arranged in a circular way in Q. If head ≤ tail , then
the queue consists of the elements Q[head ..tail − 1]. If tail < head , then the
queue are the elements Q[head ..N ] together with the elements Q[1..tail − 1].
In this way, we do not have to shift the queue by one whenever an element
is dequeued. (Unlike queues in real life, which usually move forward when
the first element is removed. One exception might be very bad meals.)
The queue is empty, when head = tail . The queue is full, when head =
tail + 1 (or head = 1 and tail = N ). In the latter case, there is still one free cell, but if we occupied
this one, then head = tail , which we could not distinguish from the empty
queue. (This is an information theoretic problem: there are N + 1 possible
numbers of elements that can be stored in the queue, namely 0, 1, . . . , N .
On the other hand, the difference between head and tail can only attain N
different values.) If we want to use this last cell, then we should introduce
a variable count which counts the length of the queue. However, the first
solution usually gives a slightly more efficient code.
Procedure IsEmpty checks whether the queue is empty.
Algorithm 19 IsEmpty
Input: queue Q
Output: 1 if Q is empty, 0 otherwise
1: if head = tail then
2: return 1
3: else
4: return 0
Algorithm 20 Enqueue
Input: queue Q, element x
Output: adds x to Q
1: if head = tail + 1 or (head = 1 and tail = N ) then
2: “error”
3: else
4: Q[tail ] = x
5: if tail = N then
6: tail = 1
7: else
8: tail = tail + 1
Procedure Dequeue returns the first element and works in a similar manner
to Enqueue. All three procedures have running time O(1).
Algorithm 21 Dequeue
Input: queue Q
Output: the first element of Q
1: if head = tail then
2: “error”
3: else
4: x := Q[head ]
5: if head = N then
6: head := 1
7: else
8: head := head + 1
9: return x
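A Python sketch of the circular queue; as in the pseudocode, one cell is sacrificed so that the empty and the full queue can be distinguished (class and method names are our choice):

class Queue:
    """A circular queue backed by a fixed-size list of length N."""
    def __init__(self, N):
        self.Q = [None] * N
        self.head = 0                      # index of the first element
        self.tail = 0                      # index of the next free position

    def is_empty(self):
        return self.head == self.tail

    def enqueue(self, x):
        N = len(self.Q)
        if (self.tail + 1) % N == self.head:
            raise OverflowError("queue overflow")
        self.Q[self.tail] = x
        self.tail = (self.tail + 1) % N

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue underflow")
        x = self.Q[self.head]
        self.head = (self.head + 1) % len(self.Q)
        return x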
Usually, there is more data to store than the key. So we also store a reference
to the element data. But this is not important for our considerations. For the
first element h in the list, the head, Prev(h) = NULL is a null reference. For
the last element in the list, the tail, Next(x) = NULL. There is a reference
head which points to the first element of the list.
The procedure List-search finds an element in the list with a given key,
if there is such an element, otherwise it returns the NULL reference. In the
worst case, it has to scan the whole list, therefore, the running time is O(n).
Note that the while loop either stops when we found the key k or we are
sure that there is no node with key k.
Algorithm 22 List-search
Input: a list L, a key k
Output: (a reference to) an element x with Key(x) = k, if there exists one,
NULL otherwise
1: x := head
2: while x ≠ NULL and Key(x) ≠ k do
3: x := Next(x)
4: return x.
The procedure List-insert adds an element to the front of the list. It takes
O(1) time. If we want to insert the element at a specific place, say after an
element y (maybe we want to keep the list sorted), the same algorithm works,
we just have to replace the reference head by Next(y).
Finally, the procedure List-delete deletes an element given by a reference
from a list. Its running time is O(1). If, however, the element is given by a
key, then we first have to use List-search to find the element.
Algorithm 23 List-insert
Input: a list L, an element x
Output: appends x to the front of the list
1: Next(x) := head
2: if head ≠ NULL then
3: Prev(head ) := x
4: head := x
5: Prev(x) := NULL
Algorithm 24 List-delete
Input: a list L, an element x
Output: removes x from the list
1: if Prev(x) ≠ NULL then
2: Next(Prev(x)) := Next(x)
3: else
4: head := Next(x)
5: if Next(x) ≠ NULL then
6: Prev(Next(x)) := Prev(x)
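A minimal Python sketch of such a doubly linked list; the node class and the method names are our choice:

class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def search(self, k):
        """Return the first node with key k, or None if there is none. O(n) time."""
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        """Insert the node x at the front of the list. O(1) time."""
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        x.prev = None

    def delete(self, x):
        """Remove the node x (given by reference) from the list. O(1) time."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev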
In many applications, like databases, the data is dynamic: Elements are
inserted or deleted and we try to maintain the data in such a way that other
operations can be supported efficiently. Linked lists for instance support
such operations but the search is not efficient in lists.
Binary search trees are a simple data structure that support many dy-
namic set operations quite efficiently, in particular
Search: Finds an element with a particular key value (or detects that
there is no such element).
Binary search trees are rooted and ordered binary trees that fulfill the
binary search tree property (see below). The running time of the methods
above is bounded by the height of the trees. The height of binary search
trees “behaves like” the running time of quick sort. In the good cases and
even on the average, the height is logarithmic. In the worst case, however,
it is linear. There are variations of binary search trees that are guaranteed
to have height that is bounded by O(log n), n being the number of nodes in
the tree (= number of elements stored). We will study one of them in the
next chapter.
We represent trees as a linked data structure. There are four basic meth-
ods:
Left(v) returns the left child of v (and NULL, if v has no left child).
Algorithm 25 Inorder-walk
Input: A node x of a binary search tree T
Output: prints the elements of T (x) in increasing key order
1: if x ≠ NULL then
2: Inorder-walk(Left(x))
3: output x;
4: Inorder-walk(Right(x))
Exercise 6.2 1. Can you reconstruct the binary search tree from the out-
put of Inorder-walk?
2. A preorder walk first outputs the root and then deals with the left and
right subtrees, i.e., lines 2 and 3 of the procedure Inorder-walk are
exchanged. Can you reconstruct the binary search tree from the output
of a preorder walk?
6.1 Searching
Algorithm 26 BST-search
Input: node x, key k
Output: a node y with Key(y) = k if such a y exists, NULL otherwise
1: if x = NULL or k = Key(x) then
2: return x
3: if k < Key(x) then
4: return BST-search(Left(x), k)
5: else
6: return BST-search(Right(x), k)
Lines 1 and 2 deal with the case when we either found our node or a node
with key k is not in the tree. In the latter case, x = NULL, i.e., we were in
a node v without any left or right child and called BST-search with Left(v)
or Right(v), respectively, which is NULL. The running time of BST-search
is bounded by O(h) where h is the height of T .
Algorithm 27 BST-minimum
Input: a node x in a binary search tree T
Output: the node in T (x) with minimum key
1: if Left(x) ≠ NULL then
2: return BST-minimum(Left(x))
3: return x
Algorithm 28 BST-successor
Input: node x in a binary search tree T
Output: the successor of x, if x has one, NULL otherwise
1: if Right(x) ≠ NULL then
2: return BST-minimum(Right(x))
3: y := Parent(x)
4: while y ≠ NULL and x ≠ Left(y) do
5: x := y
6: y := Parent(y)
7: return y
In the second case, y is the lowest ancestor of x whose left child is also an
ancestor. The running time is again linear in the height of T .
Exercise 6.4 Write the corresponding procedure for finding the predecessor.
Algorithm 29 BST-insert
Input: node x of a binary search tree T , new node z
Output: inserts z into T (x) maintaining the binary search tree property
1: if x = NULL then
2: z becomes the root of T
3: else
4: if Key(z) < Key(x) then
5: if Left(x) ≠ NULL then
6: BST-insert(Left(x), z)
7: else
8: Parent(z) := x
9: Left(x) := z
10: else
11: if Right(x) ≠ NULL then
12: BST-insert(Right(x), z)
13: else
14: Parent(z) := x
15: Right(x) := z
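For illustration, BST-search and BST-insert can be sketched in Python on a node-based representation as follows; the node class is our choice and stores only a key:

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_search(x, k):
    """Search for key k in the subtree rooted at x; return the node or None."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return bst_search(x.left, k)
    return bst_search(x.right, k)

def bst_insert(x, z):
    """Insert node z into the subtree rooted at x; return the root of the subtree
    (z itself if x is None, mirroring lines 1 and 2 of BST-insert)."""
    if x is None:
        return z
    if z.key < x.key:
        if x.left is not None:
            bst_insert(x.left, z)
        else:
            z.parent = x
            x.left = z
    else:
        if x.right is not None:
            bst_insert(x.right, z)
        else:
            z.parent = x
            x.right = z
    return x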
x has one child: In this case, we can just remove x and connect the child
of x to the parent of x. (If x does not have a parent, i.e., is the root,
then the child of x becomes the new root.)
x has two children: Then we first search for the successor of x. The successor
y of x has at most one child. So we first delete y as in one of the first
two cases and then replace x by y.
Exercise 6.5 Prove that the insertion and deletion procedures keep the bi-
nary search tree property.
Algorithm 30 BST-delete
Input: node x in a binary search tree T
Output: deletes x from T
1: if Left(x) = NULL or Right(x) = NULL then
2: y := x
3: else
4: y = BST-successor(x)
5: if Left(y) ≠ NULL then
6: v := Left(y)
7: else
8: v := Right(y)
9: if v ≠ NULL then
10: Parent(v) := Parent(y)
11: if Parent(y) = NULL then
12: v becomes the root of T
13: else
14: if y = Left(Parent(y)) then
15: Left(Parent(y)) := v
16: else
17: Right(Parent(y)) := v
18: if y ≠ x then
19: Key(x) := Key(y)
Figure 6.3: Deletion of a node with one child (with key 33). The node is
removed and the child becomes a child of the parent.
Figure 6.4: Deletion of a node with two children (with key 19). We search for the
successor (key 20). The successor is removed as in the first two cases and
its content is copied into the node that shall be deleted.
An AVL tree is a binary search tree with the extra property that for every
internal node x, the height of the subtree with root Left(x) and the height
of the subtree with root Right(x) differ by at most one.
For the static operations like searching, finding minimal elements, etc.
we can use the implementations for binary search trees. We will also use the
insert and delete procedure of binary search trees, but afterwards, we have
to ensure that the AVL property is restored.
AVL trees are named after their inventors, G. M. Adelson-Velsky and E.
M. Landis.
If we want to implement AVL trees, we do not have to store the heights, it
is sufficient to store the balance. This information can be stored in two bits
which can often be squeezed in somewhere.
for all n (hint: induction on n). This is called the Moivre-Binet formula.
Figure 7.1: A left rotation around x transforms the tree on the left hand side
into the tree on the right hand side. A right rotation around y transforms
the tree on the right hand side into the tree on the left hand side.
7.2.1 Rotations
Rotations are used to restructure binary search trees. A left rotation around
x transforms the tree on the left hand side in Figure 7.1 into the tree on the
right hand side. A right rotation around y transforms the tree on the right
hand side in Figure 7.1 into the tree on the left hand side. In this sense, left
and right rotations are inverse operations.
If the tree on the left hand side fulfills the binary search tree property,
then the tree T1 contains all nodes the key of which is smaller than Key(x).
The tree T2 contains all nodes with a key between Key(x) and Key(y). And
T3 has all nodes with a key greater than Key(y). But then the tree on the
right hand side fulfills the binary search tree property, too. The same is true
for the right rotation.
Lemma 7.3 Left and right rotations preserve the binary search tree prop-
erty.
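In a node-based representation, a left rotation around x can be sketched in Python as follows (a right rotation is symmetric); hooking the returned node into the former parent of x, or making it the root, is left to the caller:

def left_rotate(x):
    """Perform a left rotation around x (x.right must not be None).
    Returns y, the node that takes x's place in the tree."""
    y = x.right
    x.right = y.left              # the subtree T2 moves from y to x
    if y.left is not None:
        y.left.parent = x
    y.parent = x.parent
    y.left = x
    x.parent = y
    return y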
7.2.2 Insertion
When a node is inserted, it always becomes a leaf in the tree. But because
we added the virtual leaves, inserting a node means that a virtual leaf is
replaced by an internal node with height 1 having two virtual leaves. So a tree
of height 0 is replaced by a tree of height 1. This means that potentially
the AVL tree property is violated.
Observation 7.4 After inserting a node y, the AVL tree property can only
be violated at nodes on the path from y to the root.
Algorithm 31 restores the AVL property. y is the current node the height
of which is increased. If y is the left child of its parent x, then we have to
increase the balance of x by 1 because the height of the left subtree of x
increased. If y is the right child, then the balance of x is decreased by 1. If
the balance of x becomes 0 in this way, then we are done, because the height
of T (x) did not change.1
Algorithm 31 AVL-insert-repair
Input: AVL tree T , a node y inserted into T
Output: afterwards, the AVL property is restored
1: x := Parent(y)
2: while x ≠ NULL do
3: if y = Left(x) then
4: Bal(x) := Bal(x) + 1
5: else
6: Bal(x) := Bal(x) − 1
7: if Bal(x) = 0 then
8: return
9: if Bal(x) = 2 or Bal(x) = −2 then
10: Restore the AVL property using a rotation or double rotation (see
Figure 7.1 and 7.2)
11: return
12: y := x; x := Parent(y)
If the balance of x became +1 or −1, then the height of T (x) was in-
creased by 1, so we need to inspect the parent of x in the next iteration of
the while loop. If the balance of x became +2 or −2, then the AVL tree
property is violated and we have to repair it. We only consider the case
when the balance of x is −2, the other case is completely symmetric. If the
balance of x became −2, then y = Right(x) and the balance of y is either
+1 or −1. We distinguish between these two cases:
1
The balance of x was +1 or −1 before the insertion, that is, the height of one of its
subtrees was one less than the height of the other. After the insertion, both of them have
the same height now. Since the nodes above x do not “see” this difference, we can stop.
After the rotation, all balances are in {+1, 0, −1}. Furthermore, since
Height(T (x)) before the insertion is the same as Height(T (y)) after the
insertion, we can stop, because there cannot be any further violations
above y.
Bal(y) = 1 after the insertion: In this case, we perform a left double rotation
as depicted in Figure 7.2. We set h := Height(T1 ). The height of the
other subtrees are then determined by the balances of x, y, and z.
Because we assumed that Bal(y) = 1 after the insertion, the insertion
took place in T (z):
After the double rotation, all balances are in {+1, 0, −1}. Furthermore,
the height of T (x) before the insertion is the same as the height of T (z)
after the insertion. Therefore, the procedure can stop.
In both cases, the balances are restored and moreover, we can leave the
procedure.
Figure 7.2: A left double rotation. You can think of a left double rotation as
a right rotation around y followed by a left rotation around x. This explains
the name double rotation. A double rotation also preserves the binary search
tree property.
7.2.3 Deletion
Deletion is performed similarly to insertion. We use the delete method from
binary search trees and then move up the path to the root and try to restore
the AVL tree property.
When we delete a node in a binary search tree, three cases can occur.
Either it is a leaf in the binary tree, that means, it is an internal node with
two virtual leaves in our AVL tree. Or it is an internal node with only one
child. In an AVL tree, this means that one of its children is a virtual leaf.2
Or it is a node with two children that are internal nodes, too. Then
we delete its successor and copy the content of the successor to the node.
Let v be the node deleted. We assume that there are no references from
nodes in the tree to v but the Parent(v) still references to the former parent
of v in T . After deletion, we call the method AVL-delete-repair with v. The
procedure works in a similar way to AVL-insert-repair by going up the path
from v to the root. However, some things have to be changed: First of all,
the balance of x has to be adapted in the opposite way in lines 4 and 6.
Then, we can stop if the balance of x becomes −1 or 1. This means that the
balance was 0 before and the height of one of the subtrees was decreased.
But since this does not change the total height of the tree, we stop.
If the balance becomes 2 or −2, then things get more complicated. As-
sume that Bal(x) = −2, the other case is treated similarly. In this case, v is
a left child of x. Let y be the right child of x. We distinguish three cases:
2
Note that by the AVL tree property, the other child then is an internal node with two
virtual leaves.
Since the height of T (x) before the deletion is larger than the height
of T (y) after the rotation, we have to go on and move up the path to
the root.
Bal(y) = 0: Again, we do a left rotation. Let h := Height(T1 ).
Since the height of T (x) before the deletion is the same as the height
of T (y) after the rotation, we can stop in this case.
Bal(y) = +1: In this case, we perform a left double rotation. Now we look
at Figure 7.2. Let h := Height(T1 ).
All balances are restored, but since the height of T (x) before the
deletion is larger than the height of T (z) after the rotation, we have
to go up.
Algorithm 32 AVL-delete-repair
Input: AVL tree T , a node v deleted from T
Output: afterwards, the AVL property is restored
1: x := Parent(v)
2: while x ≠ NULL do
3: if v = Left(x) then
4: Bal(x) := Bal(x) − 1
5: else
6: Bal(x) := Bal(x) + 1
7: if Bal(x) = 1 or Bal(x) = −1 then
8: return
9: if Bal(x) = 2 or Bal(x) = −2 then
10: Restore the AVL property using a rotation or double rotation
11: v := x; x := Parent(v)
Both repair procedures trace a path of length O(log n). Each rotation
takes time O(1), since we have to update a constant number of references.
Mergeable heaps are heaps that support a union operation. That is, given
heaps H and H 0 , we can merge them into one big heap that contains all
elements of H and H 0 (and, of course, satisfies the heap property). Take our
ordinary heaps from heap-sort. To implement the union operation, we can
just concatenate the two arrays and call the Heapify procedure. However,
this takes time O(n) which is too slow for large sets of data. In this chapter,
we will introduce binomial heaps which support all heap operations in time
O(log n) and, in addition, the union operation in time O(log n), too. In
particular, binomial heaps support the following operations:
2. The tree Bk consists of two copies of Bk−1 . The root of one copy
becomes the leftmost child of the root of the other copy.
Note that Bk only “describes a structure”, later we will fill the nodes with
data.
1. Bk has 2^k nodes.
2. The height of Bk is k.
4. The root has k children and all other nodes have strictly less.
1. B0 has 1 = 2^0 nodes. Now assume that Bk−1 has 2^{k−1} nodes (induction
hypothesis). Bk consists of two copies of Bk−1 and has 2 · 2^{k−1} = 2^k
nodes.
4. In B0 , the root has no children and there are no further nodes. Assume
that in Bk−1 , the root has k − 1 children and all other nodes have less
children. The root in Bk has k children. All other nodes have less
children, since they belong to one copy of Bk−1 and no further edges
are added except the one joining the two copies.
Binary trees were easy to store. Every node had three fields containing
pointers/references to the left and right child and the parent. Binomial trees
have an unbounded number of children. We can of course keep a list of all
children at every node. More compact is the so-called left-child-right-sibling
representation: Every node has a pointer/reference to its parent, its left-most
child, and its sibling to the right. If the node is the root, then the parent
1
How does one prove this identity? Either you use brute force or you can do the
following: The right hand side is the number of possibilities to choose i out of k items.
Now mark one of the k items. To take i of the k items, you either can take the marked
one and i − 1 out of the k − 1 remaining items or you do not take the marked one but i
out of the k − 1 remaining ones. But this is precisely the sum on the left-hand side.
pointer is NULL. If the node is a leaf, then the pointer to the left-most child
is NULL. If it is the last (right-most) child, then the pointer to the right
sibling is NULL. We will also store at each node x its number of children
and denote this quantity by Degree(x).
A binomial heap is a set of binomial trees. Every tree in this set has the
structure of a binomial tree but it contains additional data like the key and
a pointer/reference to satellite data. A binomial heap satisfies the binomial
heap property:
8.3 Operations
8.3.1 Make heap
To build a heap, we just initialize H as the empty list. The running time is
obviously O(1).
Algorithm 33 Make-BH
Input: binomial heap H
Output: empties H
Head(H) := NULL (Old-fashioned languages like C/C++ require that
you free the memory)
8.3.2 Minimum
By item 1 of the binomial heap property, the minimum is stored in one of
the roots of the trees. Therefore we only have to search the roots of the
trees. The next algorithm is a simple minimum selection algorithm on lists.
n runs through the list and in x and min, we store the current minimum and
its value. Since a binomial heap contains ≤ log n trees, the running time of
BH-Minimum is O(log n).
Algorithm 34 BH-Minimum
Input: binomial heap H
Output: the minimum element stored in H
n := Head(H)
x := n
if n ≠ NULL then
min := Key(n)
n := Next(n)
while n ≠ NULL do
if Key(n) < min then
min := Key(n)
x := n
n := Next(n)
return x
8.3.3 Union
The union routine is the heart of binomial heaps. All further operations are
somehow based on this routine. As a subroutine, we need a procedure that
takes two Bk−1 and produces a Bk . We move the references/pointers in the
obvious way.
Algorithm 35 BT-union
Input: the roots x and y of two copies of Bk−1
Output: the trees are linked together to form a Bk with root y
Parent(x) := y
Sibling(x) := Child(y)
Child(y) := x
Degree(y) := Degree(y) + 1
Remark 8.2 If Key(y) ≤ Key(x) and both trees above are heap-ordered,
then the resulting tree is heap-ordered, too.
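In the left-child-right-sibling representation, BT-union amounts to a few pointer updates. A Python sketch (the node class is our choice):

class BTNode:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.child = None       # left-most child
        self.sibling = None     # right sibling
        self.degree = 0

def bt_union(x, y):
    """Link two binomial trees of the same degree: x becomes the left-most
    child of y, and y is the root of the resulting tree (cf. BT-union)."""
    x.parent = y
    x.sibling = y.child
    y.child = x
    y.degree += 1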
To union two binomial heaps H1 and H2 , we first merge the two lists
of trees into one list. The trees are ordered with ascending degrees. By
the binomial heap property, every tree Bk occurs at most once in each Hi ,
thus it occurs at most twice in the merged list. We will go through the list
from left to right. Whenever we encounter two copies of Bk , we will merge
them into one Bk+1 by exploiting BT-union. In this way, we will ensure that
to the left of the current position, every Bk appears at most once and to the right of the
current position, every Bk appears at most twice. The only exception is the
current position: The Bk in this position may appear three times! How can
this happen when in the beginning, every Bk only appears at most twice?
Well, we can have two Bk−1 followed by two Bk in the merged list. The two
Bk−1 will be merged to one Bk and then there are three Bk ’s in a row. We
leave the first one of them in the list and merge the other two to one Bk+1
in the next step.
There will be three pointers/references in our procedure, prev , cur , and
next. They point to the previous, current, and next tree in the merged list.
We distinguish three cases. The third one splits into two subcases.
Algorithm 36 BH-union
Input: binomial heaps H1 , H2
Output: H1 and H2 merged
H := Make-BH()
Merge the two lists of binomial trees of H1 and H2 into one list with
ascending degrees.
prev := NULL; cur := Head(H); next := Sibling(cur );
while next ≠ NULL do
if (Degree(cur ) ≠ Degree(next)) or ((Sibling(next) ≠ NULL) and
(Degree(Sibling(next)) = Degree(cur ))) then /* cases 1,2 */
prev := cur
cur := next
else
if Key(cur ) ≤ Key(next) then /* case 3a */
Sibling(cur ) := Sibling(next)
BT-union(next, cur )
else /* case 3b */
if prev = NULL then
Head(H) := next
else
Sibling(prev ) := next
BT-union(cur , next)
cur := next
next := Sibling(cur )
to this, every degree occurs at most twice. Hence the invariant is still fulfilled
after setting cur := next.
Thus, BH-union is correct. It runs in time O(log n) where n is the number
of elements in the output heap, since the heaps only have ≤ log n trees and
we only spend O(1) time in the while loop per tree.
8.3.4 Insert
With the union operation, inserting an element x into a binomial heap H is
very easy: We pack x into a binomial heap and then merge this heap with
H. The overall running time is obviously O(log n).
Algorithm 37 BH-insert
Input: binomial heap H, element x
Output: inserts x into H
H1 := Make-BH()
Parent(x) := Child(x) := Sibling(x) := NULL
Degree(x) := 0
Head(H1 ) := x
H := BH-union(H, H1 )
8.3.5 Extract-min
We saw already that an element with a minimum key can only be a root of a
tree. But if we extract the minimum, then we destroy the binomial tree. The
next lemma states that the remaining parts of the tree still form a binomial
heap. Thus we can merge the resulting two heaps.
Algorithm 38 BH-extract-min
Input: binomial heap H
Output: returns an element with minimum key and removes it from H
Use the Minimum procedure to find a root x with minimum key.
Remove the corresponding tree from H.
H1 := Make-BH()
H1 := list of subtrees of x in reverse order
H := BH-union(H, H1 )
return x
8.3.6 Delete
Exercise 8.1 Implement the Decrease-key procedure.
Algorithm 39 BH-delete
Input: binomial heap H, element x in H
Output: x is removed from H
BH-decrease-key(H, x, −∞)
BH-extract-min(H)
Fibonacci heaps provide the same operations as binomial heaps, but with
a better running time for most of the operations. For some of the opera-
tions, this is only achieved on the average in a sense made precise in the
next section. Fibonacci heaps achieve this improvement by being lazy; bi-
nomial heaps ensure that after every operation, the binomial heap property
is fulfilled. Fibonacci heaps, on the other hand, tidy up only once after a
while.1
time. The amortized cost per operation is therefore t(n)/n. Here t(n) is the
number of times we increase an entry of a. We assume that n ≤ 2^` .
Assume our counter a is initialized with 0 in all entries. The least signif-
icant bit a[0] is changed every time we perform an increase. a[1], however,
is only changed every second time, namely, when a[0] = 1. a[2] is changed
only every fourth time, when a[0] = a[1] = 1. In general, a[i] is changed only
every 2^i th time. Therefore, the total time is
t(n) = Σ_{i=0}^{n} bn/2^i c ≤ n · Σ_{i=0}^{n} 1/2^i ≤ 2n.
ai = ci + Φi − Φi−1
(Think of this as the “true costs” plus the “potential difference”.) Then the
sum of the amortized costs is
Σ_{i=1}^{n} ai = Σ_{i=1}^{n} (ci + Φi − Φi−1 ) = Φn − Φ0 + Σ_{i=1}^{n} ci ,
in other words,
Σ_{i=1}^{n} ci = Σ_{i=1}^{n} ai + Φ0 − Φn .
be the number of ones in the counter after the ith increase. We set Φi = ti .
Then Φ0 = 0 and Φi ≥ 0 for all i. We have
Φi − Φi−1 ≤ −ci + 2
because if the ith operation takes ci steps, then ci − 1 ones are replaced by
zeros and one zero is replaced by a one. Thus
ai = ci + Φi − Φi−1 ≤ 2.
With the potential method, it is also very easy to bound the costs of n
increase operations when we start in any counter state, it is simply
2n + Φ0 − Φn ≤ 2n + Φ0 .
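The counter example can be checked with a few lines of Python: the code counts the actual bit flips per increment and verifies that the amortized cost, actual cost plus potential difference, never exceeds 2 (the counter length 16 and the 1000 increments are arbitrary choices):

def increment(a):
    """Increment the binary counter a (list of bits, least significant first).
    Return the number of bit flips performed."""
    flips = 0
    i = 0
    while i < len(a) and a[i] == 1:
        a[i] = 0                           # a one is replaced by a zero
        flips += 1
        i += 1
    if i < len(a):
        a[i] = 1                           # one zero is replaced by a one
        flips += 1
    return flips

a = [0] * 16
total, phi_old = 0, 0
for step in range(1, 1001):
    cost = increment(a)
    phi_new = sum(a)                       # potential = number of ones
    assert cost + phi_new - phi_old <= 2   # amortized cost is at most 2
    phi_old = phi_new
    total += cost
print(total, "flips for 1000 increments, i.e. at most 2 per increment")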
Link: Given the roots of two trees, we combine them into one tree making
the root with the smaller key the root of the new tree and the other
root one of its children. Note that the new tree is heap-ordered if both
input trees are. See Figure 9.1 for an illustration.
Unlink: is the opposite of Link. Given the root r of a tree and one of the
children x of the root, we remove the subtree with root x from the tree
with root r.
Figure 9.1: The link procedure. The link operation takes two trees and makes
the root with the larger key (on the right hand side) the child of the other
root (on the left hand side). Note that we did not draw the children of the
root on the right hand side. The unlink procedure is the opposite operation.
It removes a particular subtree from the doubly linked list.
Splice: It takes two circular lists of trees and combines them into one cir-
cular list of trees. See Figure 9.2.
t + 2m.
9.3 Operations
We first implement the procedures Insert, Union, Minimum, and Extract-
min. If we only perform these operations, then the trees in the Fibonacci
heaps will always be (unordered) binomial trees and no nodes will be marked.
Nevertheless, we will explicitly write the term mi in the potential, because
everything is also valid if there are marked nodes.
9.3.1 Minimum
The procedure Minimum is easy to implement, we just have to return min.
The actual costs for this are O(1) and there is no change in the potential.
Therefore the amortized costs are O(1), too.
9.3.2 Union
The procedure Union takes the two heaps and splices them together into one
heap as in Figure 9.2. Then we choose min as the minimum of the two
minima. That’s it! We do not tidy everything up as we did for binomial
heaps. We do not even sort the trees by their sizes. The actual costs are O(1).
There is no change in potential, since the number of trees in the union is the
sum of the number of heaps in the input heaps. Therefore, the amortized
costs are O(1), too.
9.3.3 Insert
To insert an element, we create a heap of size one (using Make-heap, which,
admittedly, we did not implement so far) and then use Union. This takes O(1)
time. The cycle of roots gains one tree, so the potential goes up by one.
The amortized costs are O(1 + 1) = O(1).
9.3.4 Extract-min
Extract-min is the first interesting procedure. We can find the minimum
using min. We unlink the corresponding tree T from the heap F . When we
remove the root of T , we get a ring of trees, which is nothing else than a
Fibonacci heap. We splice this ring together with F . The following step
is the only time when we tidy up: Whenever we find two trees in the heap
whose roots have the same degree, we link them. Note that if the two
trees are unordered binomial trees, then we will get an unordered binomial
tree again. So after each linking, we still have a ring of unordered binomial
trees.
Algorithm 41 FH-Extract-min
Input: Fibonacci heap F
Output: the minimum is extracted and returned
1: Unlink the tree T with root min from F
2: Remove the root of T to obtain a circular list of trees.
Splice this list with F .
3: As long as there are two roots with the same degree in F , we link them
together.
4: Recompute min and return the old minimum.
How fast can we implement this scheme? Steps 1 and 2 take time O(1).
It is very easy to implement step 3 by first sorting the trees by size (recall
that a Fibonacci heap is lazy and does not store the trees sorted) and then
proceed in a way similar to the union procedure for binomial heaps. But
Step 3 can even be implemented in time linear in the number of roots after
splicing: We have an array D. D[i] stores a reference/pointer to a root with
degree i if we have found one so far. For each root r, we run the following
program:
1: i := Degree[r]
2: while D[i] 6= NULL do
3: r0 := D[i]
4: D[i] := NULL
5: r := Link(r, r0 )
6: i := i + 1
7: D[i] := r
If D[i] = NULL in the beginning, then we simply set D[i] := r. Oth-
erwise, we link r and D[i]. Then we get a tree with degree i + 1. If it is
the first one of this degree, then we set D[i + 1] := r, otherwise we repeat
the process. Since every root is put into D exactly once, the running time is
linear in the number of roots.
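The linking by degree in step 3 can also be written out concretely. The following Python sketch is only an illustration under our own naming (a minimal node with key, degree, and a list of children); it is not a full Fibonacci heap implementation.

class Node:
    def __init__(self, key):
        self.key = key
        self.degree = 0          # number of children
        self.children = []

def link(r, s):
    # Make the root with the larger key a child of the other root.
    if s.key < r.key:
        r, s = s, r
    r.children.append(s)
    r.degree += 1
    return r

def consolidate(roots):
    D = {}                       # D[i] holds a root of degree i found so far
    for r in roots:
        i = r.degree
        while i in D:            # another root of the same degree exists
            r = link(r, D.pop(i))
            i += 1               # the linked tree has degree i + 1
        D[i] = r
    return list(D.values())      # all remaining roots have distinct degrees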
Let d(n) be an upper bound for the degree of any node in any Fibonacci
heap on n nodes. Since every tree currently is an unordered binomial tree,
d(n) ≤ log n. Before we extract the minimum, we have ti−1 roots. After
splicing, we have ≤ ti−1 + d(n) many roots. After linking all roots of the
same degree, we have ti roots. Since all roots have different degrees then,
ti ≤ d(n). The total cost of FH-Extract-min is
O(ti−1 + d(n)).
Algorithm 42 Cascading-cut
Input: Fibonacci heap F , node x
1: if x is not marked then
2: Mark(x)
3: return
4: y := Parent(x).
5: if y 6= NULL then
6: Unmark(x)
7: Unlink T (x)
8: Link T (x) to the cycle of roots of F
9: Update min
10: Cascading-cut(F, y)
Cascading cut is called when a node lost one of its children. Cascading-
cut gets a Fibonacci heap F and a node x. If x is not marked (that is, it did
not lose a child so far), then x is simply marked. If x is marked, then we
remove the tree T (x) from its current tree and link it to the cycle of roots
and x is unmarked. Then we proceed recursively with the parent of x (which
now lost one of its children, namely x) until we reach a root.
9.4.2 Decrease-key
Let x be the node whose key shall be decreased. We remove the
subtree T (x) and link it to the cycle of roots. Since x is now a root, we
can decrease its key without violating the heap property. Maybe we need to
update min. Since the former parent of x lost a child, we call Cascading-cut.
Let ci be the number of cuts performed. This includes the initial cut of
T (x) and all cuts performed by Cascading-cut. The costs of Decrease-key
is proportional to ci . After calling Decrease-key, the number of trees in the
cycle of roots increases by ci , there is one new tree for every cut. Therefore
ti = ti−1 + ci .
Furthermore, at most one node is newly marked and ci − 1 nodes are un-
marked. Hence,
mi ≤ mi−1 − (ci − 1) + 1 = mi−1 − ci + 2.
Algorithm 43 FH-decrease-key
Input: Fibonacci heap F , node x, key k with k ≤ Key(x)
Output: Key(x) is set to k
1: y = Parent(x)
2: Key(x) := k
3: if y 6= NULL then
4: Unlink the tree T (x)
5: Link T (x) to the cycle of roots
6: Update min
7: if y 6= NULL then
8: Cascading-cut(F, y)
The amortized costs are therefore
ai = ci + Φi − Φi−1 ≤ ci + ci + 2(−ci + 2) = 4,
and thus they are constant. (Because we chose the factor 2 in the term
2mi , we get a constant here!)
9.4.3 Delete
Delete is very similar to Decrease-key. Let x be the node to be deleted. We
first remove T (x) as before. Instead of adding T (x) to the cycle of roots,
we first remove x and splice the cycle of the children of x with the cycle of
roots of F . We update min. Since the former parent of x lost a child, we
call Cascading-cut.
Let ci be the number of cuts performed. This includes the initial cut
of T (x) and all cuts performed by Cascading-cut. The costs of Delete are
proportional to d(n) + ci : d(n) for recomputing min and ci for performing
the cuts. After calling Delete, the number of trees in the cycle of roots
increases by at most d(n) + ci : there is one new tree for every cut, and the
children of x contribute at most d(n) further trees. Therefore
ti ≤ ti−1 + d(n) + ci .
Algorithm 44 FH-delete
Input: Fibonacci heap F , node x
Output: x is deleted
1: if x = min then
2: FH-Extract-min(F )
3: else
4: y = Parent(x)
5: Unlink the tree T (x)
6: Remove x
7: Splice the cycle of children of x with the cycle of roots of F
8: Update min
9: if y 6= NULL then
10: Cascading-cut(F, y)
Theorem 9.2 For a Fibonacci heap, the operations Minimum, Union, In-
sert, and Decrease-key take constant amortized time, the first three even in
the worst case. The operations Extract-min and Delete take amortized time O(log n).
Exercise 9.1 Prove that F_{k+2} = 1 + Σ_{i=0}^{k} F_i for all k. (Hint: Induction on
k)
Corollary 9.5 d(n) ∈ O(log n) for all Fibonacci heaps with n nodes
We solved recurrence relations twice, when analyzing merge sort and the
linear time selection algorithm. Everything we learned there is essentially
sufficient to show the following general purpose theorem. It is usually called
“master theorem” (but should be called “slave theorem” since it encourages
you to turn your brain off).
You should know the theorem and should be able to apply it. The proof
was not presented in the lecture and is not relevant for the exam.
1. If f (n) = O(n^{log_b a − ε}) for some ε > 0, then t(n) = O(n^{log_b a}).
3. If f (n) = Ω(n^{log_b a + ε}) for some ε > 0 and a · f (⌈n/b⌉) ≤ d · f (n) for some
constant d < 1 and all sufficiently large n, then t(n) = O(f (n)).
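For instance (two standard examples of our own choosing): for t(n) = 8 t(n/2) + n^2 we have a = 8, b = 2, log_b a = 3, and f (n) = n^2 = O(n^{3−ε}) with ε = 1, so case 1 yields t(n) = O(n^3). For t(n) = 2 t(n/2) + n^2 we have log_b a = 1 and f (n) = n^2 = Ω(n^{1+ε}) with ε = 1; moreover, a · f (⌈n/b⌉) = 2⌈n/2⌉^2 ≤ (3/4) n^2 for all sufficiently large n, so case 3 yields t(n) = O(n^2).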
Proof overview: Consider the recursion tree. At the root, we have work
f (n) to do. The root has a children. At each child, we have work roughly
f (n/b), giving a total work of af (n/b). In general, at recursion depth i, the
total work amounts to a^i f (n/b^i ).
The function f (n) = n^{log_b a} fulfills n^{log_b a} = a · (n/b)^{log_b a}. For this function,
the total work at every recursion level is the same, namely n^{log_b a}. There are
log_b n levels, yielding a total amount of n^{log_b a} · log_b n.
If f grows (polynomially) slower than n^{log_b a}, then the work at the leaves
dominates, since the work increases geometrically. There are a^{log_b n} = n^{log_b a}
leaves. The work at each leaf is constant.
If f grows (polynomially) faster than n^{log_b a}, then the work at the root
dominates. It is f (n).
Exercise 10.1 Show that if f fulfills a · f (⌈n/b⌉) ≤ d · f (n) for some constant
d < 1 and all sufficiently large n, then f (n) = Ω(n^{log_b a + ε}) for some ε > 0.
2. If f is monotone, so is t.
Exercise 10.2 Prove the lemma above (Hint: standard induction on n).
We have
γn^e = γ b^e · (n/b)^e = a · γ · (n/b)^e .
In the first case, the recurrence becomes
t̂(n) ≤ t̂(n/b) + 1
Using a proof similar to the one of Lemma 1.13, we get t̂(n) = O(log n).
Hence, t(n) = O(nlogb a log n).
In the third case, we divide the recurrence by f (n) and obtain
t(n)/f (n) ≤ a · t(⌈n/b⌉)/f (n) + 1.
Let t̂(n) = t(n)/f (n). By assumption, a · f (⌈n/b⌉) ≤ d · f (n). Thus, the recurrence
becomes
t̂(n) ≤ d · t̂(⌈n/b⌉) + 1.
As in the proof of Lemma 4.5, we get t̂(n) = O(1). Thus t(n) = O(f (n)).
Exercise 10.3 Show that if s(n) = s(n/b) + n^{−ε} for all n ∈ B \ {1} and
s(1) = 1, then s(n) = O(1). (Hint: Show by induction on n that s(n) =
n^{−ε} Σ_{i=0}^{log_b n} b^{iε} .)
In this chapter, we will learn two algorithm design techniques that are well-
suited for optimisation problems with a certain “substructure property”.
These two equations immediately yield an algorithm: Run over all sets U ,
starting with the sets of size 1, then of size 2, and so on. For each set U with
|U | > 1 run over all subsets I and check whether it is independent. If yes,
then check whether C(G[U \ I]) + 1 is smaller than the smallest number of
colors we found so far.
Here 2^i is the number of subsets I of a set U of size i. (We have included the empty
set to get a nicer formula for the running time.)
11.1.3 Memoization
The dynamic programming approach is bottom-up. If we want to go top-
down by making recursive calls, we can use a technique called memoization.
We have a global table that stores the values C(G[U ]). Whenever we compute C(G[U ]) during
a recursive call, we store the result in the table. Before making a recursive
call, we look up in the table whether this value has already been computed. This
avoids unnecessary recursive calls. This essentially fills the table T top-down
instead of bottom-up.
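As an illustration, the coloring recurrence can be written top-down in Python with memoization. The representation of vertex sets as bit masks and all names below are our own choices; the sketch only shows the technique and makes no attempt at efficiency beyond the memoization itself.

from functools import lru_cache

def chromatic_number(n, adj):
    # adj[v] is a bit mask of the neighbours of vertex v (no self-loops).
    def independent(mask):
        # No two vertices inside mask may be adjacent.
        m = mask
        while m:
            v = (m & -m).bit_length() - 1     # some vertex v in m
            if adj[v] & mask:
                return False
            m &= m - 1
        return True

    @lru_cache(maxsize=None)                  # the memoization table
    def C(mask):
        # C(mask) is the chromatic number of the subgraph induced by mask.
        if mask == 0:
            return 0
        best = n                              # n colors always suffice
        sub = mask
        while sub:                            # all non-empty subsets I of mask
            if independent(sub):
                best = min(best, 1 + C(mask & ~sub))
            sub = (sub - 1) & mask
        return best

    return C((1 << n) - 1)

# A triangle needs three colors: chromatic_number(3, [0b110, 0b101, 0b011]) == 3.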
Algorithm 45 Greedy-scheduling
Input: a set of intervals with starting and finishing times si ≤ fi , 1 ≤ i ≤ n
Output: a subset T of non-intersecting intervals that is as large as possible
1: T := ∅
2: while there is an interval that does not intersect any interval in T do
3: Choose the non-intersecting interval I that has the smallest finishing
time.
4: T := T ∪ {I}
5: return T
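In a concrete language, the greedy choice of the interval with the smallest finishing time can be realized by sorting once, which gives an O(n log n) version of the algorithm. The following Python sketch uses our own naming; intervals that merely touch in an endpoint are treated as intersecting here.

def greedy_scheduling(intervals):
    # intervals: list of pairs (s, f) with s <= f.
    T = []
    last_finish = float('-inf')
    for s, f in sorted(intervals, key=lambda iv: iv[1]):  # by finishing time
        if s > last_finish:          # does not intersect any chosen interval
            T.append((s, f))
            last_finish = f
    return T

# greedy_scheduling([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (8, 9)])
# returns [(1, 4), (5, 7), (8, 9)].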
C is the maximal number of intervals that overlap in a single point. All these
intervals are pairwise connected by an edge in the interval graph; therefore, each
of these intervals needs a different color. Hence, every proper coloring of
an interval graph needs at least C colors.
On the other hand, there is always a coloring with C colors and we can
find it by a greedy algorithm. We consider the intervals by ascending starting
time and always give the next interval a color that does not create a conflict.
Algorithm 46 Greedy-coloring
Input: an interval graph G = (V, E)
Output: a proper coloring of the nodes in V
1: sort the intervals by starting time.
2: initially, all intervals are uncolored
3: for j = 1, . . . , n do
4: Let F be the colors of intervals from I1 , . . . , Ij−1 that intersect Ij .
5: Choose a color from {1, . . . , C} \ F and color Ij with this color.
Since we sort the intervals by starting time, it is also very easy to keep
track of the intervals that intersect with Ij . In this way, we get an algorithm
with O(n log n) running time.
The algorithm never gives two intervals that are connected by an edge
the same color. But could it happen that an interval remains uncolored
because there is no color left in {1, . . . , C} \ F ? Since
the intervals are sorted by starting time, all the intervals that intersect Ij
intersect in the point sj . By the definition of C, #F < C. Therefore, Ij is
properly colored.
A(n undirected) graph G is a pair (V, E), where V is a finite set, the set
of nodes or vertices, and E is a set of two-element subsets (= unordered
pairs) of V , the so-called edges. If e = {u, v} ∈ E, we say that u and v are
connected by the edge e or that u and v are adjacent. The edge e is called
incident on u and v.
In a directed graph G = (V, E), edges are ordered pairs, i.e., E ⊆ V × V .
We say that an edge e = (u, v) leaves u and enters v or is incident from u
and incident to v. e is called an edge from u to v. An edge of the form (v, v)
is called a self-loop. We can also allow self-loops in undirected graphs; they
are then sets of size one. Note that in a directed graph, there can be two edges
between two nodes, one from u to v and the other one from v to u.
A sequence v0 , v1 , . . . , vk is called a walk from v0 to vk in G = (V, E) if
{vi , vi+1 } ∈ E for all 0 ≤ i < k. (In the case of a directed graph, (vi , vi+1 ) ∈
E, that is, all edges in the path have to point “in the same direction”.) A
walk is called a (simple) path if all nodes in the path are distinct. k is called
the length of the path. (Some books use path as a synonym for walk and
simple path for path.)
Exercise 12.1 Show that if two vertices are connected by a walk, then they
are connected by a path.
Exercise 12.2 Show that in an undirected graph, the “is reachable from” rela-
tion is an equivalence relation.
huge sparse graphs (think of web pages as nodes and links as edges). Here
adjacency-matrices are way too large.
Both representations can easily be extended to weighted graphs. A weighted
graph G = (V, E, w) has in addition a function w : E → Q that assigns each
edge a weight. In the adjacency-list representation, we simply store the
corresponding weight w(i, j) together with j in the list. In the adjacency-
matrix, we just store the weight w(i, j) instead of 1. If an edge does not
exist, we can, depending on the application, either store 0 or ∞ or a special
value indicating that there is no edge.
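For concreteness, here is one possible way (a Python sketch with our own naming) to build the adjacency-list representation of a weighted directed graph.

def make_weighted_graph(n, weighted_edges):
    # weighted_edges: iterable of triples (u, v, w) with 0 <= u, v < n.
    adj = [[] for _ in range(n)]
    for u, v, w in weighted_edges:
        adj[u].append((v, w))        # store the weight together with the neighbour
    return adj

# adj = make_weighted_graph(3, [(0, 1, 2.5), (1, 2, 1.0), (0, 2, 7.0)])
# adj[0] == [(1, 2.5), (2, 7.0)]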
Breadth first search and depth first search are two important ways to explore a
graph, and they are the basis of many other graph algorithms. We are given
a graph G = (V, E) and a node s ∈ V , the start node or source. We want
to explore the graph in the sense that we want to discover all nodes that are
reachable from s by a path.
We start in s. The nodes that we first can discover from s are the nodes
v such that there is an edge (s, v) ∈ E, the so-called neighbours of s.1 Then
we can discover the neighbours of the neighbours of s and so on until we
discover all nodes reachable from s.
Breadth first search first explores all nodes that are close to s. Nodes
have one of three possible states: undiscovered, discovered, finished. In the
beginning, s is discovered and all other nodes are undiscovered. A node
becomes discovered when we encounter it for the first time. If all neighbours
of a node have been discovered, its state finally changes to finished.
Breadth first search first marks every node v as undiscovered, the distance
d[v] to s is set to ∞, and the predecessor p[v] is set to NULL. Since the only
node that we know so far is s, we set state[s] = discovered and d[s] := 0.
We initialize a queue Q with s; Q will contain all discovered nodes that are
not finished. As long as Q is not empty, we remove the next node v from it.
All its neighbours that have not been discovered yet are marked discovered
and are put into Q. v becomes the predecessor of these nodes and their
distance is set to d[v] + 1. Since all of its neighbours are discovered, v is
marked finished.
1
Breadth first search works for undirected and directed graphs. We only present it for
directed graphs, since every undirected graph can be represented by a directed graph by
replacing each undirected edge {u, v} by two directed edges (u, v) and (v, u).
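The description above translates almost literally into code. The following Python sketch (all names are our own) computes the arrays d and p for a directed graph given by adjacency lists adj and a source s.

from collections import deque

def bfs(adj, s):
    n = len(adj)
    d = [float('inf')] * n
    p = [None] * n
    state = ['undiscovered'] * n
    state[s] = 'discovered'
    d[s] = 0
    Q = deque([s])                       # all discovered but not finished nodes
    while Q:
        v = Q.popleft()
        for u in adj[v]:
            if state[u] == 'undiscovered':
                state[u] = 'discovered'
                d[u] = d[v] + 1
                p[u] = v
                Q.append(u)
        state[v] = 'finished'            # all neighbours of v are discovered
    return d, p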
12.2.1 Correctness
Let G = (V, E) be a graph. The shortest path distance δ(x, y) of two nodes
x, y ∈ V is the minimum length of a path from x to y; it is ∞ if there is no
such path. We have δ(x, z) ≤ δ(x, y) + δ(y, z) for all x, y, z ∈ V . In particular,
if (x, y) ∈ E, then δ(s, y) ≤ δ(s, x) + 1. For i ≥ 0, let Ui = {v ∈ V | δ(s, v) = i}
be the set of all nodes at distance i from s.
Lemma 12.2 If BFS is executed on G = (V, E), then all vertices of Ui are
inserted into the queue before all vertices in Ui+1 and after all vertices in
Ui−1 , if i > 0.
Now we apply the argument above for j = i. Nodes from Ui+1 can only
be put into the queue when we remove vertices from Ui . When we remove
the first vertex of Ui , only vertices from Ui are in the queue and we start
putting the vertices from Ui+1 into the queue. Since every vertex in Ui+1 has
a neighbour in Ui , all vertices of Ui+1 are in the queue after the last vertex
of Ui is removed.
Theorem 12.3 BFS works correctly, that is, after termination, d[v] = δ(s, v)
for all v ∈ V . Moreover, for every node v with δ(s, v) < ∞, a shortest path
from s to p[v] together with the edge (p[v], v) is a shortest path from s to v.
Note that d[v] = δ(s, v) is sufficient to prove the correctness, since a node
v is marked as finished iff d[v] < ∞ and if a node is marked as finished, then
p[v] contains the node v was discovered from.
Proof. If v is not reachable from s, then v is never put into the queue,
hence d[v] = ∞.
We now show by induction on i that d[v] = i for all v ∈ Ui .
Induction base: This is true by construction for U0 .
Induction step: Assume that d[v] = i for all v ∈ Ui . All nodes u in Ui+1
are put into the queue by a node in Ui by Lemma 12.2. This means that
d[u] = i + 1. This completes the induction step.
Finally note that for every node v ∈ Ui+1 , p[v] ∈ Ui . Therefore a shortest
path to p[v] has length i. If we extend this path by (p[v], v), then we get a
path of length i + 1, which must be a shortest path.
The proof of the corollary follows from repeatedly using the “moreover”-
part of Theorem 12.3. Such a tree G0 is called a shortest path tree.
Proof. W.l.o.g. assume that d[x] < d[y]. If f [x] < d[y] then the intervals
are disjoint and we are done.
Algorithm 49 DFS-explore
Input: a graph G
Output: every node is visited
start and finishing times are computed
1: for each v ∈ V do
2: state[v] := undiscovered
3: p[v] := NULL
4: t := 0
5: for each x ∈ V do
6: if state[x] = undiscovered then
7: Depth-first-search(x)
If d[y] < f [x], then x’s state is discovered when y is discovered. Therefore
x is a predecessor of y in the recursion tree. Therefore f [y] < f [x].
Union(x, y): Computes the union of the two sets that contain x and y,
respectively. We assume that x and y are in different sets. The old
sets are destroyed.
Find(x): Finds the representative of the unique set that x belongs to. If x
is not in any set, then NULL is returned.
Algorithm 50 Kruskal-MST
Input: connected weighted undirected graph G = (V, E, w)
Output: a minimum spanning tree T of G
1: Sort the edges by increasing weight
2: for each vertex v ∈ V do
3: Make-set(v)
4: ET := ∅
5: for each edge e = {u, v} ∈ E, in order by increasing weight do
6: if Find(u) 6= Find(v) then
7: ET := ET ∪ {e}
8: Union(u, v)
9: return (V, ET )
In the beginning, every tree consists of a single vertex. Then trees are joined
until in the end, we have a single tree.
We use a disjoint set data structure to store the intermediate forests.
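For concreteness, here is a compact Python sketch of Kruskal's algorithm; the union-by-size implementation of the disjoint-set data structure is only one possible choice, and all names are ours.

def kruskal(n, edges):
    # n vertices 0, ..., n-1; edges is a list of triples (w, u, v).
    parent = list(range(n))              # every vertex forms its own set
    size = [1] * n

    def find(x):
        while parent[x] != x:            # follow parent pointers to the representative
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        x, y = find(x), find(y)
        if size[x] < size[y]:
            x, y = y, x
        parent[y] = x                    # attach the smaller set to the larger one
        size[x] += size[y]

    ET = []
    for w, u, v in sorted(edges):        # in order of increasing weight
        if find(u) != find(v):
            ET.append((u, v, w))
            union(u, v)
    return ET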
Algorithm 51 Prim-MST
Input: connected weighted undirected graph G = (V, E, w)
Output: a minimum spanning tree T of G
1: Choose an arbitrary root r ∈ V .
2: Key[r] := 0; Key[v] := ∞ for all v ∈ V \ {r}.
3: Let Q be a min-priority queue filled with the vertices in V .
4: p[r] := NULL; ET := ∅
5: while Q is not empty do
6: x := Extract-min(Q)
7: if p[x] 6= NULL then
8: ET := ET ∪ {p[x], x}
9: for each vertex y adjacent to x do
10: if y ∈ Q and w({x, y}) < Key[y] then
11: p[y] := x
12: Key[y] := w({x, y})
13: return (V, ET )
If it becomes adjacent to some vertex x, then its key becomes the weight
w({x, y}). If it becomes adjacent to a new vertex, then we only set the key
to the new weight if this will decrease the key. (In particular, we can always
use the Decrease-key procedure for this.) In this way, Key[x] is the cost of
adding x to the tree grown so far. In p[y] we always store the vertex x such
that Key[y] = w({x, y}).
What do we achieve by this? Well, Prim’s algorithm is supposed to work
as follows: We always choose the minimum weight edge that extends the
current tree, that is, joins a new node to the tree. If x := Extract-min(Q),
then the edge {p[x], x} will be precisely this edge. By using a priority queue,
we get an efficient implementation of this process. The only exception is in
the beginning, when we extract the root r from the queue. In this case, we
do not add an edge to the tree.
Why is Prim’s algorithm correct? The edge chosen is always a minimum
weight edge that crosses the cut consisting of the vertices chosen so far
and the remaining vertices in Q. Now, the same proof as the one of
Theorem 13.2 works.
Assume that we implement the priority queue using binary heaps. The
initialization needs time O(|V |). The while loop is executed |V | times.
An Extract-min operation needs time O(log |V |), so the cost of all Extract-
min operations is O(|V | log |V |). The inner for-loop is executed |E| times
altogether. The test y ∈ Q can be implemented in constant time by using
a Boolean array. The Decrease-Key operation has running time O(log |V |).
Thus the overall running time is O(|V | log |V | + |E| log |V |) ⊆ O(|E| log |V |).
Note that log |V | ∈ Θ(log |E|), so this is not an improvement over Kruskal’s
algorithm (at least not theoretically). However, if we use Fibonacci heaps to
Exercise 14.2 Show that for any three nodes u, v, and x, δ(u, v) ≤ δ(u, x)+
w((x, v)).
95
96 14. Shortest paths
Algorithm 52 Relax
Input: nodes u and v
Output: d[v] and p[v] are updated
if d[v] > d[u] + w((u, v)) then
d[v] := d[u] + w((u, v))
p[v] := u
All the statements are true in the beginning. The first one can be easily
proven by induction. A call of Relax(u, v) only changes d[v]. When d[v] is
changed, then d[v] ≤ d[u] + w((u, v)) now holds. When d[v] is not changed,
then d[v] ≤ d[u] + w((u, v)) was true even before.
For the second statement, assume that the call of Relax(u, v) is the first
call such that d[v] < δ(s, v) for some node v. Then
d[u] + w((u, v)) = d[v] < δ(s, v) ≤ δ(s, u) + w((u, v)).
The last inequality follows from Exercise 14.2. Therefore, d[u] < δ(s, u)
which contradicts the assumption that Relax(u, v) was the first call with
d[v] < δ(s, v).
The third statement is clear by the property of Relax.
For the fourth statement, note that after executing Relax(u, v), d[v]
is δ(s, u) + w((u, v)) which is the weight of a shortest path from s to v.
Algorithm 53 Dijkstra
Input: edge-weighted directed graph G = (V, E, w), w(e) ≥ 0 for all e,
source s ∈ V
Output: d[v] = δ(s, v) for all v and p encodes a shortest path tree
1: Initialize d[s] = 0, d[v] = ∞ for all v 6= s, and p[v] = NULL for all v.
2: Let Q be a min-priority queue filled with all vertices from V using d[v]
as keys.
3: while Q is not empty do
4: x := Extract-min(Q)
5: for each y with (x, y) ∈ E do
6: Relax(x, y)
Note that the Relax procedure now involves a Decrease-key call. Before
we come to the correctness, let us first analyze the running time. If we imple-
ment Q by an ordinary array, then the Insert and Decrease-key operations
take time O(1) while Extract-min takes O(|V |). Every node is inserted and
extracted once. For every edge, we have one Decrease-key (= Relax) opera-
tion. Thus, the total running time is O(|V |2 + |E|) which is linear in the size
of the graph if the graph is dense, that is, |E| = Θ(|V |2 ). If we implement Q
with binary heaps, then every operation takes time O(log |V |) giving a total
running time of O(|V | log |V | + |E| log |V |). And finally, we could also use
Fibonacci heaps; then we get a running time of O(|V | log |V | + |E|).
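For concreteness, here is a Python sketch (our own naming) of the binary-heap variant. Instead of an explicit Decrease-key it simply pushes a new entry into the heap and skips outdated ones; the running time stays O((|V | + |E|) log |V |).

import heapq

def dijkstra(adj, s):
    # adj[u]: list of pairs (v, w) with w >= 0; s: source.
    n = len(adj)
    d = [float('inf')] * n
    p = [None] * n
    d[s] = 0
    Q = [(0, s)]                         # entries are (tentative distance, node)
    while Q:
        dist, x = heapq.heappop(Q)
        if dist > d[x]:                  # outdated entry: x was already settled
            continue
        for y, w in adj[x]:
            if d[y] > d[x] + w:          # Relax(x, y)
                d[y] = d[x] + w
                p[y] = x
                heapq.heappush(Q, (d[y], y))
    return d, p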
The correctness of Dijkstra’s algorithm follows immediately from the fol-
lowing lemma.
Lemma 14.1 For all v that are extracted from Q, d[v] = δ(s, v).
Proof. The first node that is extracted is s. We have d[s] = 0 = δ(s, s).
Now let u be the first node that will be extracted such that d[u] 6= δ(s, u).
We have u 6= s. By property 2 in Section 14.1, d[u] > δ(s, u).
Algorithm 54 Bellman-Ford
Input: edge-weighted directed graph G = (V, E, w), source s ∈ V
Output: d[v] = δ(s, v) for all v and p encodes a shortest path tree
1: Initialize d and p
2: for i := 1, . . . , |V | − 1 do
3: for each e = (u, v) ∈ E do
4: Relax(u, v)
5: for each (u, v) ∈ E do
6: if d[v] > d[u] + w((u, v)) then
7: error “negative cycle”
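A direct Python sketch of this scheme (our own naming), including the final check for a negative cycle reachable from s:

def bellman_ford(n, edges, s):
    # edges: list of triples (u, v, w); s: source.
    d = [float('inf')] * n
    p = [None] * n
    d[s] = 0
    for _ in range(n - 1):
        for u, v, w in edges:            # Relax(u, v)
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                p[v] = u
    for u, v, w in edges:
        if d[u] + w < d[v]:
            raise ValueError('negative cycle')
    return d, p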
Often we only want to support the three basic operations insert, search, and
delete. This is called a dynamic set or dictionary problem.
All these operations should run extremely fast, that is, in time O(1). And they
should even be fast in practice.
h : U → {0, . . . , m − 1}.
Algorithm 55 CH-insert
Input: hash table T , element x
Output: inserts x into T
1: insert x at the head of T [h(Key(x))].
Algorithm 56 CH-search
Input: hash table T , key k
Output: returns an element with key k if one exists, NULL otherwise
1: search for key k in T [h(k)]
Algorithm 57 CH-delete
Input: hash table T , element x
Output: deletes x from T
1: delete x from the list T [h(Key(x))]
Deletion can be done in time O(1), if we assume that the lists are doubly
linked. Note that we assume that the element x is given. If we only have
the key, we first have to search for the element.
The total space requirement for T is O(m + n), the number of slots in T
plus the number of elements stored in the table.
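A minimal Python sketch of hashing with chaining (our own naming; Python lists stand in for the linked lists, so deletion in this sketch is not O(1) as in the doubly linked version discussed above):

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]            # one chain per slot

    def _h(self, key):
        return key % self.m                        # some hash function h

    def insert(self, key, value):
        self.T[self._h(key)].insert(0, (key, value))   # insert at the head

    def search(self, key):
        for k, v in self.T[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.T[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return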
The worst case running time for searching, however, is O(n), which is
unacceptable. The worst case occurs if all elements hash to the same entry
T [i]. This can indeed happen, since by the pigeonhole principle, there is an
i such that at least ⌈N/m⌉ keys hash to T [i].
But can this happen often? Of course, there are degenerate hash func-
tions h that map all keys to the same slot T [i]. We assume that h distributes
the keys evenly among the table, that is, #{k ∈ U | h(k) = i} ≤ dN/me.
Lemma 15.1 Assume that h distributes the keys evenly among the table. If
a random set S ⊆ U of size n is stored in T , then the expected number of
elements in any entry of T is at most ⌈N/m⌉ · n/N .
The quantity ⌈N/m⌉ · n/N is approximately n/m. This is called the load factor.
Under the assumption of the lemma above, the running time of searching
is O(1 + n/m). If the number of slots in T and the number of elements are
linearly related, i.e., n = Θ(m), then searching can be done in constant time.
Finding a good hash function is an art, since inputs usually are not
random. So hash functions in practice often include some knowledge of the
input distribution.
1. h(k) = k mod m
This function distributes the keys evenly. If m = 2^t is a power of two,
then it is often a bad hash function, since h(k) are just the lower order
bits of k. If m is a prime “not too close to a power of two”, then it
usually performs well.
Pr_{h∈H} [h(x) = h(y)] ≤ 1/m.
and
E[X] = E[ Σ_{k∈S} Yk ] = Σ_{k∈S} E[Yk ].
The last equality follows from the fact that the expected value is linear.
Since Yk is an indicator variable,
E[Yk ] = Pr_{h∈H} [Yk = 1] = Pr_{h∈H} [h(k) = h(x)], which is at most 1/m if k ≠ x
and equal to 1 if k = x.
The last equation follows from the fact that H is universal. Thus,
E[X] = E[Yx ] + Σ_{k∈S\{x}} E[Yk ] ≤ 1 + (n − 1)/m.
While the set of all functions forms a universal family, it is a bad family for
practical purposes: The family is huge; it has size m^N . Furthermore, there
is no better way than representing the functions by a table of their values. A
good family of universal hash functions should have the following properties.
We have
h(x) = h(y) ⇐⇒ c1 = Σ_{i=2}^{n} ci (xi − yi ).
How many matrices M fulfill c1 = Σ_{i=2}^{n} ci (xi − yi )? The righthand side
determines c1 completely. Hence, out of the 2^µ possible choices for c1 , only
one will be a solution of c1 = Σ_{i=2}^{n} ci (xi − yi ). Consequently, Pr[h(x) =
h(y)] = 1/2^µ = 1/m.
To choose a random function from Hlin , we just have to fill a matrix with
0 and 1 chosen randomly. We can evaluate the hash function by multiplying
the matrix with the key we want to hash.
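Concretely, a member of Hlin can be sampled and evaluated as follows (a Python sketch with our own naming; keys are viewed as vectors of n bits and hash values as vectors of µ bits over GF(2)):

import random

def random_linear_hash(n_bits, mu):
    # Choose a random 0/1 matrix M with mu rows and n_bits columns.
    M = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(mu)]

    def h(key):
        bits = [(key >> j) & 1 for j in range(n_bits)]    # the key as a 0/1 vector
        value = 0
        for i in range(mu):
            # inner product of row i with the key vector, modulo 2
            row_bit = sum(M[i][j] & bits[j] for j in range(n_bits)) % 2
            value |= row_bit << i
        return value

    return h

# h = random_linear_hash(32, 10)   # hashes 32-bit keys into a table of size 2**10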
We assume that for all keys k ∈ U , the sequence h(k, 0), h(k, 1), . . . , h(k, m−
1) is a permutation of the numbers 0, 1, . . . , m − 1. We first try to store the
element in T [h(k, 0)]. If this field is occupied, we try T [h(k, 1)] and so on.
For this reason, the sequence h(k, 0), h(k, 1), . . . , h(k, m − 1) is also called a
probing sequence. If we do not find an empty field, we claim that the table
is full.
The procedure Insert and Search can be easily realized as follows.
Algorithm 58 OA-Insert
Input: hash table T , key k
Output: inserts k into T , if T is not full
1: i := 0
2: j := h(k, 0)
3: while i < m do
4: if T [j] = NULL then
5: T [j] := k
6: return
7: else
8: i := i + 1
9: j := h(k, i)
10: error “Table full”
Algorithm 59 OA-Search
Input: hash table T , key k
Output: j if k is stored in T [j], NULL if no such j exists
1: i := 0
2: j := h(k, 0)
3: while i < m and T [j] 6= NULL do
4: if T [j] = k then
5: return j
6: else
7: i := i + 1
8: j := h(k, i)
9: return NULL
Here h0 is an ordinary hash function. We set
h(k, i) := (h0 (k) + i) mod m.
h(k, 0), . . . , h(k, m − 1) is a permutation of 0, . . . , m − 1, so h meets
our requirements. In fact, h(k, 0), . . . , h(k, m − 1) is a cyclic shift. This
means that altogether, h produces at most m different probing se-
quences.
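A small Python sketch of insertion and search with linear probing (our own naming; h0 is an arbitrary ordinary hash function, for example k mod m):

def lp_insert(T, key, h0):
    # T is the table, a list of length m whose empty slots are None.
    m = len(T)
    for i in range(m):
        j = (h0(key) + i) % m            # probing sequence h(k, 0), h(k, 1), ...
        if T[j] is None:
            T[j] = key
            return j
    raise RuntimeError('table full')

def lp_search(T, key, h0):
    m = len(T)
    for i in range(m):
        j = (h0(key) + i) % m
        if T[j] is None:                 # an empty slot ends the probing sequence
            return None
        if T[j] == key:
            return j
    return None

# m = 11; T = [None] * m; h0 = lambda k: k % m
# lp_insert(T, 7, h0); lp_insert(T, 18, h0)   # 18 collides with 7 and lands in slot 8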
A big problem of linear probing is primary clustering: Assume that h0
distributes the keys evenly among the entries of T . Consider an empty
table into which we insert an element. Since h0 distributes the keys evenly,
every cell of T has the same chance of getting the element. Let i be the
cell to which the element goes. Now we add a second element. Every
cell of T again has a 1/m chance of getting the element except for one,
i + 1. This cell has a chance of 2/m, since the element goes there if it
goes to cell i (which is occupied) or cell i + 1. In general, if j cells are
occupied before a particular cell, then the probability that an element
will be stored in this cell is (j + 1)/m. Such long runs of occupied cells are
bad, since they increase the running time of search.
In this case, we can choose h such that the total table length is ≤ 4n. It is
sufficient to show the following lemma. (Use the Markov inequality! Or do it
directly: If for at least half of the hash functions, the length is > 4n, then the
expected length is > (1/2) · 4n = 2n, a contradiction.)
We have
E[ Σ_{i=0}^{n−1} mi^2 ] = E[ Σ_{x∈S} Σ_{y∈S} Zx,y ],