What Is An "Algorithm?"
• What is an algorithm?
• Termination and correctness.
• Non-existence result.
• Time- and space-efficiency, asymptotics.
• An(other) example: analysis of binary search.
What is an “algorithm?”
f : ℤ × ℤ → ℤ, where f : ⟨a, b⟩ ↦ a + b
We can certainly devise and express an algorithm that computes the above
f.
allows multiple instances of the same item. The set S_ℤ denotes the set of all
multisets of integers.

f : S_ℤ × ℤ → {0, 1}, where f : ⟨S, i⟩ ↦ 0 if i ∉ S, and ⟨S, i⟩ ↦ 1 otherwise
• an integer, i,
isIn(S, i)
1 foreach j ∈ S do
2     if i = j then return 1
3 return 0

Sum(i, j)
1 r ← i + j
2 return r
terminate, where the assumption that every entry of A and i are finite is part
of the argument for Line (2). Thus, the algorithm possesses the termination
property.
We may consider algorithms that do not meet the termination condition.
Consider, for example, the following randomized algorithm to determine the
index of the median of an input array of n distinct integers, where n is
odd. As we discuss below, RandMedian does not possess the termination
property, or "does not terminate," but is correct.
RandMedian(A[1, . . . , n])
1 while true do
2     i ← uniformly random choice from 1, . . . , n
3     c ← 0
4     foreach j from 1 to n do
5         if A[j] < A[i] then c ← c + 1
6     if c = (n − 1)/2 then return i
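To make the behaviour concrete, here is a minimal Python sketch of RandMedian (the function name and the use of the random module are mine, not from the notes). With probability 1 the loop eventually samples the median's index, but no finite bound on the number of iterations exists.

import random

def rand_median(A):
    """Return the index of the median of A (distinct entries, odd length).

    Mirrors RandMedian above: guess an index uniformly at random and
    check, by counting, whether it is the median. In principle it may
    loop forever, but it terminates with probability 1.
    """
    n = len(A)
    while True:
        i = random.randrange(n)                      # Line (2): uniform guess
        c = sum(1 for j in range(n) if A[j] < A[i])  # Lines (3)-(5): count smaller entries
        if c == (n - 1) // 2:                        # Line (6): exactly half are smaller
            return i

For example, rand_median([5, 1, 9, 3, 7]) returns 0, the index of the median 5.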
For Case (a), the only line in which we return 1 is Line (2). The return of
1 is preceded by the check "i = A[j]." Furthermore, the foreach statement
in Line (1) guarantees that j ∈ {1, . . . , n} at any moment that we check the
condition "i = A[j]" in Line (2). Thus, if we return 1, then we do so in Line
(2), which happens only if i = A[j] at the moment that we return, which is in
turn checked only if j ∈ {1, . . . , n} at that moment. Thus, we have proven
Case (a).
For Case (b), we first observe that the only line in which we return 0 is
in Line (3). And we reach Line (3) only if we exhaust the foreach loop,
i.e., do not return in Line (2). If we indeed return in Line (3), this implies
that we have exhausted the foreach loop, i.e., not returned in Line (2) for
any j ∈ {1, . . . , n}. This in turn implies that we have checked the condition
“i = A[j]” for every j ∈ {1, . . . , n} because that condition is checked in every
iteration of the foreach loop. Thus, if we return in Line (3), then for every
j ∈ {1, . . . , n}, i ≠ A[j], and we have proven Case (b).
Note: the above proof really does make a mountain out of a molehill, i.e., be-
labours a rather straightforward proof. I do this for illustration/instructional
purposes. You do not have to be so elaborate in your proofs. Just ensure
that your logical reasoning is sound.
Proof. The only line in which we return is Line (6). And the i that we
return is the index we chose in Line (2) in that iteration of the while loop.
We observe that we return i only if c = (n − 1)/2 at the moment we return i. Also,
based on Lines (3)–(5), c is the number of entries in A[1, . . . , n] each of which
is < A[i]. Thus, at the moment we return i, we know that exactly (n − 1)/2
entries in A[1, . . . , n] are smaller than A[i]. As n is odd and each entry in
A[1, . . . , n] is distinct, A[i] is indeed the median.
For example, in the following graph, which is from CLRS, each of the sets
of dark and light vertices is a vertex cover. That is, {u, v, y, x} is a vertex
cover, and so is {z, w}. Of course, any superset of a vertex cover is also a
vertex cover.
But the bigger question is: does the algorithm output a minimum-sized vertex
cover? And the answer is, ‘no, not necessarily.’ It is easy to produce an input
G for which it does not. Consider, for example, V = {u_0, u_1, . . . , u_{2n−1}}, for
some positive integer n, and E = {⟨u_{2i}, u_{2i+1}⟩ | i ∈ [0, n − 1]}. Then, the
algorithm outputs C = V , i.e., a vertex cover of size 2n. But we know that
a minimum-sized vertex cover is of size n. E.g., {u_0, u_2, u_4, . . . , u_{2n−2}} is a
minimum-sized vertex cover. Note that the above is a class of inputs, with
infinitely many instances of input in it, and this class has inputs of whatever
size we want.
So then, what good is this algorithm? It turns out that the vertex cover prob-
lem is one that is highly unlikely to lend itself to an efficient algorithm. (We
have not characterized “efficient” or “highly unlikely” yet, but we will.) The
above algorithm, ApproxVertexCover, however, is simple, and highly
efficient. All we have to do is scan the input graph once. Furthermore, even
though the algorithm is sub-optimal, we can upper-bound how bad it can
get, for every input.
Claim 22. Suppose c∗ is the minimum size for a vertex cover of an undirected
graph G. Then, the output C produced by ApproxVertexCover on input
G is guaranteed to be such that: |C| ≤ 2c∗ .
Proof. Consider every edge, ⟨u, v⟩, that we pick in Line (3). At least one
of u or v must be in every vertex cover. Furthermore, the edges we pick in
Line (3) share no endpoints, so a minimum vertex cover contains at least one
distinct vertex per picked edge, while C contains exactly two. Thus, the size
of the output is at most 2c∗.
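The pseudocode for ApproxVertexCover is not shown in this excerpt. The following Python sketch (names mine) implements the greedy scheme the text describes: repeatedly pick a remaining edge, add both of its endpoints to C, and discard every edge they cover.

def approx_vertex_cover(edges):
    """Greedy 2-approximation for vertex cover.

    edges: a collection of frozenset({u, v}) pairs over a simple graph.
    Returns a vertex cover C with |C| <= 2 * (minimum cover size), per Claim 22.
    """
    C = set()
    remaining = set(edges)
    while remaining:
        u, v = tuple(remaining.pop())   # pick an arbitrary uncovered edge
        C.update((u, v))                # add both endpoints to the cover
        # discard every edge now covered by u or v
        remaining = {e for e in remaining if u not in e and v not in e}
    return C

On the class of inputs above (n disjoint edges), it returns all 2n vertices, matching the worst case described in the text.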
(Non-)existence of algorithms
Inputs:
• A string that encodes an algorithm; call it A.
• A string, call it x.
Output:
• true, if A halts when run with input x.
• false, otherwise.

That is, the first input is a string that is the encoding of some algorithm A, and the second
input is a string x that is intended to be an input to A. B outputs true if
A is guaranteed to halt when run with input x, and false otherwise. Now
consider the following two algorithms, C and D.
C(x, y)
1 if B(x, y) = false then return
2 else go into infinite loop

D(z)
1 C(z, z)
• B(D, D) = false ⟹ D(D) does not halt ⟹ C(D, D) does not halt
⟹ B(D, D) = true.
• B(D, D) = true ⟹ D(D) halts ⟹ C(D, D) halts ⟹ B(D, D) = false.

Either way we have a contradiction, and therefore no such algorithm B can exist.
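As a sketch of the same diagonalization in Python (assuming a hypothetical halt-decider B, which, as the argument shows, cannot exist):

def B(x, y):
    """Hypothetical halt-decider: True iff algorithm x halts on input y.
    No correct implementation can exist; this stub is only for the sketch."""
    raise NotImplementedError

def C(x, y):
    if B(x, y) == False:   # if B says "x does not halt on y" ...
        return             # ... C halts immediately
    while True:            # ... otherwise C loops forever
        pass

def D(z):
    C(z, z)                # D feeds its input to itself

# Whatever truth value B(D, D) would return, running D(D) contradicts it.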
Algorithm efficiency
Correctness and termination (and existence) are of course rather basic and
essential properties. Once we address those, efficiency may be of concern.
In this course, we will discuss two kinds of efficiency: time-efficiency, and
space-efficiency.
X = Σ_{k=1}^{n} k · Y_k

⟹ E[X] = Σ_{k=1}^{n} k · E[Y_k]
        = Σ_{k=1}^{n} k · Pr{Y_k = 1}
        = (1/n) Σ_{k=1}^{n} k
        = (n + 1)/2
Things can get more interesting when the sub-routine calls are recursive.
Consider, for example, the following version of insertion sort, an algorithm
to sort an array of items from a totally ordered set, e.g., integers, which we
call RecursiveInsertionSort.
RecursiveInsertionSort(A, n)
1 if n = 1 then return
2 RecursiveInsertionSort(A, n − 1)
3 foreach j from 1 to n − 1 do
4     if A[n] < A[j] then
5         tmp ← A[n]
6         foreach k from n downto j + 1 do
7             A[k] ← A[k − 1]
8         A[j] ← tmp
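A direct Python transcription of the pseudocode, as a sketch (indices shifted from the 1-based pseudocode to 0-based Python):

def recursive_insertion_sort(A, n=None):
    """Sort A[0..n-1] in place: recursively sort the first n-1 entries,
    then insert A[n-1] into its place in the sorted prefix."""
    if n is None:
        n = len(A)
    if n <= 1:                            # Line 1: a single entry is sorted
        return
    recursive_insertion_sort(A, n - 1)    # Line 2: sort the prefix
    for j in range(n - 1):                # Line 3: scan the sorted prefix
        if A[n - 1] < A[j]:               # Line 4: found the insertion point
            tmp = A[n - 1]                # Line 5
            for k in range(n - 1, j, -1): # Lines 6-7: shift right by one
                A[k] = A[k - 1]
            A[j] = tmp                    # Line 8
            break                         # the pseudocode omits this break; it
                                          # remains correct without it, since the
                                          # Line-4 condition cannot hold again

For example, recursive_insertion_sort([3, 1, 2]) leaves the list as [1, 2, 3].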
Asymptotics
Suppose we have two algorithms for a problem, one, call it Algorithm (A),
which runs in time 2n + 20 on input-size n, and another, call it Algorithm
(B), which runs in time n2 . A plot of these two functions is shown above in
a picture from Dasgupta et al., “Algorithms.” Of course, we consider input
size n ∈ ℤ≥0 only. And we observe that for n ≤ 5, Algorithm (B) performs
better; otherwise Algorithm (A) does.
Thus, if we ask which of algorithm (A) or (B) is better, the answer is “it
depends” on the size of the input. Note that this is different from the case
that the functions that capture time- or space-efficiency are more “similar”
to one another; for example, it is easier to compare the functions n2 and 2n2
because we can, without qualification, say that an algorithm whose running-
time is the former performs better than an algorithm whose running-time is
the latter. Thus, one of our motivations is that we seek a meaningful way to
compare such “different looking” functions.
Another motivation is “bang for buck,” or the relative pay-off we get from
investing in a better computer. We explain this motivation after we introduce
the mindset behind what I call asymptotics to compare functions.
The mindset behind asymptotics is to ask how well an algorithm behaves for
large n. That is, a kind of scalability question: how well does the algorithm
scale with large input sizes? Under such a mindset, it is easy to observe that
Algorithm (A) is more time-efficient than Algorithm (B). This is exactly the
mindset that asymptotics, i.e., “in the limit,” captures.
We now explain what this has to do with pay-off or “bang for buck” as I say
above. We ask: for which of algorithms (A) and (B) do we get a better pay-
off if we were to invest in a better computer? Suppose we fix our time-budget
at t time-units. We first ask, for each of algorithms (A) and (B), what size
inputs we can handle in time t.
For Algorithm (A), we have 2n_a + 20 ≤ t ⟺ n_a ≤ (t − 20)/2. And for Algorithm
(B), we have n_b² ≤ t ⟺ n_b ≤ √t. Now suppose we buy resources (e.g.,
CPU, RAM) for our computer that causes it to be able to execute twice as
many instructions in the same time as before. This is equivalent to saying
that the same time-budget as before is now 2t. For Algorithm (A), we are
now able to handle, in the same time-budget as before, an input of size
n′_a ≈ 2n_a. For Algorithm (B), we are able to handle a new input size in the
same time budget of n′_b ≈ 1.4n_b, where 1.4 ≈ √2. Thus, the pay-off with
Algorithm (A) for our investment in the computer is better, and therefore
we should deem Algorithm (A) to be the better algorithm.
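A quick numerical check of this "bang for buck" argument, as a sketch (the budget values are arbitrary):

import math

def max_input_sizes(t):
    """Largest input sizes solvable within time budget t for
    Algorithm (A), running in 2n + 20, and Algorithm (B), in n^2."""
    n_a = (t - 20) // 2          # from 2*n_a + 20 <= t
    n_b = math.isqrt(t)          # from n_b^2 <= t
    return n_a, n_b

for t in (10_000, 20_000):       # doubling the budget
    print(t, max_input_sizes(t))
# n_a roughly doubles (4990 -> 9990); n_b grows only ~1.4x (100 -> 141).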
This is exactly the mindset that a comparison of functions based on asymp-
totics captures, and is behind the O(·), Ω(·), Θ(·), o(·) and ω(·) notation to
compare functions to one another. We now define these. Before we do
that, we adopt some assumptions. We deal only with functions of the form
f : ℕ → ℝ⁺. That is, only those functions whose domain is the positive
integers, and whose co-domain is the positive reals. The reason is that an input size for us is
always a positive integer, and time- and space-efficiency is quantified as a pos-
itive real. Furthermore, we consider only functions that are non-decreasing;
that is, a, b ∈ ℕ with a ≥ b ⟹ f(a) ≥ f(b). Again, this is because we
do not expect that time- and space-efficiency improves with increasing input
size.
Thus, henceforth in this context, when we say “function,” we mean a function
that satisfies the above assumptions.
For example, lim_{n→∞} (50n log2 n)/(2n²) = 0, and therefore 50n log2 n = O(2n²).
But lim_{n→∞} (2n²)/(50n log2 n) = ∞, and therefore 2n² ≠ O(50n log2 n).
Similarly, 2n + 2 = O(n²).

As another example, lim_{n→∞} (12n² − 5)/(n² + 2) = 12, and therefore 12n² − 5 = O(n² + 2).
And lim_{n→∞} (n² + 2)/(12n² − 5) = 1/12, and therefore n² + 2 = O(12n² − 5).
The mindset, when we say f(n) = O(g(n)), is that g(n) is a kind of upper-bound
for f(n). The following is an alternative definition for O(·), which can be
seen as the discrete version of Definition 3.
For example, 12n² − 5 = Θ(2n² + 1). But 12n² − 5 ≠ Θ(50n log2 n). The
following figure from CLRS illustrates the notions quite nicely.
Before we move on, we introduce two more notions, o(·), and ω(·). These are
upper- and lower-bounds, respectively, that are not tight.
Definition 8 (o(·)). f(n) = o(g(n)) if lim_{n→∞} f(n)/g(n) = 0.
o(·) is the special case of O(·) in which the limit is exactly 0. The point,
when we say f(n) = o(g(n)), is to say that g(n) is an asymptotic upper-bound
for f(n), but one that is not tight. That is, g(n) is "strictly greater than"
f(n). For example, 12n² − 5 = O(2n² + 1), but 12n² − 5 ≠ o(2n² + 1). And
50n log2 n = O(n²), and 50n log2 n = o(n²).
Definition 10 (ω(·)). f(n) = ω(g(n)) if lim_{n→∞} g(n)/f(n) = 0.
Definition 11 (ω(·), alternative). ω(g(n)) = {f(n) | for every constant c ∈
ℝ⁺, there exists n_0 ∈ ℕ such that for all n ≥ n_0, 0 ≤ c · g(n) < f(n)}.
Claim 24. f (n) = o(g(n)) ⇐⇒ g(n) = ω(f (n)).
Classes of functions
Across classes
E.g., n^{1/30} = ω(log2 n). Which means also: log2 n = o(n^{1/30}).
Claim 27. For constants a > 1, c > 0, c = o(loga n).
Within classes
Claim 28. Let p(n), q(n) be polynomial functions (that satisfy our assump-
tions) of the same degree. Then, p(n) = Θ(q(n)).
Claim 29. Let p(n) be a polynomial of some degree, d1 , and q(n) be a poly-
nomial of degree d2 where d1 > d2 > 0. Then, p(n) = ω(q(n)).
E.g., 2^n = o(3^n).
E.g., 15 = Θ(0.5).
Consequence A consequence of the above claims, and the fact that Θ(·)
is an equivalence relation, is that we can choose a succinct representative of
several functions. E.g., by Claim 28, we may let n² represent every polynomial
of degree 2.
Insertion sort
Consider the following version of insertion sort, from Cormen, et al., “Intro-
duction to Algorithms.”
InsertionSortclrs(A[1, . . . , n])
1 foreach j from 2 to n do
2     key ← A[j]
3     i ← j − 1
4     while i > 0 and A[i] > key do
5         A[i + 1] ← A[i]
6         i ← i − 1
7     A[i + 1] ← key
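In Python, as a direct sketch of the same procedure (0-based indices, so the pseudocode's "i > 0" becomes "i >= 0"):

def insertion_sort_clrs(A):
    """Sort A in place, mirroring InsertionSortclrs above."""
    for j in range(1, len(A)):        # Line 1
        key = A[j]                    # Line 2
        i = j - 1                     # Line 3
        while i >= 0 and A[i] > key:  # Line 4
            A[i + 1] = A[i]           # Line 5: shift right by one
            i -= 1                    # Line 6
        A[i + 1] = key                # Line 7: drop key into the gap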
Proof. We first observe that if n = 1, i.e., if the input array A has one entry
only, the algorithm does nothing. This is correct because a single-entry array
is already in sorted order. We now consider an input array of at least two
entries.
We prove the following invariant that the foreach loop of Line (1) maintains:
immediately after the iteration with index j, (I) A[1, . . . , j] is in sorted
(non-decreasing) order, and (II) A[1, . . . , j] comprises exactly the entries that
A[1, . . . , j] held on input.
It is important that we prove both (I) and (II) above. If we omit either, we
would not have a meaningful notion of correctness.
To prove the step, we first observe that by the induction assumption, imme-
diately after we execute the foreach loop with j = n − 1, A[1, . . . , n − 1]
satisfies properties (I) and (II) above. For the loop iteration with j = n, Line
(2) is key ← A[n], and, Line (3) is i ← n − 1.
For the step, consider some k > 0. Thus, we enter the while loop at least
once. Consider the very first iteration of the while loop. It must be the case
that A[n − 1] > key, i.e., A[n − 1] > A[n]. Then, in Line (5), we copy A[n − 1]
to A[n] and decrement i by 1 in Line (6). Thus, at the end of this iteration
of the while loop, properties (i)–(iii) are true for i = n − 2. We then exploit
the same reasoning for the subsequent iterations of the while loop.
The final part of the proof regards Line (7). We observe that all Line (7)
does is instantiate A[i + 1] to what was A[n] on input. As properties (i)–(iii)
above are true, executing Line (7) leaves A[1, . . . , n] satisfying properties (I)
and (II) above.
Σ_{j=2}^{n} Σ_{i=1}^{j−1} 1 = Σ_{j=2}^{n} (j − 1)
                            = Σ_{j=2}^{n} j − Σ_{j=2}^{n} 1
                            = (n(n + 1)/2 − 1) − (n − 1)
                            = n²/2 − n/2
We can establish this as the worst-case by exhibiting an input array of n
entries for which Line (5) is guaranteed to run j − 1 times, and no fewer, for
each value of j. We observe that this is exactly the case if A[1, . . . , n]
comprises distinct entries that are sorted in reverse. Thus, T(n) = Ω(n²).
What about in the best-case? We observe that there exists an input for which
the condition “A[i] > key” never evaluates to true. And this is when A[·],
on input, is already sorted. Therefore, to characterize the best-case, count-
ing the number of executions of Line (5) is no longer meaningful. Rather,
counting the number of executions of Line (2), we claim, is meaningful. As
the foreach loop runs unconditionally, Line (2) is executed Θ(n) times, and
therefore Θ(n) is a meaningful characterization of the best-case running-time
of InsertionSortclrs .
How about in the average-, or expected-case? As before, we make some
assumptions. Assume that we restrict ourselves to arrays of distinct entries
only. And we assume that given n distinct items, every permutation of
them is equally likely. Then, if such a permutation is stored in the array
A[1, . . . , n], given two distinct indices i, j, it is equally likely that A[i] < A[j],
and A[i] > A[j].
To intuit the number of times Line (5) is executed, adopt a set of indicator
random variables X_{j,i}, for all j = 2, . . . , n, i = 1, . . . , j − 1:

X_{j,i} = 1 if A[i] > A[j], and 0 otherwise
For example, if the input array A = ⟨12, −3, 15, 7, 8⟩, and j = 4, then X_{4,1} =
1, X_{4,2} = 0, X_{4,3} = 1. Now define another set of random variables, m_j, for
each j = 2, . . . , n, which is the number of executions of Line (5) for that
value of j in the outermost loop. And let m be the random variable that
is the total number of executions of Line (5) for this run of the algorithm.
Then:
m_j = Σ_{i=1}^{j−1} X_{j,i}

m = Σ_{j=2}^{n} m_j = Σ_{j=2}^{n} Σ_{i=1}^{j−1} X_{j,i}

E[m] = E[Σ_{j=2}^{n} Σ_{i=1}^{j−1} X_{j,i}] = Σ_{j=2}^{n} Σ_{i=1}^{j−1} E[X_{j,i}]
     = Σ_{j=2}^{n} Σ_{i=1}^{j−1} Pr{X_{j,i} = 1} = Σ_{j=2}^{n} Σ_{i=1}^{j−1} (1/2)
     = n(n − 1)/4 = Θ(n²)
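As an empirical sanity check of this expectation (a sketch; the trial count is arbitrary), we can count executions of Line (5), i.e., inversions, over random permutations:

import random

def count_shifts(A):
    """Number of times Line (5) runs on input A: the number of
    pairs (i, j), i < j, with A[i] > A[j] (inversions)."""
    n = len(A)
    return sum(1 for j in range(n) for i in range(j) if A[i] > A[j])

n, trials = 20, 10_000
total = 0
for _ in range(trials):
    A = list(range(n))
    random.shuffle(A)          # every permutation equally likely
    total += count_shifts(A)
print(total / trials, n * (n - 1) / 4)   # both close to 95.0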
T(n) = Θ(1) if n = 1, and T(n) = T(n − 1) + Θ(n) otherwise,

where "Θ(1)" stands for "some f(n) where f(n) = Θ(1)" and similarly for
Θ(n). We can solve this recurrence to get a closed-form solution. One way
to do this is to first adopt concrete functions for Θ(n), and then inductively
do a kind of string replacement.
Suppose we adopt the function n in place of Θ(n), and 1 for Θ(1). Then:

T(n) −→ T(n − 1) + n
     −→ T(n − 2) + (n − 1) + n
     −→ T(n − 3) + (n − 2) + (n − 1) + n
     ...
     −→ T(n − (n − 1)) + 2 + 3 + · · · + (n − 2) + (n − 1) + n
     −→ 1 + 2 + 3 + · · · + (n − 2) + (n − 1) + n
      = Θ(n²)
      1 1 0 1 0
    ×   1 0 0 1
    ───────────
      1 1 0 1 0
    0 0 0 0 0
  0 0 0 0 0
1 1 0 1 0
───────────────
1 1 1 0 1 0 1 0

As a sanity check, (11010)2 = (26)10, (1001)2 = (9)10, and 26 × 9 = (234)10 = (11101010)2.
Note that Line (5) is bit-shift to the left, and Line (6) is a bit-shift to the
right, i.e., each runs in time Θ(1). The while loop runs for as many bits as
there are in y, ignoring leading 0’s, i.e., O(lg y) times. And Line (4) runs in
time O(lg(xy)). Thus, the total running-time is O(lg y × (lg x + lg y)). So
if we represent the size of our input as n, the algorithm runs in time O(n²).
Therefore, this is a polynomial-time algorithm. More specifically, at worst a
quadratic-time algorithm.
We can think of the above algorithm as multiplication by “repeated dou-
bling.” More specifically, it realizes the following recurrence.
xy = 2x⌊y/2⌋       if y is even
xy = x + 2x⌊y/2⌋   otherwise

A similar recurrence yields fast exponentiation:

a^b = (a^{⌊b/2⌋})²       if b is even
a^b = a · (a^{⌊b/2⌋})²   otherwise
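Both recurrences translate directly into code. A Python sketch (function names mine; base cases added, as the recurrences above leave them implicit):

def multiply(x, y):
    """x * y by repeated doubling, following the first recurrence."""
    if y == 0:
        return 0
    half = multiply(x, y // 2)      # x * floor(y/2)
    if y % 2 == 0:
        return 2 * half             # xy = 2 * (x * floor(y/2))
    return x + 2 * half             # xy = x + 2 * (x * floor(y/2))

def power(a, b):
    """a ** b by repeated squaring, following the second recurrence."""
    if b == 0:
        return 1
    half = power(a, b // 2)         # a ** floor(b/2)
    if b % 2 == 0:
        return half * half
    return a * half * half

assert multiply(26, 9) == 234 and power(3, 10) == 59049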
We emphasize multiset; i.e., the same number may occur more than once.
We want to compare the following two ways of storing the multiset as an
array.
One is to store each occurrence of a number in each slot of the array. The
other is to store a pair ⟨value, number of occurrences⟩ for each distinct num-
ber. As an example, suppose our multiset is {2, 2, 1, 3, 2, 4, 2, 4}. Then Ap-
proach (A) is to store this as the array: ⟨2, 2, 1, 3, 2, 4, 2, 4⟩. Approach (B)
would be to store, instead: ⟨⟨2, 4⟩, ⟨1, 1⟩, ⟨3, 1⟩, ⟨4, 2⟩⟩. That is, the value 2
occurs 4 times, the value 1 occurs once, and so on.
Is one approach necessarily better than the other?
Here is one way to answer this question. Consider the worst-case for (B)
when compared to (A), and ask how bad that case would have been if we
had adopted (A) instead, and vice versa. Assume that the size of the multiset
is n, and each number is encoded with the same size, m, bits.
The worst-case for (B) when compared to (A) is when every item is distinct,
i.e., the multiset is a set. Then, the size with Approach (A) is nm, and with
Approach (B) is nm + n. The reason is that with each distinct value, we
store the value 1, which costs us 1 bit.
The worst-case for (A) when compared to (B) is when every item is the same.
Then, under approach (A), we again consume space of nm. Under approach
(B), we consume space m + lg n. The reason is that we store one pair, hx, yi,
where x is the single value of size m, and y is the number of times it occurs,
which is n, which can be encoded with size lg n.
Thus, the worst-case for (B) when compared to (A) is not so bad: if under
Approach (A) we consume space s, under Approach (B) for that same situa-
tion, we consume space O(s), because s = nm, and nm+n ≤ 2nm = O(nm).
However, the worst-case for (A) when compared to (B) is bad. If under
Approach (B) we consume space s, under Approach (A), in the worst-case,
we consume space Θ(2s ). This happens when m is constant in n, and in this
case, (A) consumes space n, and (B) consumes space lg n only.
Thus, Approach (B) can be said to be superior to Approach (A).
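A small Python sketch of the two encodings' bit costs under the stated model (helper names mine; m is the fixed per-number encoding size, and the count cost is approximated as about lg(count) bits, at least 1):

from collections import Counter
import math

def cost_A(multiset, m):
    """Approach (A): one m-bit slot per occurrence."""
    return len(multiset) * m

def cost_B(multiset, m):
    """Approach (B): per distinct value, an m-bit value plus a count
    encoded in roughly lg(count) bits (at least 1 bit)."""
    return sum(m + max(1, math.ceil(math.log2(c + 1)))
               for c in Counter(multiset).values())

S = [2, 2, 1, 3, 2, 4, 2, 4]
print(cost_A(S, m=3), cost_B(S, m=3))   # Approach (B) is smaller here

Trying S = list(range(n)) (all distinct) versus S = [7] * n (all equal) reproduces the two worst cases discussed above.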
Θ(·), O(·) and Ω(·), and worst-, best- and average-case Another im-
portant point regards the somewhat subtle relationship between the notions
of Θ(·), O(·) and Ω(·), and the notions of “worst-case,” “best-case” and
“average-case” efficiency. The two are different, but related, and it would
certainly be meaningful to use both in the same sentence. E.g., “The time-
efficiency of my algorithm is Θ(n log n) in the worst-case.” Or, “The space-
efficiency of my algorithm is O(n) in the best-case.” We now discuss this
further, to clarify the relationship between those two sets of notions.
Suppose I owe you $50. Now someone asks me whether $30 is a lower-bound
on the amount I owe you. Then the answer is ‘yes.’ This is because the
statement, “the amount I owe is lower-bounded by $30” is exactly the same
as “the amount I owe is no less than $30.” Similarly, the statement, “the
amount I owe is upper-bounded by $70” is true. It is the same as, “the
amount I owe is no more than $70.”
The notions of worst- and best-case are such qualifications. Suppose the
more I owe you, the worse it is for me. Returning to the knowledge that
I owe you between $40 and $50, we can say that in the worst-case, I owe
you $50, and in the best-case, I owe you $40. And these are, respectively,
tight-bounds on the worst- and best-case.
The above examples are similar to what we consider in the context of algorithms.
LinSearch(A[1, . . . , n], i)
1 ret ← false
2 foreach j from 1 to n do
3     if A[j] = i then
4         ret ← true
5 return ret
What about in the best-case? The best-case, under the measure we have
adopted, is elicited by an input in which i ∉ A[1, . . . , n], and Line (4) is never
executed. Thus, our running-time is 2n + 1, which is different from 2n + 2,
but still Θ(n). Thus, in the best-case, the running-time is both O(n) and
Ω(n).
Because the running-time in the best-case is Ω(n), we can say, without qual-
ification, that the running-time of the algorithm is Ω(n). That is, a lower-
bound on the best-case is a lower-bound on the running-time for every input.
Thus, for LinSearch, under the measure for running-time we have adopted, we
can drop all qualifications, and simply say that the algorithm’s running-time
is Θ(n).
In the best-case, we have one execution of Line (4) only, and 1 = Θ(1).
Indeed, even if the number of instances of i is some constant in n, the best-
case has the same asymptotic tight-bound of Θ(1).
As the asymptotic tight-bounds on the running-time are not the same across
cases, specifically, the worst- and best-cases, we no longer can specify a tight-
bound on the running-time of the algorithm without qualification. However,
we can say that the algorithm runs in time Ω(1) and O(n); we can say that
without the qualifications of best- and worst-time. If we are willing to add
the qualification of worst- or best-case, then we indeed have tight bounds.
The algorithm runs in time Θ(n) in the worst-case, and Θ(1) in the best-case.
Inputs:
• An array, A[1, . . . , n], sorted in non-decreasing order,
• two indices into A, lo and hi, and
• an integer, i, to search for.
The termination property is: for every legal input, ⟨A, lo, hi, i⟩, BinSearch
is guaranteed to halt. By "legal" input, we mean one that meets the con-
straints we impose on the invoker of BinSearch. That is, A must be a finite
array, sorted in non-decreasing order.
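The pseudocode of BinSearch itself is not reproduced in this excerpt. As a point of reference for the claims below, here is a Python sketch consistent with the line references used throughout (Line (2) computes mid, Line (3) returns true on a match, Lines (4) and (5) update lo and hi); the 0-based indexing is my adaptation.

def bin_search(A, lo, hi, i):
    """Search for i in the sorted slice A[lo..hi] (inclusive)."""
    while lo <= hi:                 # Line (1)
        mid = (lo + hi) // 2        # Line (2): mid = floor((lo + hi) / 2)
        if A[mid] == i:             # Line (3)
            return True
        if A[mid] < i:              # Line (4): discard the left half
            lo = mid + 1
        else:                       # Line (5): discard the right half
            hi = mid - 1
    return False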
hi_2 − lo_2 = hi_1 − m − 1
            = hi_1 − ⌊(lo_1 + hi_1)/2⌋ − 1
            < hi_1 − ((lo_1 + hi_1)/2 − 1) − 1     ∵ ∀x ∈ ℝ, x − ⌊x⌋ < 1
            = (hi_1 − lo_1)/2
            ≤ hi_1 − lo_1
hi_2 − lo_2 = m − 1 − lo_1
            = ⌊(lo_1 + hi_1)/2⌋ − 1 − lo_1
            < (lo_1 + hi_1)/2 + 1 − 1 − lo_1       ∵ ∀x ∈ ℝ, ⌊x⌋ < x + 1
            = (hi_1 − lo_1)/2
            ≤ hi_1 − lo_1
We say that BinSearch is correct if, for any legal input ⟨A, lo, hi, i⟩,
both the following are true:
Claim 35. Suppose we successfully enter the while loop with some values for
lo and hi. Then, in that iteration of the loop, lo ≤ mid ≤ hi.

Proof. From Line (2) of the algorithm, mid = ⌊(lo + hi)/2⌋. We carry out the
following case-analysis.
Case 1: lo + hi is even. In this case:

mid = (lo + hi)/2
⟹ mid ≥ (lo + lo)/2     ∵ lo ≤ hi
⟹ mid ≥ lo

Similarly, we have:

mid = (lo + hi)/2
⟹ mid ≤ (hi + hi)/2     ∵ lo ≤ hi
⟹ mid ≤ hi
Case 2: lo + hi is odd. In this case, mid = (lo + hi − 1)/2, and, by reasoning
similar to the above, mid ≥ (lo + lo − 1)/2 = lo − 1/2; as mid and lo are
integers, mid ≥ lo. Similarly, we have:

mid = (lo + hi − 1)/2
⟹ mid ≤ (hi + hi − 1)/2     ∵ lo ≤ hi
⟹ mid ≤ hi − 1/2 ≤ hi
Claim 36. Suppose the values of lo and hi on input are designated lo^(in) and
hi^(in), respectively. Then, the following are all true at any point in the run
of the algorithm.

1. hi ≤ hi^(in).
2. lo ≥ lo^(in).

• We return in Line (3). Then, hi^(out) = hi^(in), lo^(out) = lo^(in), and all the
assertions (1)–(4) are true.
• We update lo in Line (4). Then, hi^(out) = hi^(in), lo^(out) > lo^(in), because,
from Claim 35, mid ≥ lo^(in). Thus, the assertions (1) and (2) are true.
As this is the only iteration that happens, we know that lo^(out) > hi^(out).
But hi^(out) = hi^(in), and therefore (3) is true. And (4) is true because
the premise, hi^(out) < lo^(in), is false.
• We update hi in Line (5). Then, hi^(out) < hi^(in), lo^(out) = lo^(in), because,
from Claim 35, mid ≤ hi^(in). We follow a similar reasoning as the above
case.
Proof. Any time immediately after we have computed a value for mid, and
before subsequent updates to lo or hi, we know the following:

lo^(in) ≤ lo    from Claim 36, (2)
lo ≤ mid        from Claim 35
mid ≤ hi        from Claim 35
hi ≤ hi^(in)    from Claim 36, (1)
Proof. The only way an invocation of BinSearch returns true is in Line (3).
And this happens only if A[mid] = i. And by Claim 37, we are guaranteed
that lo^(in) ≤ mid ≤ hi^(in).
Before we assert and prove the other case of correctness, we establish another
property of the algorithm.
Claim 39. Suppose, at the time we successfully enter the while loop in some
run of the algorithm, the values of lo and hi are designated lo^(in) and hi^(in),
respectively. Suppose we do not return true in Line (3) for this iteration, and
when we are done with the iteration, the values of lo and hi are designated
lo^(out) and hi^(out), respectively. Then, at least one of the following is true: (i)
lo^(in) < lo^(out) and hi^(in) = hi^(out), or, (ii) hi^(in) > hi^(out) and lo^(in) = lo^(out).
Proof. As we do not return true in Line (3), we know that we either (i)
update lo and leave hi unchanged in Line (4), or, (ii) update hi and leave lo
unchanged in Line (5). Suppose (i) happens. Then, lo^(out) = mid + 1. But
from Claim 37, mid ≥ lo^(in). Therefore, mid + 1 > lo^(in), and lo^(out) > lo^(in).
And as we leave hi unchanged, hi^(in) = hi^(out).

If (ii) happens, then hi^(out) = mid − 1. But from Claim 37, mid ≤ hi^(in).
Therefore, mid − 1 < hi^(in), and hi^(out) < hi^(in). And as we leave lo un-
changed, lo^(in) = lo^(out).
Claim 40. If BinSearch returns false on some input ⟨A, lo^(in), hi^(in), i⟩,
then i is not in A[lo^(in), . . . , hi^(in)].
• Case (a): we know that lo^(out) > hi^(out). We consider the following two
sub-cases, which, according to Claim 39, are exhaustive.
  – Sub-case (i): lo^(out) = lo^(in), hi^(out) < hi^(in). Thus, we have hi^(out) =
    mid − 1 < lo^(in) ⟹ mid ≤ lo^(in) ⟹ mid = lo^(in), because by
    Claims 36 and 37, lo^(in) ≤ lo ≤ mid ≤ hi ≤ hi^(in), at any point of
    a run of the algorithm that we are within the while loop.
    Thus, because we did not return true in this iteration, and we
    adjusted the value of hi, we know that i < A[lo^(in)]. Thus, i is not
    in A[lo^(in), . . . , hi^(in)] because A is sorted in non-decreasing order.
  – Sub-case (ii): lo^(out) > lo^(in), hi^(out) = hi^(in). Thus, we have
    lo^(out) = mid + 1 > hi^(in) ⟹ mid ≥ hi^(in) ⟹ mid = hi^(in),
    because by Claims 36 and 37, lo^(in) ≤ lo ≤ mid ≤ hi ≤ hi^(in), at
    any point of a run of the algorithm that we are within the while
    loop.
    Thus, because we did not return true in this iteration, and we
    adjusted the value of lo, we know that i > A[hi^(in)]. Thus, i is not
    in A[lo^(in), . . . , hi^(in)] because A is sorted in non-decreasing order.
• Case (b): we know that lo^(out) ≤ hi^(out). We consider the following two
sub-cases, which, according to Claim 39, are exhaustive.
  – Sub-case (i): lo^(out) = lo^(in), hi^(out) < hi^(in). Thus, we have hi^(out) −
    lo^(out) < hi^(in) − lo^(in), and all future runs of the algorithm are
    equivalent to an invocation BinSearch(A, lo^(out), hi^(out), i), which
    falls under the induction assumption.
  – Sub-case (ii): lo^(out) > lo^(in), hi^(out) = hi^(in). Thus, we have
    hi^(out) − lo^(out) < hi^(in) − lo^(in), and all future runs of the algorithm
    are equivalent to an invocation BinSearch(A, lo^(out), hi^(out), i),
    which falls under the induction assumption.
Claim 41. Suppose we have at least two consecutive iterations of the while
loop, and hi − lo + 1 at entry into the first iteration is k_1, and hi − lo + 1
at entry into the second iteration is k_2. Then, k_2 ≤ k_1/2.
Proof. Let the lo, hi values at entry into the first iteration be lo_1, hi_1 re-
spectively, and lo_2, hi_2 at entry into the second iteration. We consider the
following four cases, which are exhaustive.
lo_2 = (lo_1 + hi_1)/2 + 1

⟹ k_2 = hi_2 − lo_2 + 1 = hi_1 − ((lo_1 + hi_1)/2 + 1) + 1     ∵ hi_2 = hi_1
       = (hi_1 − lo_1)/2
       = (k_1 − 1)/2 ≤ k_1/2
The last inference is true because lo_1 + hi_1 is odd ⟹ 2 × hi_1 − (lo_1 + hi_1)
is odd ⟹ hi_1 − lo_1 + 1 = k_1 is even. Therefore, ⌊k_1/2⌋ = k_1/2.
Claim 42. Let t(k) be the number of loop iterations that happen in the worst-
case in an invocation to BinSearch with some input ⟨A, lo, hi, i⟩, where
k = hi − lo + 1. Then:

t(k) ≤ 0 if k ≤ 0, and t(k) ≤ t(⌊k/2⌋) + 1 otherwise.
Claim 43. If t(k) is characterized as the recurrence in Claim 42, then t(k) =
O(lg k). More specifically, for all integers k ≥ 1, t(k) ≤ 1 + log2 k.
The other case is that k is even ⟹ ⌊(k + 1)/2⌋ = k/2. We assume that
k > 0, because otherwise, k + 1 = 1, which is the base case above. From the
recurrence and induction assumption, t(k + 1) ≤ t(k/2) + 1 ≤ 2 + log2(k/2) =
1 + log2(k) ≤ 1 + log2(k + 1), as desired.