CS161 Lecture 2
1 Introduction
Today, we will introduce a fundamental algorithm design paradigm, Divide-And-Conquer, through a case
study of the MergeSort algorithm. Along the way, we’ll introduce guiding principles for algorithm design,
including Worst-Case and Asymptotic Analysis, which we will use throughout the remainder of the course.
We will introduce asymptotic notation (“Big-Oh”) for analyzing the run times of algorithms.
2.1 InsertionSort
There are a number of natural algorithms for sorting a list of numbers. One such algorithm is Insertion-
Sort. InsertionSort iterates through the elements of the given array and after each iteration produces
a sorted array of the elements considered so far. At iteration i, the algorithm inserts the ith element into
the right position in the sorted array of the first i − 1 elements. The pseudocode1 of InsertionSort from
Section 2.1 of CLRS appears as Algorithm 1.
Algorithm 1: InsertionSort(A)
  for i = 2 → length(A) do
    key ← A[i];
    j ← i − 1;
    while j > 0 and A[j] > key do
      A[j + 1] ← A[j];
      j ← j − 1;
    A[j + 1] ← key;
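As a sanity check, Algorithm 1 translates almost line-for-line into Python (shown here 0-indexed, so the outer loop starts at index 1 rather than 2; this sketch is illustrative and not part of the original notes):

```python
def insertion_sort(a):
    """Sort the list a in place, mirroring Algorithm 1 (0-indexed)."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift elements larger than key one slot to the right,
        # opening a gap at the correct position for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```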
At iteration i, the algorithm may be required to move i elements, so the total runtime is roughly
∑ᵢ₌₁ⁿ i = n(n + 1)/2 = O(n²). The natural question to ask (which should become your mantra this quarter) is “Can we
do better?”. In fact, the answer to this question is “Yes”. The point of today’s lecture is to see how we
might do better, and to begin to understand what “better” even means.
1 A note on pseudocode: We will write our algorithms in pseudocode. The point of pseudocode is to be broadly descriptive
– to convey your approach in solving a problem without getting hung up on the specifics of a programming language. In fact,
one of the key benefits of using pseudocode to describe algorithms is that you can take the algorithm and implement it in any
language you want based on your needs.
2.2 Divide-And-Conquer
The Divide-And-Conquer paradigm is a broad pattern for designing algorithms to many problems. Unsur-
prisingly, the pattern uses the following strategy to solve problems.
• Break the problem into subproblems
• Solve the subproblems (recursively)
• Combine results of the subproblems
This strategy requires that once the instances become small enough, the problem is trivial to solve (or it is
cheap to find a solution through brute-force).
With this pattern in mind, there is a very natural way to formulate a Divide-And-Conquer algorithm for
the sorting problem. Consider the following pseudocode for MergeSort (in Algorithm 2). A[i : j] denotes
the subarray of A from index i to j (including both A[i] and A[j]).
Algorithm 2: MergeSort(A)
  n ← length(A);
  if n ≤ 1 then
    return A;
  L ← MergeSort(A[1 : ⌊n/2⌋]);
  R ← MergeSort(A[⌊n/2⌋ + 1 : n]);
  return Merge(L, R);
Now, we need to describe the Merge procedure, which takes two sorted arrays, L and R, and produces
a sorted array containing the elements of L and R. Consider the following Merge procedure (Algorithm 3),
which we will call as a subroutine in MergeSort.
Algorithm 3: Merge(L, R)
  m ← length(L) + length(R);
  S ← empty array of size m;
  i ← 1; j ← 1;
  for k = 1 → m do
    if L[i] < R[j] then
      S[k] ← L[i];
      i ← i + 1;
    else
      S[k] ← R[j];
      j ← j + 1;
  return S;
Intuitively, Merge loops through every position in the final array, and at the kth iteration, looks for the kth
smallest element. (As a special case, consider the fact that the smallest element in S must be the smallest
element in L or the smallest element in R.) Because L and R are sorted, we can find the kth smallest element
quickly, by keeping track of which elements we’ve processed so far. (The Merge subroutine, as written, is
actually incomplete. You should think about how you would have to edit the pseudocode to handle what
happens when we get to the end of L or R.2 )
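For concreteness, here is one possible completion in Python: once one of L or R is exhausted, the remainder of the other is copied over. (This sketch returns a sorted copy rather than sorting in place, and is one way to fill in the gap, not the only one.)

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; copy the (already sorted) remainder of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(a):
    """Return a sorted copy of a, mirroring Algorithm 2 (0-indexed)."""
    n = len(a)
    if n <= 1:
        return a[:]
    mid = n // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```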
Now that we have an algorithm, the first question we always want to ask is: “Is the algorithm correct?”
In this case, to answer this question, we will define an invariant which we will claim holds at every recursive
2 Furthermore, sorting functions usually change the order of elements in the given array instead of returning a sorted copy
of it. For a more complete implementation of MergeSort, see Section 2.3.1 in CLRS.
call. The invariant will be the following: “In every recursive call, MergeSort returns a sorted array.” If we
can prove that this invariant holds, it will immediately prove that MergeSort is correct, as the first call
to MergeSort will return a sorted array. Here, we will prove that the invariant holds.

[Figure 2.4 from CLRS: schematic of the levels of recursive calls, or “recursion tree”, and the resulting calls to Merge]
Proof of Invariant. By induction. Consider the base case of the algorithm, when MergeSort receives
an array with ≤ 1 element. Any array with ≤ 1 element is trivially sorted, so in this case, by returning A, MergeSort
returns a sorted list.
To see the inductive step, suppose that after n − 1 ≥ 1 levels of recursive calls return, MergeSort still
returns a sorted list. We will argue that the nth recursive call returns a sorted list. Consider some execution
of MergeSort where L and R were just returned after ≤ n − 1 recursive calls. Then, by the inductive
hypothesis, L and R are both sorted. We argue that the result of Merge(L, R) will be a sorted list. To see
this, note that in the Merge subroutine, the minimum remaining element must be at either the ith position
in L or the jth position of R. When we find the minimum element, we will increment i or j accordingly,
which will result in the next remaining minimal element still being in the ith position in L or the jth position
of R. Thus, Merge will construct a sorted list, and our induction holds.
This is a precise, but somewhat unruly expression for the running time. In particular, it seems difficult to
keep track of lots of different constants, and it isn’t clear which costs will be more or less expensive (especially
if we switch programming languages or machine architectures). To simplify our analysis, we choose to assume
that there is some global constant cop which represents the cost of an operation. You may think of cop as
max{Cost_a, Cost_c, Cost_i, . . .}. We can then bound the amount of running time for Merge as

    Runtime of Merge ≤ (4m + 2) · c_op.

Using the fact that m ≥ 1 (so that 4m + 2 ≤ 4m + 2m = 6m), we can upper bound this running time by 6m operations.
Now that we have a bound on the number of operations required in a Merge of m numbers, we want
to translate this into a bound on the number of operations required for MergeSort. At first glance, the
pessimist in you may be concerned that at each level of recursive calls, we’re spawning an exponentially
increasing number of copies of MergeSort (because the number of calls at each depth doubles). Dual
to this, the optimist in you will notice that at each level, the inputs to the problems are decreasing at an
exponential rate (because the input size halves with each recursive call). Today, the optimists win out.
Claim 1. MergeSort requires at most 6n log n + 6n operations to sort n numbers.3
Before we go about proving this bound, let’s first consider whether this running time bound is good. We
mentioned earlier that more obvious methods of sorting, like InsertionSort, required roughly n² operations.
How does n² = n · n compare to n · log n? An intuitive definition of log n is the following: “Enter n into your
calculator. Divide by 2 until the total is ≤ 1. The number of times you divided is the logarithm of n.” This
number in general will be significantly smaller than n. In particular, if n = 32, then log n = 5; if n = 1024,
then log n = 10. Already, to sort arrays of ≈ 10³ numbers, the savings of n log n as compared to n² will
be orders of magnitude. At larger problem instances of 10⁶, 10⁹, etc. the difference will become even more
pronounced! n log n is much closer to growing linearly (with n) than it is to growing quadratically (with n²).
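The “keep dividing by 2” definition of log n, and the gap between n² and n log n, are easy to check numerically (an illustrative sketch, not part of the original notes):

```python
def halving_log(n):
    """Count how many times n must be halved before it drops to <= 1."""
    count = 0
    while n > 1:
        n /= 2
        count += 1
    return count

# For n = 1024: n * log n = 10240, while n^2 = 1048576 -- about 100x larger.
for n in (32, 1024):
    print(n, halving_log(n), n * halving_log(n), n ** 2)
```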
One way to argue about the running time of recursive algorithms is to use recurrence relations. A
recurrence relation for a running time expresses the time it takes to solve an input of size n in terms of the
time required to solve the recursive calls the algorithm makes. In particular, we can write the running time
T (n) for MergeSort on an array of n numbers as the following expression:

    T(n) ≤ 2 · T(n/2) + 6n for n > 1, with T(1) ≤ 6.
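Assuming the recurrence T(n) = 2T(n/2) + 6n with base case T(1) = 6 (matching the 6m bound on Merge; these constants are my reading of the analysis above), a direct evaluation confirms the bound 6n log n + 6n of Claim 1 exactly, for n a power of 2:

```python
import math

def T(n):
    """Evaluate the assumed MergeSort recurrence (n a power of 2)."""
    if n <= 1:
        return 6  # assumed constant cost of the base case
    return 2 * T(n // 2) + 6 * n

# Check T(n) against the closed form 6n log n + 6n from Claim 1.
for k in range(1, 11):
    n = 2 ** k
    assert T(n) == 6 * n * math.log2(n) + 6 * n
print("recurrence matches 6n log n + 6n for n up to", 2 ** 10)
```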
There are a number of sophisticated and powerful techniques for solving recurrences. We will cover many of
these techniques in the coming lectures. Today, we can actually analyze the running time directly.
Proof of Claim 1. Consider the recursion tree of a call to MergeSort on an array of n numbers. Assume
for simplicity that n is a power of 2. Let’s refer to the initial call as Level 0, the subsequent recursive calls
as Level 1, and so on, numbering the level of recursion by its depth in the tree. How deep is the tree? At
each level, the size of the inputs is divided in half, and there are no recursive calls when the input size is ≤ 1
element. By our earlier “definition”, this means the bottom level will be Level log n. Thus, there will be a
total of log n + 1 levels.
We can now ask two questions: (1) How many subproblems are there at Level i? (2) How large are the
individual subproblems at Level i? We can observe that at the ith level, there will be 2i subproblems, each
with inputs of size n/2i . So how much work do we do overall at the ith level? First, we need to make two
recursive calls – but the work done for these recursive calls will be accounted for by Level i + 1. Thus, we
are really concerned about the cost of a call to Merge at Level i. We can express the work per level as
follows:

    (work at Level i) ≤ (number of Level-i subproblems) × (cost of Merge per subproblem) ≤ 2ⁱ · 6 · (n/2ⁱ) = 6n.
3 In this lecture and all future lectures, unless explicitly stated otherwise, log refers to log2 .
[Figure 2.5 from CLRS: analyzing the running time in terms of a recursion tree]
Importantly, we can see that the work done at Level i is independent of i – it only depends on n and is the
same for every level. This means we can bound the total running time as follows:
    total running time ≤ (work per level) × (number of levels) ≤ 6n · (log n + 1) = 6n log n + 6n.
3.1 Worst-Case Analysis
One guiding principle we’ll use throughout the class is that of Worst-Case Analysis. In particular, this
means that we want any statement we make about our algorithms to hold for every possible input. Stated
differently, we can think about playing a game against an adversary, who wants to maximize our running
time (make it as bad as possible). We get to specify an algorithm and state a running time T (n); the
adversary then chooses an input. We win the game if even in the worst case, whatever input the adversary
chooses (of size n), our algorithm runs in at most T (n) time.
Note that because our algorithm made no assumptions about the input, then our running time bound
will hold for every possible input. This is a very strong, robust guarantee.4
“Big-Oh” Notation:
Intuitively, Big-Oh notation gives an upper bound on a function. We say T (n) is O(f (n)) when, as n gets
big, f (n) grows at least as quickly as T (n). Formally, we say

    T(n) = O(f(n)) if and only if there exist constants c, n₀ > 0 such that T(n) ≤ c · f(n) for all n ≥ n₀.
4 An alternative is to assume some distribution over the inputs and then design an algorithm that works well in expectation on these inputs (this is frequently referred to as Average-Case Analysis). This type of
analysis is less common and can lead to difficulties. For example, understanding which inputs are more likely to appear can be
very hard, or the algorithms can be too tailored to fit our assumptions on the input.
“Big-Omega” Notation:
Intuitively, Big-Omega notation gives a lower bound on a function. We say T (n) is Ω(f (n)) when, as n gets
big, T (n) grows at least as quickly as f (n). Formally, we say

    T(n) = Ω(f(n)) if and only if there exist constants c, n₀ > 0 such that T(n) ≥ c · f(n) for all n ≥ n₀.
“Big-Theta” Notation:
T (n) is Θ(f (n)) if and only if T (n) = O(f (n)) and T (n) = Ω(f (n)). Equivalently, we can say that

    T(n) = Θ(f(n)) if and only if there exist constants c₁, c₂, n₀ > 0 such that c₁ · f(n) ≤ T(n) ≤ c₂ · f(n) for all n ≥ n₀.
We can see that these notations really do capture exactly the behavior that we want – namely, to focus
on the rate of growth of a function as the inputs get large, ignoring constant factors and lower order terms.
As a sanity check, consider the following example and non-example.
Claim 2. All degree-k polynomials are O(nᵏ).

Proof of Claim. Suppose T (n) is a degree-k polynomial. That is, T (n) = aₖnᵏ + · · · + a₁n + a₀ for some
choice of aᵢ’s where aₖ ≠ 0. To show that T (n) is O(nᵏ) we must find a c and n₀ such that for all n ≥ n₀,
T (n) ≤ c · nᵏ. (Since T (n) represents the running time of an algorithm, we assume it is positive.) Let n₀ = 1
and let a* = maxᵢ |aᵢ|. We can bound T (n) as follows:

    T (n) = aₖnᵏ + · · · + a₁n + a₀
         ≤ a*nᵏ + · · · + a*n + a*
         ≤ a*nᵏ + · · · + a*nᵏ + a*nᵏ
         = (k + 1)a* · nᵏ

Let c = (k + 1)a*, which is a constant, independent of n. Thus, we’ve exhibited c, n₀ which satisfy the Big-Oh
definition, so T (n) = O(nᵏ).
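To make the witnesses concrete, take the hypothetical polynomial T(n) = 3n³ + 5n + 7 (so k = 3; this example is mine, not from the notes); the proof’s choice of c = (k + 1)a* and n₀ = 1 can be spot-checked:

```python
def poly(n):
    return 3 * n**3 + 5 * n + 7   # sample degree-3 polynomial

k = 3
a_star = max(abs(coef) for coef in (3, 0, 5, 7))  # a* = max |a_i|
c = (k + 1) * a_star                              # witness constant from the proof
# Verify T(n) <= c * n^k on a range of n >= n0 = 1.
assert all(poly(n) <= c * n**k for n in range(1, 1000))
print("c =", c, "works for all tested n")
```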
Claim 3. For any k ≥ 1, nᵏ is not O(nᵏ⁻¹).

Proof of Claim. By contradiction. Assume nᵏ = O(nᵏ⁻¹). Then there is some choice of c and n₀ such that
nᵏ ≤ c · nᵏ⁻¹ for all n ≥ n₀. But this in turn means that n ≤ c for all n ≥ n₀, which contradicts the fact
that c is a constant, independent of n. Thus, our original assumption was false, and nᵏ is not O(nᵏ⁻¹).