CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis

Scribe: Michael P. Kim
Date: September 28, 2016


Edited by Ofir Geri

1 Introduction
Today, we will introduce a fundamental algorithm design paradigm, Divide-And-Conquer, through a case
study of the MergeSort algorithm. Along the way, we’ll introduce guiding principles for algorithm design,
including Worst-Case and Asymptotic Analysis, which we will use throughout the remainder of the course.
We will introduce asymptotic notation (“Big-Oh”) for analyzing the run times of algorithms.

2 MergeSort and the Divide-And-Conquer Paradigm


The sorting problem is a canonical computer science problem. The problem is specified as follows: as input,
you receive an array of n numbers, possibly unsorted; the goal is to output the same numbers, sorted in
increasing order. Computer scientists care a lot about sorting because many other algorithms will use sorting
as a subroutine. Thus, it is extremely important to find efficient algorithms for sorting lists that work well
in theory and in practice.
In this lecture, we’ll assume that the elements given in the array are distinct. It is a worthwhile exercise
to go through the notes carefully and see which aspects of our analysis would need to change if we allow for
ties amongst elements.

2.1 InsertionSort
There are a number of natural algorithms for sorting a list of numbers. One such algorithm is Insertion-
Sort. InsertionSort iterates through the elements of the given array and after each iteration produces
a sorted array of the elements considered so far. At iteration i, the algorithm inserts the ith element into
the right position in the sorted array of the first i − 1 elements. The pseudocode¹ of InsertionSort from
Section 2.1 of CLRS appears as Algorithm 1.

Algorithm 1: InsertionSort(A)
  for i = 2 → length(A) do
    key ← A[i];
    j ← i − 1;
    while j > 0 and A[j] > key do
      A[j + 1] ← A[j];
      j ← j − 1;
    A[j + 1] ← key;
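
For concreteness, here is one possible Python translation of this pseudocode (a sketch, not part of the original notes; note that Python lists are zero-indexed, while the pseudocode above is one-indexed):

def insertion_sort(A):
    """Sort the list A in place, mirroring Algorithm 1 (zero-indexed)."""
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        # Shift elements of the sorted prefix A[0..i-1] that exceed key one slot right.
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key
    return A

For example, insertion_sort([5, 2, 4, 6, 1, 3]) returns [1, 2, 3, 4, 5, 6].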

At iteration i, the algorithm may be required to move i elements, so the total runtime is roughly ∑_{i=1}^{n} i = n(n + 1)/2 = O(n^2). The natural question to ask (which should become your mantra this quarter) is “Can we
do better?”. In fact, the answer to this question is “Yes”. The point of today’s lecture is to see how we
might do better, and to begin to understand what “better” even means.
¹ A note on pseudocode: We will write our algorithms in pseudocode. The point of pseudocode is to be broadly descriptive – to convey your approach in solving a problem without getting hung up on the specifics of a programming language. In fact, one of the key benefits of using pseudocode to describe algorithms is that you can take the algorithm and implement it in any language you want based on your needs.

2.2 Divide-And-Conquer
The Divide-And-Conquer paradigm is a broad pattern for designing algorithms to many problems. Unsur-
prisingly, the pattern uses the following strategy to solve problems.
• Break the problem into subproblems
• Solve the subproblems (recursively)
• Combine results of the subproblems
This strategy requires that once the instances become small enough, the problem is trivial to solve (or it is
cheap to find a solution through brute-force).
With this pattern in mind, there is a very natural way to formulate a Divide-And-Conquer algorithm for
the sorting problem. Consider the following pseudocode for MergeSort (in Algorithm 2). A[i : j] denotes
the subarray of A from index i to j (including both A[i] and A[j]).

Algorithm 2: MergeSort(A)
  n ← length(A);
  if n ≤ 1 then
    return A;
  L ← MergeSort(A[1 : n/2]);
  R ← MergeSort(A[n/2 + 1 : n]);
  return Merge(L, R);

Now, we need to describe the Merge procedure, which takes two sorted arrays, L and R, and produces
a sorted array containing the elements of L and R. Consider the following Merge procedure (Algorithm 3),
which we will call as a subroutine in MergeSort.

Algorithm 3: Merge(L, R)
  m ← length(L) + length(R);
  S ← empty array of size m;
  i ← 1; j ← 1;
  for k = 1 → m do
    if L(i) < R(j) then
      S(k) ← L(i);
      i ← i + 1;
    else
      S(k) ← R(j);
      j ← j + 1;
  return S;

Intuitively, Merge loops through every position in the final array, and at the kth iteration, looks for the kth
smallest element. (As a special case, consider the fact that the smallest element in S must be the smallest
element in L or the smallest element in R.) Because L and R are sorted, we can find the kth smallest element
quickly, by keeping track of which elements we’ve processed so far. (The Merge subroutine, as written, is
actually incomplete. You should think about how you would have to edit the pseudocode to handle what
happens when we get to the end of L or R.²)
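
As one possible resolution (a Python sketch, not the CLRS implementation mentioned in the footnote): once one of the two input arrays is exhausted, the remainder of the other is already sorted and can simply be copied over.

def merge(L, R):
    """Merge two sorted lists L and R into one sorted list."""
    S = []
    i, j = 0, 0
    while i < len(L) and j < len(R):
        if L[i] < R[j]:
            S.append(L[i])
            i += 1
        else:
            S.append(R[j])
            j += 1
    # One input is exhausted; the (already sorted) remainder of the other follows.
    S.extend(L[i:])
    S.extend(R[j:])
    return S

def merge_sort(A):
    """Return a sorted copy of A, following Algorithm 2 (zero-indexed)."""
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))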
Now that we have an algorithm, the first question we always want to ask is: “Is the algorithm correct?”
In this case, to answer this question, we will define an invariant which we will claim holds at every recursive
² Furthermore, sorting functions usually change the order of elements in the given array instead of returning a sorted copy of it. For a more complete implementation of MergeSort, see Section 2.3.1 in CLRS.

Figure 2.4 from CLRS
Schematic of the levels of recursive calls, or “recursion tree”, and the resulting calls to Merge

call. The invariant will be the following: “In every recursive call, MergeSort returns a sorted array.” If we
can prove that this invariant holds, it will immediately prove that MergeSort is correct, as the first call
to MergeSort will return a sorted array. Here, we will prove that the invariant holds.
Proof of Invariant. By induction. Consider the base case of the algorithm, when MergeSort receives
an array of ≤ 1 element. Any array of ≤ 1 element is trivially sorted, so in this case, by returning A, MergeSort
returns a sorted list.
To see the inductive step, suppose that after n − 1 ≥ 1 levels of recursive calls return, MergeSort still
returns a sorted list. We will argue that the nth recursive call returns a sorted list. Consider some execution
of MergeSort where L and R were just returned after ≤ n − 1 recursive calls. Then, by the inductive
hypothesis, L and R are both sorted. We argue that the result of Merge(L, R) will be a sorted list. To see
this, note that in the Merge subroutine, the minimum remaining element must be at either the ith position
in L or the jth position of R. When we find the minimum element, we will increment i or j accordingly,
which will result in the next remaining minimal element still being in the ith position in L or the jth position
of R. Thus, Merge will construct a sorted list, and our induction holds. ∎

2.3 Running Time Analysis


After answering the question of correctness, the next question to ask is: “Is the algorithm good?” Well, that
depends on how we define the “goodness” of an algorithm. In this class, as is typical, we will generally
reframe this question as: “How does the running time of the algorithm grow with the size of the input?”
Generally, the slower the running time grows as the input increases in size, the better the algorithm.
Our eventual goal is to argue about the running time of MergeSort, but this seems a bit challenging.
In particular, each call to MergeSort makes a number of recursive calls and then calls Merge – it isn’t
immediately obvious how we should bound the time that MergeSort takes. A seemingly less ambitious
goal would be to analyze how long a call to Merge takes. We will start here, and see if this gives us
something to grasp onto when arguing about the total running time.
Consider a single call to Merge, where we’ll assume the total size of S is m numbers. How long will it
take for Merge to execute? To start, there are two initializations for i and j. Then, we enter a for loop
which will execute m times. Each iteration will require one comparison, followed by an assignment to S and
an increment of i or j. Finally, we’ll need to increment the counter k of the for loop. If we assume that
each operation costs us a certain amount of time, say Cost_a for assignment, Cost_c for comparison, and Cost_i for
incrementing a counter, then we can express the total time of the Merge subroutine as follows:

2Cost_a + m(Cost_a + Cost_c + 2Cost_i)

This is a precise, but somewhat unruly expression for the running time. In particular, it seems difficult to
keep track of lots of different constants, and it isn’t clear which costs will be more or less expensive (especially
if we switch programming languages or machine architectures). To simplify our analysis, we choose to assume
that there is some global constant c_op which represents the cost of an operation. You may think of c_op as
max{Cost_a, Cost_c, Cost_i, . . .}. We can then bound the amount of running time for Merge as

2c_op + 4c_op · m = (2 + 4m) operations

Using the fact that we know that m ≥ 1, we can upper bound this running time by 6m operations.
Now that we have a bound on the number of operations required in a Merge of m numbers, we want
to translate this into a bound on the number of operations required for MergeSort. At first glance, the
pessimist in you may be concerned that at each level of recursive calls, we’re spawning an exponentially
increasing number of copies of MergeSort (because the number of calls at each depth doubles). Dual
to this, the optimist in you will notice that at each level, the inputs to the problems are decreasing at an
exponential rate (because the input size halves with each recursive call). Today, the optimists win out.
Claim 1. MergeSort requires at most 6n log n + 6n operations to sort n numbers.³

Before we go about proving this bound, let’s first consider whether this running time bound is good. We
mentioned earlier that more obvious methods of sorting, like InsertionSort, required roughly n^2 operations.
How does n^2 = n · n compare to n · log n? An intuitive definition of log n is the following: “Enter n into your
calculator. Divide by 2 until the total is ≤ 1. The number of times you divided is the logarithm of n.” This
number in general will be significantly smaller than n. In particular, if n = 32, then log n = 5; if n = 1024,
then log n = 10. Already, to sort arrays of ≈ 10^3 numbers, the savings of n log n as compared to n^2 will
be orders of magnitude. At larger problem instances of 10^6, 10^9, etc., the difference will become even more
pronounced! n log n is much closer to growing linearly (with n) than it is to growing quadratically (with n^2).
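
This intuitive definition translates directly into code (a toy sketch for illustration, assuming n ≥ 1):

def intuitive_log(n):
    """Count how many times n can be halved before the total drops to <= 1."""
    count = 0
    while n > 1:
        n /= 2
        count += 1
    return count

Indeed, intuitive_log(32) returns 5 and intuitive_log(1024) returns 10.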
One way to argue about the running time of recursive algorithms is to use recurrence relations. A
recurrence relation for a running time expresses the time it takes to solve an input of size n in terms of the
time required to solve the recursive calls the algorithm makes. In particular, we can write the running time
T (n) for MergeSort on an array of n numbers as the following expression.

T(n) = T(n/2) + T(n/2) + (time for Merge on n numbers)
     ≤ 2 · T(n/2) + 6n
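
Before solving the recurrence analytically, you can get a feel for it by evaluating it directly (a quick sketch, assuming T(1) = 1 and n a power of 2):

def T(n):
    """Evaluate the recurrence T(n) = 2*T(n/2) + 6n with base case T(1) = 1."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + 6 * n

For n = 1024, this gives T(1024) = 62464, comfortably below the bound 6n log n + 6n = 67584 claimed above.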

There are a number of sophisticated and powerful techniques for solving recurrences. We will cover many of
these techniques in the coming lectures. Today, we can actually analyze the running time directly.
Proof of Claim 1. Consider the recursion tree of a call to MergeSort on an array of n numbers. Assume
for simplicity that n is a power of 2. Let’s refer to the initial call as Level 0, the subsequent recursive calls
as Level 1, and so on, numbering the level of recursion by its depth in the tree. How deep is the tree? At
each level, the size of the inputs is divided in half, and there are no recursive calls when the input size is ≤ 1
element. By our earlier “definition”, this means the bottom level will be Level log n. Thus, there will be a
total of log n + 1 levels.
We can now ask two questions: (1) How many subproblems are there at Level i? (2) How large are the
individual subproblems at Level i? We can observe that at the ith level, there will be 2^i subproblems, each
with inputs of size n/2^i. So how much work do we do overall at the ith level? First, we need to make two
recursive calls – but the work done for these recursive calls will be accounted for by Level i + 1. Thus, we
are really concerned about the cost of a call to Merge at Level i. We can express the work per level as
follows:
³ In this lecture and all future lectures, unless explicitly stated otherwise, log refers to log_2.

Figure 2.5 from CLRS – Analyzing the running time in terms of a Recursion Tree

Work at Level i = (number of subproblems) · (work per subproblem)
                ≤ 2^i · 6 · (n/2^i)
                = 6n

Importantly, we can see that the work done at Level i is independent of i – it only depends on n and is the
same for every level. This means we can bound the total running time as follows:

Total RT = (work per level) · (number of levels)
         ≤ (6n) · (log n + 1)
         = 6n log n + 6n
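
One way to sanity-check this bound empirically (a sketch of ours, not part of the original notes) is to instrument MergeSort to count the comparisons made while merging. Comparisons are only one of the operations charged in the analysis, so their count should land well below 6n log n + 6n:

import math

def merge_sort_counting(A):
    """Return (sorted copy of A, number of comparisons made in all merges)."""
    if len(A) <= 1:
        return A, 0
    mid = len(A) // 2
    L, c1 = merge_sort_counting(A[:mid])
    R, c2 = merge_sort_counting(A[mid:])
    S, i, j, comps = [], 0, 0, 0
    while i < len(L) and j < len(R):
        comps += 1
        if L[i] < R[j]:
            S.append(L[i])
            i += 1
        else:
            S.append(R[j])
            j += 1
    S.extend(L[i:])
    S.extend(R[j:])
    return S, c1 + c2 + comps

n = 1024
_, comps = merge_sort_counting(list(range(n, 0, -1)))
print(comps, 6 * n * math.log2(n) + 6 * n)  # comparisons come in far under the bound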

3 Guiding Principles for Algorithm Design and Analysis


After going through the algorithm and analysis, it is natural to wonder if we’ve been too sloppy. In particular,
note that the algorithm never “looks at” the input. For instance, what if we received the sequence of numbers
[1, 2, 3, 5, 4, 6, 7, 8]? Clearly, there is a “sorting algorithm” for this sequence that only takes a few operations,
but MergeSort runs through all log n + 1 levels of recursion anyway. Would it be better to try to design
our algorithms with this in mind? Additionally, in our analysis, we’ve given a very loose upper bound on
the time required of Merge and dropped a number of constant factors and lower order terms. Is this a
problem? In what follows, we’ll argue that these are actually features, not bugs, in the design and analysis
of the algorithm.

3.1 Worst-Case Analysis
One guiding principle we’ll use throughout the class is that of Worst-Case Analysis. In particular, this
means that we want any statement we make about our algorithms to hold for every possible input. Stated
differently, we can think about playing a game against an adversary, who wants to maximize our running
time (make it as bad as possible). We get to specify an algorithm and state a running time T (n); the
adversary then chooses an input. We win the game if even in the worst case, whatever input the adversary
chooses (of size n), our algorithm runs in at most T (n) time.
Note that because our algorithm makes no assumptions about the input, our running time bound
will hold for every possible input. This is a very strong, robust guarantee.⁴

3.2 Asymptotic Analysis


Throughout our argument about MergeSort, we combined constants (think Cost_a, Cost_i, etc.) and gave
very loose upper bounds (i.e. 6m ≥ 4m + 2). Why did we choose to do this? First, it makes the math much
easier. But does it come at the cost of getting the “right” answer? Would we get a more predictive result if
we threw all these exact expressions back into the analysis? From the perspective of an algorithm designer,
the answer to both of these questions is a resounding “No”. As an algorithm designer, we want to come up
with results that are broadly applicable, whose truth does not depend on features of a specific programming
language or machine architecture. The constants that we’ve dropped will depend greatly on the language and
machine on which you’re working. For the same reason we use pseudocode instead of writing our algorithms
in Java, trying to quantify the exact running time of an algorithm would be inappropriately specific. This is
not to say that constant factors never matter in applications (e.g. I would be rather upset if my web browser
ran 7 times slower than it does now) but worrying about these factors is not the goal of this class. In this
class, our goal will be to argue about which strategies for solving problems are wise and why.
In particular, we will focus on Asymptotic Analysis. This type of analysis focuses on the running time of
your algorithm as your input size gets very large (i.e. n → +∞). This framework is motivated by the fact
that if we need to solve a small problem, it doesn’t cost that much to solve it by brute-force. If we want
to solve a large problem, we may need to be much more creative in order for the problem to run efficiently.
From this perspective, it should be very clear that 6n(log n + 1) is much better than n^2/2. (If you are
unconvinced, try plugging in some values for n.)
Intuitively, we’ll say that an algorithm is “fast” when the running time grows “slowly” with the input
size. In this class, we want to think of growing “slowly” as growing as close to linear as possible. Based on
this intuitive notion, we can come up with a formal system for analyzing how quickly the running time
of an algorithm grows with its input size.

3.3 Asymptotic Notation


To talk about the running time of algorithms, we will use the following notation. T(n) denotes the runtime
of an algorithm on input of size n.

“Big-Oh” Notation:
Intuitively, Big-Oh notation gives an upper bound on a function. We say T(n) is O(f(n)) when, as n gets
big, f(n) grows at least as quickly as T(n). Formally, we say

T(n) = O(f(n)) ⇐⇒ ∃ c, n_0 > 0 s.t. ∀ n ≥ n_0, 0 ≤ T(n) ≤ c · f(n)


⁴ In the case where you have significant domain knowledge about which inputs are likely, you may choose to design an algorithm that works well in expectation on these inputs (this is frequently referred to as Average-Case Analysis). This type of analysis is less common and can lead to difficulties. For example, understanding which inputs are more likely to appear can be very hard, or the algorithms can be too tailored to fit our assumptions on the input.

“Big-Omega” Notation:
Intuitively, Big-Omega notation gives a lower bound on a function. We say T(n) is Ω(f(n)) when, as n gets
big, f(n) grows at least as slowly as T(n). Formally, we say

T(n) = Ω(f(n)) ⇐⇒ ∃ c, n_0 > 0 s.t. ∀ n ≥ n_0, 0 ≤ c · f(n) ≤ T(n)

“Big-Theta” Notation:
T(n) is Θ(f(n)) if and only if T(n) = O(f(n)) and T(n) = Ω(f(n)). Equivalently, we can say that

T(n) = Θ(f(n)) ⇐⇒ ∃ c_1, c_2, n_0 > 0 s.t. ∀ n ≥ n_0, 0 ≤ c_1 · f(n) ≤ T(n) ≤ c_2 · f(n)

Figure 3.1 from CLRS – Examples of Asymptotic Bounds

(Note: In these examples, f(n) corresponds to our T(n) and g(n) corresponds to our f(n).)

We can see that these notations really do capture exactly the behavior that we want – namely, to focus
on the rate of growth of a function as the inputs get large, ignoring constant factors and lower order terms.
As a sanity check, consider the following example and non-example.
Claim 2. All degree-k polynomials are O(n^k).

Proof of Claim. Suppose T(n) is a degree-k polynomial. That is, T(n) = a_k n^k + . . . + a_1 n + a_0 for some
choice of a_i’s where a_k ≠ 0. To show that T(n) is O(n^k), we must find c and n_0 such that for all n ≥ n_0,
T(n) ≤ c · n^k. (Since T(n) represents the running time of an algorithm, we assume it is positive.) Let n_0 = 1
and let a* = max_i |a_i|. We can bound T(n) as follows:

T(n) = a_k n^k + . . . + a_1 n + a_0
     ≤ a* n^k + . . . + a* n + a*
     ≤ a* n^k + . . . + a* n^k + a* n^k
     = (k + 1) a* · n^k

Let c = (k + 1)a*, which is a constant, independent of n. Thus, we’ve exhibited c and n_0 which satisfy the Big-Oh
definition, so T(n) = O(n^k). ∎
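
To make the constants concrete, consider a hypothetical example (ours, not from the notes): T(n) = 3n^2 + 5n + 7 has k = 2 and a* = 7, so the proof yields c = (k + 1)a* = 21 with n_0 = 1. A quick spot-check:

def check_big_oh(T, f, c, n0, up_to=10**4):
    """Spot-check that T(n) <= c * f(n) for all n0 <= n <= up_to (evidence, not proof)."""
    return all(T(n) <= c * f(n) for n in range(n0, up_to + 1))

print(check_big_oh(lambda n: 3*n*n + 5*n + 7, lambda n: n*n, c=21, n0=1))  # True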
Claim 3. For any k ≥ 1, n^k is not O(n^(k−1)).

Proof of Claim. By contradiction. Assume n^k = O(n^(k−1)). Then there is some choice of c and n_0 such that
n^k ≤ c · n^(k−1) for all n ≥ n_0. But this in turn means that n ≤ c for all n ≥ n_0, which contradicts the fact
that c is a constant, independent of n. Thus, our original assumption was false, and n^k is not O(n^(k−1)). ∎
