0.1 Review (Recurrences)
T(n) ≤ 2c⌊n/2⌋^2 + n^2
     ≤ 2c(n/2)^2 + n^2
     = cn^2/2 + n^2
     = (c/2 + 1)n^2
Now (c/2 + 1)n^2 ≤ cn^2 when c/2 + 1 ≤ c, i.e. when c ≥ 2. So we can let c = 2, and then the
inductive step is proven.
Recursion tree: Ignoring floors and ceilings for now, the zeroth level has a single node with
cost n^2. The first level has two nodes, each with cost (n/2)^2 = n^2/4. The second level has
four nodes, each with cost (n/4)^2 = n^2/16. In general, the cost of a node is its subproblem
size squared, so at the ith level each node costs n^2/4^i. On the other hand, the ith level has
2^i nodes, so the total cost of the ith level is 2^i · n^2/4^i = n^2/2^i.
The height of the tree is given by n/2^i = 1, which gives i = log n.
Summing the costs of each level, we get

Σ_{i=0}^{log n} n^2/2^i = n^2 · Σ_{i=0}^{log n} 1/2^i ≤ 2n^2
by the sum of an infinite geometric series. Therefore, the total cost of the algorithm is Θ(n2 ).
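The substitution argument can be spot-checked numerically. The sketch below instantiates the recurrence as T(n) = 2T(⌊n/2⌋) + n^2 with an assumed base case T(1) = 1 (the exact base constant is not specified above), and confirms the bound T(n) ≤ 2n^2:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """The recurrence T(n) = 2*T(floor(n/2)) + n^2, with T(1) = 1 assumed."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n * n

# The substitution method gave T(n) <= 2n^2; spot-check it for small n.
assert all(T(n) <= 2 * n * n for n in range(1, 2000))
```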
CS 161 Lecture 5 Jessica Su (some parts copied from CLRS)
Problem: Given an array A with n elements, how can we find the ith smallest element of
that array?
Exercise: Find a naive upper bound on how long it takes to do this.
Answer: We can get an upper bound of O(n log n) by sorting the array (using mergesort,
heapsort, or another O(n log n) algorithm), and then choosing the element at the ith index.
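The sorting-based upper bound can be written in a couple of lines (a minimal sketch; `select_by_sorting` is a hypothetical name, and i is 1-indexed as in the problem statement):

```python
def select_by_sorting(A, i):
    """Return the ith smallest element (1-indexed) of A by sorting first.

    Sorting dominates the cost, so this runs in O(n log n) time.
    """
    return sorted(A)[i - 1]

print(select_by_sorting([9, 2, 7, 4, 5], 2))  # 2nd smallest -> 4
```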
Exercise: Find a lower bound.
Answer: Any algorithm must take all n elements into account if we want to guarantee that
the algorithm produces the right answer. (Note that some algorithms only guarantee that
we get the right answer with high probability, and in that case we don’t necessarily have to
look at all the elements.) So a lower bound is Ω(n).
Today we will learn the “median of medians” algorithm, which is an algorithm for solving
the selection problem in O(n) time.
A key subroutine is Partition. At any point during Partition, the array slice is split into three parts: “left,” “right,” and “unexplored.” The
“left” part contains elements that are known to be smaller than the pivot. The “right” part
is directly to the right of the “left” part, and it contains elements known to be larger than
the pivot. To the right of the “right” part is the unexplored region, and at the very right of
the array slice is the pivot element.
To populate these regions, we maintain a variable, “boundary”, which represents the bound-
ary between the left and the right region. (The variable “j” represents the boundary between
the right and the unexplored region.)
On each iteration j, we decide if A[j] is larger or smaller than the pivot. If A[j] is larger than
the pivot, we add A[j] to the “right” region by adding 1 to j (this is accomplished through
the natural iteration of a for loop).
If A[j] is smaller, then we swap it with the element that is currently at the boundary between
the “left” and the “right” region, then add 1 to both “j” and the “boundary” variable. This
has the effect of increasing the size of the “left” region by 1, while maintaining the elements
in the “right” region.
This figure is an example of Partition in action. Here, p and r are the boundaries of the
array slice, and i is the “boundary” variable. The pivot element is currently at A[r], and at
the end it is moved into its correct position.
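The procedure described above can be sketched as follows (a sketch following this description, with the pivot at the right end of the slice; the variable names i and j match the figure, where i is the "boundary" variable):

```python
def partition(A, p, r):
    """Partition A[p..r] (inclusive) around the pivot A[r], in place.

    Elements <= pivot end up left of the returned index, elements > pivot
    to its right. Returns the pivot's final position.
    """
    pivot = A[r]
    i = p - 1                         # last index of the "left" region
    for j in range(p, r):             # j scans the unexplored region
        if A[j] <= pivot:             # A[j] belongs in the "left" region:
            i += 1                    # grow "left" by one and swap A[j]
            A[i], A[j] = A[j], A[i]   # across the boundary
        # otherwise A[j] joins "right" simply by advancing j
    A[i + 1], A[r] = A[r], A[i + 1]   # move the pivot into its final spot
    return i + 1

A = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(A, 0, len(A) - 1)
print(A, q)  # [2, 1, 3, 4, 7, 5, 6, 8] 3
```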
The first thing we want to do is get a lower bound on the number of elements we are throwing
away on each recursive call. This will give us a guarantee on how fast our subproblem sizes
are decreasing.
Recall that we have ⌈n/5⌉ medians. At least half of them are greater than or equal to our
pivot x (the median-of-medians). Most of those groups contain two other elements that are
greater than their medians, so they contribute at least 3 elements that are greater than x.
(The only groups that don’t are (1) the group that has fewer than 5 elements, and (2) the
group containing x itself.)
This gives us at least

3(⌈(1/2)⌈n/5⌉⌉ − 2) ≥ 3n/10 − 6
elements that are greater than x.
By the same logic, at least 3n/10 − 6 elements are less than x.
So on each recursive call, we throw away at least 3n/10 − 6 elements, which means step 5
calls Select recursively on at most 7n/10 + 6 elements.
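The counting above can be verified mechanically (an illustrative check of the arithmetic only, not of the algorithm itself):

```python
from math import ceil

# At least 3*(ceil((1/2)*ceil(n/5)) - 2) elements are guaranteed to be
# >= x, and this guarantee is at least 3n/10 - 6 for every n, so at most
# n - (3n/10 - 6) = 7n/10 + 6 elements can survive into the recursive call.
for n in range(1, 10_000):
    guaranteed = 3 * (ceil(0.5 * ceil(n / 5)) - 2)
    assert guaranteed >= 3 * n / 10 - 6
```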
Steps 1, 2, and 4 take O(n) time. (Step 2 consists of O(n) calls of insertion sort on sets
of size O(1).) Step 3 takes time T(⌈n/5⌉), and step 5 takes time at most T(7n/10 + 6),
assuming that T is monotonically increasing.
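Steps 1–5 can be sketched as follows (a sketch assuming the standard CLRS formulation of Select; for clarity it partitions by building new lists rather than partitioning in place, which does not change the asymptotic analysis):

```python
def select(A, i):
    """Return the ith smallest element of A (1-indexed) in O(n) time."""
    if len(A) <= 5:                     # small base case: just sort
        return sorted(A)[i - 1]
    # Steps 1-2: split into groups of <= 5 and take each group's median.
    groups = [A[j:j + 5] for j in range(0, len(A), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Step 3: recursively find the median of the medians, x.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition the elements around x.
    left = [a for a in A if a < x]
    equal = [a for a in A if a == x]
    right = [a for a in A if a > x]
    # Step 5: recurse only on the side containing the ith smallest element.
    if i <= len(left):
        return select(left, i)
    elif i <= len(left) + len(equal):
        return x
    else:
        return select(right, i - len(left) - len(equal))

print(select(list(range(50, 0, -1)), 25))  # 25th smallest of 50..1 -> 25
```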
For the base case, we assume that any input of fewer than 140 elements requires O(1) time.
Then the recurrence is

T(n) ≤ O(1)                              if n < 140
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n)    if n ≥ 140
We use the substitution method to solve the recurrence. Specifically, we will show that
T (n) ≤ cn for some constant c and all n > 0.
For the base case, we choose c large enough that T (n) ≤ cn for all n < 140. Since 140 is a
constant, we can achieve this by declaring c to be a relatively large constant.
Now for the inductive step, we assume T (k) ≤ ck for all k < n.
Also, let a be the constant from the O(n) term (i.e. a constant such that the function
described by the O(n) term is bounded above by an for all n > 0).
By the induction hypothesis, we have

T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (−cn/10 + 7c + an),

which is at most cn if
−cn/10 + 7c + an ≤ 0.
When n > 70, this is equivalent to the inequality c ≥ 10a(n/(n − 70)). Because we assume
that n ≥ 140, we have n/(n − 70) ≤ 2, so we can choose c ≥ 20a to satisfy the inequality.
(In addition, recall that c must be large enough to satisfy the base case.)
Thus, the median of medians algorithm runs in O(n) time.
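As a final sanity check, the recurrence can be iterated numerically (an illustrative check assuming a = 1 for the O(n) term, T(n) = 1 for n < 140 as the base case, and ⌈7n/10⌉ + 6 for the second subproblem so that arguments stay integral; the actual constants are implementation-dependent):

```python
from functools import lru_cache
from math import ceil

@lru_cache(maxsize=None)
def T(n):
    """Worst-case cost under the recurrence, with a = 1 assumed."""
    if n < 140:
        return 1
    return T(ceil(n / 5)) + T(ceil(7 * n / 10) + 6) + n

# The analysis above allows choosing c = 20a; with a = 1, check T(n) <= 20n.
assert all(T(n) <= 20 * n for n in range(1, 5000))
```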