Introduction to Randomized Algorithms
Nguyen An Khuong, CSE-HCMUT
Outline
Introduction
Miller-Rabin Primality Test
Quicksort
Karger's Minimum Cut
Selection Problem
  Introduction
  Deterministic algorithm
  Divide-and-conquer with random pivoting
  Random sampling strategy for selection
Appendix: Discrete Random Variables
References and Further Reading
1. Introduction
Deterministic algorithms vs randomized algorithms
▶ Deterministic:
(diagram: input → Algorithm → output)
Goal: to prove that the algorithm solves the problem correctly (always) and quickly (typically, the number of steps should be polynomial in the size of the input).
In other words, for all inputs, the output is good.
▶ Randomized:
(diagram: input + random bits → Algorithm → output)
▶ In addition to the input, the algorithm takes a source of random numbers and makes random choices during execution. Behavior can vary even on a fixed input. The output is good, for all inputs, with good probability.
Randomized Algorithms vs Probabilistic Analysis of Algorithms
(diagram: input + random bits → Algorithm → output)
Goal: design the algorithm and its analysis to show that this behavior is likely to be good on every input.
(The likelihood is over the random numbers only.)
▶ Not to be confused with the probabilistic analysis of algorithms.
2. Miller-Rabin Primality Test
Introduction
▶ Proposed in the 1970s.
▶ Miller and Rabin gave two versions of the same algorithm to test whether a number n is prime or not.
▶ Whereas Rabin's algorithm works with a randomly chosen a ∈ Z_n, and is therefore randomized, Miller's version tests deterministically all a with 1 ≤ a ≤ 4 log² n.
▶ But the correctness of Miller's algorithm depends on the correctness of the Extended Riemann Hypothesis.
Miller-Rabin Algorithm
▶ Let ψ be an automorphism of Z_n.
▶ Let n − 1 = s · 2^t for odd s.
▶ The algorithm:
1. Test if n = m^j for some j > 1. If yes, output COMPOSITE.
2. Randomly choose a ∈ Z_n.
3. Test if a^(n−1) ≡ 1 (mod n). If not, output COMPOSITE.
4. Compute u_i = a^(s·2^i) (mod n) for 0 ≤ i < t.
5. If there is an i such that u_i = 1 and u_{i−1} ≢ ±1 (mod n), output COMPOSITE.
6. Output PRIME.
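A minimal Python sketch of the randomized (Rabin) version. The perfect-power test of step 1 is omitted here, since repeating the random-witness test already drives the error probability down; the small-prime pre-check and the `rounds` parameter are practical additions not in the slides.

```python
import random

def miller_rabin(n, rounds=20):
    """Probabilistic primality test: False means definitely composite,
    True means prime with error probability at most 4^(-rounds)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = s * 2^t with s odd.
    s, t = n - 1, 0
    while s % 2 == 0:
        s //= 2
        t += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        u = pow(a, s, n)              # u_0 = a^s mod n
        if u == 1 or u == n - 1:
            continue                  # this witness reveals nothing
        for _ in range(t - 1):
            u = pow(u, 2, n)          # u_i = a^(s * 2^i) mod n
            if u == n - 1:
                break                 # reached -1: no nontrivial square root of 1 seen
        else:
            return False              # composite: strong witness found
    return True
```

Note that `pow(a, s, n)` performs fast modular exponentiation, so each round costs O(log n) multiplications.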
Correctness of Miller-Rabin Algorithm
Time Complexity of Miller-Rabin Algorithm
3. QUICKSORT
The algorithm
Our goal is to sort a sequence S = (x_1, ..., x_n) of n distinct real numbers in increasing order. We use a recursive method known as Quicksort which proceeds as follows:
Algorithm (Hoare, 1962)
1. If S has one or zero elements, return S.
2. Pick some element x = x_i in S, called the pivot.
3. Reorder S in such a way that for every number x_j ≠ x in S, if x_j < x, then x_j is moved to a list S_1, else if x_j > x then x_j is moved to a list S_2.
4. Apply this algorithm recursively to the list of elements in S_1 and to the list of elements in S_2.
5. Return the sorted list S_1, x, S_2.
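The algorithm above, with a random pivot, can be sketched in a few lines of Python (lists S1/S2 are built by comprehension rather than in-place reordering):

```python
import random

def quicksort(S):
    """Randomized Quicksort on a list of distinct numbers (Hoare, 1962)."""
    if len(S) <= 1:
        return S                                  # step 1: nothing to sort
    x = random.choice(S)                          # step 2: random pivot
    S1 = [y for y in S if y < x]                  # step 3: elements below the pivot
    S2 = [y for y in S if y > x]                  #         elements above the pivot
    return quicksort(S1) + [x] + quicksort(S2)    # steps 4-5
```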
Quicksort Demonstration
(figure omitted)
Quicksort's Average Running Time Analysis
▶ To have a good "average performance," one can randomize this algorithm by assuming that each pivot is chosen at random.
▶ Let us compute the expectation of the number X of comparisons needed when running the randomized version of Quicksort.
▶ Recall that the input is a sequence S = (x_1, ..., x_n) of distinct elements, and that (y_1, ..., y_n) has the same elements sorted in increasing order.
▶ In order to compute E(X), we decompose X as a sum of indicator variables X_{i,j}, with X_{i,j} = 1 iff y_i and y_j are ever compared, and X_{i,j} = 0 otherwise.
▶ Then, it is clear that X = Σ_{j=2}^{n} Σ_{i=1}^{j−1} X_{i,j} and E(X) = Σ_{j=2}^{n} Σ_{i=1}^{j−1} E(X_{i,j}).
▶ Two elements y_i and y_j are compared exactly when the first pivot picked among {y_i, ..., y_j} is y_i or y_j itself, so E(X_{i,j}) = P(X_{i,j} = 1) = 2/(j − i + 1); summing these terms produces harmonic numbers.
▶ The quantity H_n = 1 + 1/2 + 1/3 + ... + 1/n is called the "nth harmonic number", and is in the range [ln n, 1 + ln n] (this can be seen by considering the integral ∫_1^n (1/x) dx).
▶ So, ∫_1^n (1/x) dx < H_n − 1/n.
Approximation of the nth harmonic number H_n (cont'd)
▶ Moreover, H_n − 1 = 1/2 + 1/3 + · · · + 1/n is an underestimation of the area ∫_1^n (1/x) dx.
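A quick numerical check of the bounds ln n ≤ H_n ≤ 1 + ln n (a verification sketch, not part of the proof):

```python
import math

def harmonic(n):
    """nth harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

# The harmonic number is squeezed between ln n and 1 + ln n.
for n in (10, 100, 1000):
    H = harmonic(n)
    assert math.log(n) <= H <= 1 + math.log(n)
```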
4. Karger's Minimum Cut
Introduction
▶ Given a graph G = (V, E) with n vertices and m edges, find a global minimum cut.
▶ That is, find a set of vertices S ⊂ V, with 1 ≤ |S| ≤ n − 1, which minimizes the number of edges going from this subset S to the rest of the vertices.
▶ We denote the cut value for set S by |E(S, S̄)| (where S̄ = V \ S).
▶ The problem of finding a minimum cut with a given source and sink (s–t min cut) is probably more familiar.
▶ Dinic's algorithm or the Ford-Fulkerson algorithm leads to a solution.
▶ We can compute an s–t minimum cut in O(n³).
▶ By running this algorithm for all (s, t) pairs, we get a solution for the global minimum cut in O(n⁵).
The Algorithm
Algorithm It looks “stupid”, but actually it’s not that stupid.
1: function Algo1
2: while n > 2 do
3: Choose an edge e = (u, v ) randomly from the remaining edges
4: Contract that edge
5: end while
6: return the two last vertices
7: end function
▶ Contracting an edge means we remove that edge and combine the two
vertices into a super-node.
▶ We note that self-loops thus formed are removed, but any resulting parallel
edges are not removed. This means at every step, we have a multi-graph
without any self-loops.
▶ This algorithm will yield a cut. The minimum cut? Probably not. But maybe!
▶ If the graph is somewhat dense, then choosing an edge in the minimum cut
is rather unlikely at the beginning... and very likely at the end.
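The contraction algorithm can be sketched with a union-find structure standing in for super-nodes (an illustrative Python sketch; the edge-list representation and function names are choices made here, not from the slides):

```python
import random

def contract_once(edges):
    """One run of the contraction algorithm on a connected multigraph
    given as an edge list. Returns the size of the resulting 2-way cut."""
    parent = {v: v for e in edges for v in e}   # union-find over super-nodes
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x
    n = len(parent)
    while n > 2:
        u, v = random.choice(edges)             # uniform over remaining edges
        parent[find(u)] = find(v)               # contract the chosen edge
        n -= 1
        # drop self-loops; parallel edges are kept (multigraph)
        edges = [e for e in edges if find(e[0]) != find(e[1])]
    return len(edges)                           # edges crossing the final cut

def karger_min_cut(edges, trials):
    """Repeat contraction `trials` times and keep the best cut found."""
    return min(contract_once(list(edges)) for _ in range(trials))
```

Keeping parallel edges is essential: picking a uniformly random remaining edge is exactly what makes an edge of the (small) minimum cut unlikely to be chosen early on.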
Analyzing the probability of the correct answer
▶ Let's say that the algorithm "fails" in step i if, in that step, we choose to contract an edge in the minimum cut (here, we suppose that there is only one minimum cut).
▶ For all u ∈ V, let d(u) be the degree of u. We have
|E(S, S̄)| ≤ min_u d(u) ≤ (1/n) Σ_{u∈V} d(u) = 2m/n.
▶ So, |E(S, S̄)|/m ≤ 2/n.
▶ We fail in the first step if we pick an edge in the minimum cut, an event that occurs with probability
P(fail in 1st step) = |E(S, S̄)|/m ≤ 2/n,
P(fail in 2nd step | success in 1st step) ≤ 2/(n − 1),
···
P(fail in ith step | success till (i − 1)th step) ≤ 2/(n − i + 1).
Analyzing the probability of the correct answer (cont'd)
▶ Let Z_i be the event that the algorithm succeeds in step i. Thus, we have
P(Z_1) ≥ (n − 2)/n,
P(Z_2 | Z_1) ≥ (n − 3)/(n − 1),
···
P(Z_i | Z_{i−1} ∩ · · · ∩ Z_1) ≥ (n − i − 1)/(n − i + 1).
▶ Therefore, P(Success) = P(Z_1 ∩ Z_2 ∩ · · · ∩ Z_{n−2})
= P(Z_1) · P(Z_2 | Z_1) · · · P(Z_{n−2} | Z_1 ∩ Z_2 ∩ · · · ∩ Z_{n−3})
≥ (n − 2)/n · (n − 3)/(n − 1) · (n − 4)/(n − 2) · (n − 5)/(n − 3) · · · 2/4 · 1/3
= (2 · 1)/(n(n − 1)) ≥ 2/n².
Analyzing the probability of the correct answer (cont'd)
▶ If n is large, this is a very poor guarantee.
▶ So what we do instead is run the algorithm several times and choose the best min-cut we find. If we run this algorithm k times, we then have:
P(Success at least once) ≥ 1 − (1 − 2/n²)^k.
▶ A classical inequality yields (1 − z)^(1/z) ≤ 1/e.
▶ We hence choose to run the algorithm k times, with k = (n²/2) log(1/δ).
▶ We get:
P(Success) ≥ 1 − (1/e)^(log(1/δ)) = 1 − δ.
▶ After running all these k trials, we choose the one with the minimum cut value.
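The amplification argument can be checked numerically: with k = (n²/2) log(1/δ) independent runs, the bound (1 − 2/n²)^k on the probability that every run misses the minimum cut falls below δ (a small verification sketch):

```python
import math

def repetitions_needed(n, delta):
    """k = (n^2 / 2) * ln(1/delta) independent contraction runs."""
    return math.ceil((n * n / 2) * math.log(1 / delta))

def failure_bound(n, k):
    """Upper bound (1 - 2/n^2)^k on the probability all k runs miss."""
    return (1 - 2 / (n * n)) ** k

# For several graph sizes, the failure bound drops below delta.
for n in (10, 50, 200):
    delta = 0.01
    k = repetitions_needed(n, delta)
    assert failure_bound(n, k) <= delta
```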
Complexity
Min-Cut: Can we do better?
▶ Initial stages of the algorithm are very likely to be correct.
▶ In particular, the first step is wrong with probability at most 2/n.
▶ As we contract more edges, the failure probability goes up.
▶ Moreover, earlier stages take more time compared to later ones.
▶ Idea: let us redistribute our iterations. Since earlier ones are more accurate and slower, why not do fewer at the beginning and increasingly more as the number of edges decreases?
▶ Let us formalize this idea. After t steps:
P(Success) ≥ (n − 2)/n · (n − 3)/(n − 1) · · · (n − t − 1)/(n − t + 1) ≈ (n − t)²/n².
▶ We equate this to 1/2 to get t = n − n/√2. Thus, we have the following algorithm: ...
A better MIN-CUT
Algorithm Getting better
1: function Algo2
2:   Repeat twice:
3:     Run contraction from n down to n/√2 vertices
4:     Recursively call Algo2 on the n/√2-vertex graph
5:   return the best cut among the two
6: end function
Runtime analysis:
T(n) = 2(O(n²) + T(n/√2))
     = 2cn² + 2 · 2 · c · (n/√2)² + · · ·
     = O(n² log n).
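To see where O(n² log n) comes from, the recurrence can be unrolled numerically (a sanity-check sketch; the constant c = 1 and the cutoff n ≤ 2 are arbitrary choices made here):

```python
import math

def total_work(n, c=1.0):
    """Unroll T(n) = 2*(c*n^2 + T(n/sqrt(2))) down to constant size."""
    if n <= 2:
        return c
    return 2 * (c * n * n + total_work(n / math.sqrt(2), c))

# total_work(n) / (n^2 * log2 n) should stay bounded as n grows,
# matching T(n) = O(n^2 log n).
ratios = [total_work(n) / (n * n * math.log2(n)) for n in (2**6, 2**8, 2**10)]
```

Each level of the recursion tree costs about 2cn² in total and there are about 2 log₂ n levels, so the ratio settles near a constant.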
Success Probability
5. Selection Problem
Intuition:
1. At first, an element a_i is chosen to split S into two parts: S⁺ = {a_j : a_j ≥ a_i} and S⁻ = {a_j : a_j < a_i}.
2. We can determine whether the k-th smallest element is in S⁺ or S⁻.
3. Thus, we iterate on ONLY one subset.
How to choose a splitter?
We have the following options:
▶ Bad choice: select the smallest element at each iteration. T(n) = T(n − 1) + O(n) = O(n²).
▶ Ideal choice: select the median at each iteration. T(n) = T(n/2) + O(n) = O(n).
▶ Good choice: select a "centered" element a_i, i.e., |S⁺| ≥ ϵn and |S⁻| ≥ ϵn for a fixed ϵ > 0.
▶ T(n) ≤ T((1 − ϵ)n) + O(n) ≤ cn + c(1 − ϵ)n + c(1 − ϵ)²n + · · · = O(n).
▶ e.g.: ϵ = 1/4.
BFPRT algorithm: a linear deterministic algorithm
▶ Still using the idea of choosing a splitter. The ideal splitter is the median; however, finding the median is precisely our objective.
▶ Thus, just try to get "something close to the median", say within n/4 of the median.
▶ How can we get something close to the median? Instead of finding the median of the whole set, find the median of a sample.
▶ But how do we choose a sample? Medians again!
Median of medians algorithm [Blum, 1973]
Algorithm "Median of medians"
1: Line up the elements in groups of 5 elements;
2: Find the median of each group; (takes O(6n/5) time)
3: Find the median of the medians (denoted M); (takes T(n/5) time)
4: Use M as a splitter to partition the input and call the algorithm recursively on one of the partitions.
Analysis: T(n) = T(n/5) + T(7n/10) + 6n/5, i.e., at most 24n comparisons (here, 7n/10 comes from the fact that at least 3n/10 elements can be deleted by using M as the splitter).
Randomized divide-and-conquer
Algorithm RandomSelect(S, k)
1: Choose an element s_i from S uniformly at random;
2: S⁺ = {};
3: S⁻ = {};
4: for each s_j ∈ S with j ≠ i do
5:   if s_j > s_i then
6:     S⁺ = S⁺ ∪ {s_j};
7:   else
8:     S⁻ = S⁻ ∪ {s_j};
9:   end if
10: end for
11: if |S⁻| = k − 1 then
12:   return s_i;
13: else if |S⁻| > k − 1 then
14:   return RandomSelect(S⁻, k);
15: else
16:   return RandomSelect(S⁺, k − 1 − |S⁻|);
17: end if
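The same pseudocode in Python (assuming distinct elements; k is 1-indexed):

```python
import random

def random_select(S, k):
    """Return the k-th smallest element of a list of distinct numbers."""
    pivot = random.choice(S)                 # random splitter s_i
    smaller = [x for x in S if x < pivot]    # S-
    larger = [x for x in S if x > pivot]     # S+
    if len(smaller) == k - 1:
        return pivot                         # pivot is exactly the k-th smallest
    if len(smaller) > k - 1:
        return random_select(smaller, k)     # answer lies in S-
    return random_select(larger, k - 1 - len(smaller))  # adjust rank for S+
```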
Randomized divide-and-conquer (cont'd)
e.g.: ϵ = 1/4.
Randomized divide-and-conquer (cont'd)
Theorem. The expected running time of RandomSelect(S, k) is O(n).
Proof.
▶ Let ϵ = 1/4. We'll say that the algorithm is in phase j when the size of the set under consideration is at most n(3/4)^j but greater than n(3/4)^(j+1).
▶ Let X be the number of steps, and let X_j be the number of steps in phase j. Thus, X = X_0 + X_1 + · · ·.
▶ Consider the j-th phase. The probability of finding a centered splitter is ≥ 1/2, since at least half of the elements are centered. Thus, the expected number of iterations needed to find a centered splitter is 2.
▶ Each iteration costs at most cn(3/4)^j steps, since there are at most n(3/4)^j elements in phase j. Thus, E(X_j) ≤ 2cn(3/4)^j.
▶ E(X) = E(X_0 + X_1 + · · ·) ≤ Σ_j 2cn(3/4)^j ≤ 8cn.
A "random sampling" algorithm [Floyd & Rivest, 1975]
Basic idea: randomly sample a subset to represent the whole set.
Two requirements of M:
▶ On the one hand, M should be LARGE enough that the median is covered by M with high probability;
▶ On the other hand, M should be SMALL enough that Step 4 will not take a long time.
Time-complexity analysis
Running time:
▶ Step 2: O(r log r) = o(n); (sorting R)
▶ Step 3: 2n steps; (O(n) + O(|M| + |H|))
▶ Step 4: O(δn log(δn)).
Setting r = n^(3/4) and δ = n^(−1/4), the time bound of Step 4 becomes:
▶ Step 4: O(δn log(δn)) = o(n).
Total steps: 2n + o(n).
▶ The best-known deterministic algorithm takes 3n comparisons, but it is too complicated.
▶ A lower bound: 2n.
Error probability analysis I
Theorem 1. With probability 1 − O(n^(−1/4)), the RandomSamplingSelect algorithm reports the median in the first pass. Thus, the running time is only 2n + o(n).
E(X) = r/2 and σ²(X) = r/4.
6. Appendix: Discrete Random Variables
Random Variables
▶ When looking at independent replications of a binary experiment, we would usually be interested not in whether each trial is a success or a failure, but rather in the total number of successes (or failures).
▶ Obviously, this number is random, since it depends on the individual random outcomes, and it is consequently called a random variable.
▶ In this case it is a discrete-valued random variable that can take the values 0, 1, ..., n, where n is the number of replications.
▶ A random variable X has a probability distribution that can be described using point probabilities f_X(x) = P(X = x),
▶ or the cumulative distribution function F(x) = P(X ≤ x).
▶ Expected value: µ = E(X) = Σ_x x · P(X = x).
Outcome      Payout     Probability
Death        $10,000    1/1000
Disability   $5,000     2/1000
Neither      $0         997/1000
X, the amount of payment, is a discrete random variable. The company expects that they have to pay each customer:
E(X) = $10,000(1/1000) + $5,000(2/1000) + $0(997/1000) = $20.
Variance: An Example
▶ Of course, the expected value $20 will not always happen in reality.
▶ There will be variability. Let's calculate!
▶ So V(X) = 9980²(1/1000) + 4980²(2/1000) + (−20)²(997/1000) = 149,600,
▶ and SD(X) = √149,600 ≈ $386.78.
Comment
The company expects to pay out $20 and make $30. However, the standard
deviation of $386.78 indicates that it’s no sure thing. That’s a pretty big spread
(and risk) for an average profit of $30.
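Both the expectation and the variance in this example can be verified in a few lines (a quick check; the variance sums squared deviations from the mean):

```python
import math

payouts = [10_000, 5_000, 0]
probs = [1 / 1000, 2 / 1000, 997 / 1000]

mu = sum(x * p for x, p in zip(payouts, probs))               # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(payouts, probs))  # V(X)
sd = math.sqrt(var)                                            # SD(X)
# mu = 20.0, var = 149600.0, sd ≈ 386.78
```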
7. References and Further Reading
Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. Introduction to Algorithms, 4th ed. The MIT Press, 2022.
Dasgupta, S., Papadimitriou, C.H., and Vazirani, U.V. Algorithms. McGraw-Hill, 2006.
Motwani, R. and Raghavan, P. Randomized Algorithms. Cambridge University Press, 1995.
Ross, S.M. Probability Models for Computer Science. Academic Press, 2008.