3 Randomized Algorithms
We already learned quite a few randomized algorithms in the online algorithm lectures. For example, the MARKING algorithm for paging was a randomized algorithm, as was Randomized Weighted Majority. In designing online algorithms, randomization provides much power against an oblivious adversary. Thus, the worst-case analysis (the worst case of the expectation) may become better with randomization. We also noticed that randomization does not provide as much power against adaptive adversaries.
We’ll study some examples and concepts in randomized algorithms. Much of this section is based on (Motwani and Raghavan, Randomized Algorithms, Chapters 1, 5, 6).
Random QuickSort
Suppose $A$ is an array of $n$ numbers. Sorting puts the numbers in ascending order. QuickSort is one of the fastest sorting algorithms.
QuickSort(A):
1. If $|A| \le 1$, return $A$.
2. Pick a pivot $p$ from $A$.
3. Partition $A$ into $A_1$ (elements less than $p$) and $A_2$ (elements greater than $p$).
4. Return QuickSort($A_1$), $p$, QuickSort($A_2$).
In the worst case, QuickSort may perform badly because the pivot $p$ may be such that $A_1$ and $A_2$ are awfully imbalanced, and the time complexity becomes $O(n^2)$. If we don’t like average-case analysis, which is vulnerable to a malicious adversary, we can use randomization. RandomQS is the same as QuickSort, except that the pivot $p$ is chosen uniformly at random from $A$.
Theorem 1: The expected number of comparisons made by RandomQS is $O(n \log n)$.
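A minimal Python sketch of RandomQS (the function and variable names are ours, not fixed by the notes):

import random

def random_quicksort(a):
    """Sort a list of numbers; the pivot is chosen uniformly at random."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    left = [x for x in a if x < pivot]    # A1: elements smaller than the pivot
    mid = [x for x in a if x == pivot]    # elements equal to the pivot
    right = [x for x in a if x > pivot]   # A2: elements larger than the pivot
    return random_quicksort(left) + mid + random_quicksort(right)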
Proof: Clearly any pair of elements is compared at most once during any execution of the algorithm. Let $S_{(i)}$ be the $i$-th ranked element in $A$. Let $X_{ij}$ be an indicator of whether elements $S_{(i)}$ and $S_{(j)}$ are ever compared directly against each other during the algorithm. Let $X = \sum_{i<j} X_{ij}$.
Notice that in any step of the RandomQS algorithm, if the pivot is not one of $S_{(i)}, S_{(i+1)}, \ldots, S_{(j)}$, then the iteration is irrelevant to the event $X_{ij} = 1$. The only thing that is relevant is which of $S_{(i)}, S_{(i+1)}, \ldots, S_{(j)}$ is selected as a pivot first. If $S_{(i)}$ or $S_{(j)}$ is selected, then they are compared directly. If any other element in this series is selected, then $S_{(i)}$ and $S_{(j)}$ will never be compared directly. Since the pivot is uniform, each of these $j-i+1$ elements is equally likely to be selected first, which is to say $\Pr[X_{ij} = 1] = \frac{2}{j-i+1}$. Therefore,
$$E[X] = E\Big[\sum_{i<j} X_{ij}\Big] = \sum_{i<j} E[X_{ij}] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{2}{j-i+1} \le \sum_{i=1}^{n}\sum_{k=1}^{n} \frac{2}{k} = 2nH_n.$$
QED.
Note that $H_n = \sum_{k=1}^{n} \frac{1}{k} = \ln n + O(1)$, so $E[X] = O(n \log n)$.
Comments on Randomization
The performance of a randomized algorithm does not rely on the intentions of an adversary (such as a hacker), but instead relies on pure chance.
We’ve seen examples where randomization really made a difference for online algorithms against an oblivious adversary. But for offline algorithms, in general people do not know whether randomization can indeed beat the best possible (but not yet known) deterministic algorithm. After all, a polynomial-time randomized algorithm with input $x$ is just a deterministic algorithm with input $x$ and a series of random bits whose length is bounded by a polynomial in $|x|$. Does knowing a sequence of random bits really help?
Random Bits
Although it is easy to say that an algorithm can make random choices, it is not as easy to obtain a truly random bit in a computer. Traditionally, random numbers were generated with physical devices such as dice in gambling. But these are too slow and too inconvenient for computation. Thus, a computer often uses a pseudo-random number generator to produce an integer sequence that is seemingly random. But it’s not truly random. For example, the Java Random class uses an integer random seed, and each seed produces a unique deterministic sequence of pseudo-random numbers. This is clearly not random, because it can give at most as many distinct sequences as there are seed values. A good example of the risk is that many people do not reset the default password a system automatically generated for them: a careless implementation of the key generation with a pseudo-random number generator will make it easy to break the key.
Another source of randomness is to convert noise in the physical world to digital signals. For example, RANDOM.org uses atmospheric noise to provide a random number generation service to the public. It had generated 1.82 trillion random bits for the Internet community as of August 2014. Another example is hardware random number generators (TRNGs, or true random number generators), which are commercially available. These TRNGs can utilize thermal noise, the photoelectric effect, or other quantum phenomena to produce random numbers.
Even when we can generate true random bits, there are still practical problems. But for most non-critical applications, a pseudo-random number generator is good enough.
Random Select-K
Recall the deterministic linear-time selection algorithm. The trickiest part was finding the pivot $p$. In order to make the two sublists $A_1$ and $A_2$ approximately balanced, we had to divide the list into groups of 5 elements each, and then take the median of the group medians.
With randomization, we might guess that if we randomly select a pivot from the list, chances are that it divides the list fairly evenly. So we have a randomized algorithm for Select-K: pick the pivot uniformly at random, partition, and recurse on the side that contains the $k$-th element.
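A minimal Python sketch of this randomized Select-K (names ours; we assume $1 \le k \le |a|$):

import random

def random_select(a, k):
    """Return the k-th smallest element (1-indexed) of the list a."""
    pivot = random.choice(a)
    left = [x for x in a if x < pivot]     # elements smaller than the pivot
    mid = [x for x in a if x == pivot]     # elements equal to the pivot
    if k <= len(left):
        return random_select(left, k)
    if k <= len(left) + len(mid):
        return pivot
    right = [x for x in a if x > pivot]    # elements larger than the pivot
    return random_select(right, k - len(left) - len(mid))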
Let $T(n)$ denote the running time on a list of $n$ elements. Conditioning on the rank of the random pivot,
$$E[T(n)] \le \frac{1}{n}\sum_{k=1}^{n} E\big[T(\max(k-1,\, n-k))\big] + cn \le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} E[T(k)] + cn.$$
The last term $cn$ is for the $O(n)$ partitioning overhead. We prove the linearity by induction. The base case is trivial. Suppose $E[T(k)] \le c'k$ holds for a constant $c' \ge 4c$ and all $k < n$. Then
$$E[T(n)] \le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} c'k + cn \le \frac{2c'}{n}\cdot\frac{3n^2}{8} + cn = \frac{3c'}{4}n + cn \le c'n.$$
QED.
Random Min-Cut
Given a graph $G = \langle V, E\rangle$, a cut is a partition of the vertex set $V$ into two disjoint sets $S$ and $T$ such that $S \cup T = V$. An edge is cut if its two vertices fall in $S$ and $T$, respectively. The Min-Cut problem asks for a cut such that the number of cut edges is minimized.
Deterministic Algorithm
A popular deterministic algorithm is to reduce the Min-Cut problem to Maximum Flow. The so-called s-t Cut problem is to find a minimum cut such that the two given vertices $s$ and $t$ fall in the two vertex sets $S$ and $T$, respectively.
Theorem 2: The s-t Min-Cut size is equal to the maximum flow from s to t.
Proof omitted.
To find the Min-Cut, one can fix an arbitrary vertex $s$, try every other vertex $t$, and then choose the minimum over all choices. Thus, this algorithm needs at most $O(|V| \cdot F)$ time, where $F$ is the time complexity of the maximum flow algorithm.
There are also algorithms that do not rely on Maximum Flow. One such algorithm runs in $O(|V||E| + |V|^2\log|V|)$ time. (See Stoer and Wagner, Journal of the ACM, Vol. 44, No. 4, July 1997, pp. 585–591.)
Randomized Algorithm I
This simple algorithm was first published by D. R. Karger when he was a Ph.D. student at Stanford. (Random sampling in cut, flow, and network design problems. Proc. 26th STOC, 648–657, 1994.)
A multigraph is just a graph that allows multiple edges connecting a pair of vertices.
Karger’s Algorithm: Keep choosing an edge uniformly at random and contracting it, until there are only two vertices left. Output the two sets of original vertices that were contracted into these two remaining vertices.
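A minimal Python sketch of one run (names and the edge-list representation are ours; vertices are labeled 0..n-1 and the graph is assumed connected):

import random

def karger_min_cut(n, edges):
    """One run of Karger's contraction algorithm.
    n: number of vertices; edges: list of (u, v) pairs (parallel edges allowed).
    Returns the size of the cut found."""
    comp = list(range(n))          # comp[v]: super-vertex that v belongs to
    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]
            v = comp[v]
        return v
    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue               # edge became a self-loop; skip it
        comp[ru] = rv              # contract the edge (u, v)
        remaining -= 1
    # edges whose endpoints lie in different super-vertices are the cut edges
    return sum(1 for (u, v) in edges if find(u) != find(v))

Repeating karger_min_cut many times and keeping the smallest cut found gives the probability guarantee analyzed next.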
Note that for any Min-Cut, the algorithm gives that cut as the output with a positive probability: it does so if and only if none of the contracted edges belongs to the cut edge set.
Theorem 3: The probability that Karger’s algorithm outputs the Min-Cut is at least $\binom{n}{2}^{-1} = \frac{2}{n(n-1)}$, where $n = |V|$.
Proof: When there are $k$ vertices in the multigraph, if the Min-Cut has size $c$, then each vertex has degree at least $c$. So the number of edges is at least $kc/2$. Therefore, the probability that the selected edge is not in the Min-Cut is at least $1 - \frac{c}{kc/2} = 1 - \frac{2}{k}$. Thus, the probability that none of the contractions is in the Min-Cut is at least
$$\prod_{k=3}^{n}\Big(1 - \frac{2}{k}\Big) = \prod_{k=3}^{n}\frac{k-2}{k} = \frac{2}{n(n-1)}.$$
QED.
The running time of one run is $O(n^2)$, because there are $n-2$ contractions and each contraction takes at most $O(n)$ time.
In order to achieve a high success probability, we only need to repeat the algorithm $O(n^2 \log n)$ times. Notice that the probability that none of the repetitions worked is at most
$$\Big(1 - \frac{2}{n(n-1)}\Big)^{\frac{n(n-1)}{2}\ln n} \le \big(e^{-1}\big)^{\ln n} = \frac{1}{n}.$$
The total running time would be $O(n^4 \log n)$ in order to achieve high probability, because the $O(n^2)$-time run needs to be repeated $O(n^2 \log n)$ times.
Randomized Algorithm II
Karger’s algorithm was very simple, but the running time $O(n^4 \log n)$ is not as good as the best known deterministic algorithm. In this section we improve it so that it runs in $O(n^2 \log^3 n)$ time for the same high success probability. The idea belongs to (D. R. Karger and C. Stein. Proc. 25th STOC, 757–765, 1993).
Re-examine the contraction process. A contraction when there are $k$ vertices is safe (selects an edge outside of the Min-Cut) with probability at least $1 - \frac{2}{k}$. This probability is high at the beginning of the contraction process, but drops significantly towards the end. Yet when we repeat the randomized algorithm, the first half and the second half of the contractions are treated the same. A better way is to repeat the second half more times.
Suppose we contract only from $n$ vertices down to $\lceil n/\sqrt{2}\rceil + 1$ vertices. The probability that no Min-Cut edge is contracted is at least
$$\prod_{k=\lceil n/\sqrt{2}\rceil + 2}^{n}\Big(1 - \frac{2}{k}\Big) = \frac{\big(\lceil n/\sqrt{2}\rceil + 1\big)\lceil n/\sqrt{2}\rceil}{n(n-1)} \ge \frac{1}{2}.$$
BetterCut(G):
1. Contract the multigraph $G$ to a graph $G'$ that has $\lceil n/\sqrt{2}\rceil + 1$ vertices.
2. $C_1$ = BetterCut($G'$)
3. $C_2$ = BetterCut($G'$)
4. Return the better of $C_1$ and $C_2$.
It seems that we are doing the same thing twice in steps 2 and 3. But since this is a randomized algorithm, $C_1$ and $C_2$ may be different results.
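A Python sketch of BetterCut under the same edge-list representation as before (the helper names contract and min_cut_brute are ours, and the brute-force base case is our choice to keep the sketch self-contained):

import math
import random

def contract(n, edges, target):
    """Randomly contract edges until only `target` super-vertices remain.
    Returns (target, relabeled edge list with self-loops removed)."""
    comp = list(range(n))
    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]
            v = comp[v]
        return v
    remaining = n
    while remaining > target:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        comp[ru] = rv
        remaining -= 1
        edges = [e for e in edges if find(e[0]) != find(e[1])]  # drop self-loops
    # relabel the surviving super-vertices as 0..remaining-1
    index = {r: i for i, r in enumerate(sorted({find(v) for v in range(n)}))}
    return remaining, [(index[find(u)], index[find(v)]) for (u, v) in edges]

def min_cut_brute(n, edges):
    """Base case: try all bipartitions, fixing vertex n-1 on one side."""
    best = len(edges)
    for mask in range(1, 2 ** (n - 1)):
        side = [(mask >> v) & 1 for v in range(n - 1)] + [0]
        best = min(best, sum(1 for (u, v) in edges if side[u] != side[v]))
    return best

def better_cut(n, edges):
    """Return the size of a small cut, following the BetterCut recursion."""
    if n <= 6:
        return min_cut_brute(n, edges)
    target = math.ceil(n / math.sqrt(2)) + 1
    n1, e1 = contract(n, edges, target)   # step 1: one contraction phase
    c1 = better_cut(n1, e1)               # step 2
    c2 = better_cut(n1, e1)               # step 3: same G', fresh random choices
    return min(c1, c2)                    # step 4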
Theorem 4: BetterCut($G$) finds the Min-Cut with probability $\Omega(1/\log n)$.
Proof: Let $P(n)$ denote the probability that BetterCut($G$) finds the Min-Cut. This is the case if and only if: (a) step 1 does not contract any edge in the Min-Cut; and (b) one of the two recursive calls in steps 2 and 3 finds the Min-Cut in $G'$. Therefore,
$$P(n) \ge \frac{1}{2}\bigg(1 - \Big(1 - P\big(\tfrac{n}{\sqrt{2}}\big)\Big)^2\bigg) = P\big(\tfrac{n}{\sqrt{2}}\big) - \frac{1}{2}P\big(\tfrac{n}{\sqrt{2}}\big)^2.$$
Note that we ignored the ceiling operation and dropped the $+1$ from the size $\lceil n/\sqrt{2}\rceil + 1$ of the smaller graph, just to make the proof easier to read. Even with them, things are okay.
With this recurrence relation, we can prove by induction that $P(n) \ge \frac{1}{2\log_2 n}$. The base case $n = 2$ is trivial since $P(2) = 1$. For the induction step, writing $p = P(n/\sqrt{2}) \ge \frac{1}{2(\log_2 n - 1/2)}$,
$$P(n) \ge p - \frac{p^2}{2} \ge \frac{1}{2\big(\log_2 n - \frac{1}{2}\big)} - \frac{1}{8\big(\log_2 n - \frac{1}{2}\big)^2}.$$
One can verify that when $\log_2 n \ge 1$, the right-hand side is no less than $\frac{1}{2\log_2 n}$.
QED.
Now the success probability $\Omega(1/\log n)$ is much larger than Karger’s algorithm’s $\Omega(1/n^2)$. In order to achieve success probability close to 1, we only need to repeat the algorithm $O(\log^2 n)$ times. The running time of a single call satisfies
$$T(n) = 2\,T\big(n/\sqrt{2}\big) + O(n^2) = O(n^2\log n).$$
So, the time needed by the algorithm to achieve a positive constant success probability is $O(n^2 \log^2 n)$. This is better than any currently known deterministic algorithm to compute the Min-Cut.
So, which one is better? It depends on the application: critical software (e.g., an OS) versus real-time systems.
A Las Vegas algorithm is always correct, but its running time is a random variable. A Monte Carlo algorithm has a bounded running time, but may output a wrong answer with some probability.
Theorem 5: Every Las Vegas algorithm can be converted to a Monte Carlo algorithm. (Run the Las Vegas algorithm for a fixed multiple of its expected time, and stop it, answering arbitrarily, if it has not terminated; by Markov’s inequality the error probability is bounded.)
As an example, consider verifying matrix multiplication: given three $n\times n$ matrices $A$, $B$, and $C$, test whether $AB = C$. Note that one cannot rely on a slower but simpler algorithm to test in this case, because that would make the initial effort to code the faster multiplication algorithm meaningless. The testing will have to run as fast as or faster than the computation of $AB$. There is a very interesting randomized test for this.
Pick a vector $r \in \{0,1\}^n$ and check whether
$$A(Br) = Cr.$$
Note that this can be checked in $O(n^2)$ time for any given $r$, since it needs only three matrix-vector multiplications. So, if the computation is correct ($AB = C$), then such a check will always return “yes”. The question is: when the computation is wrong, can we get a “no” answer with a positive probability?
Let $D = AB - C$. If $AB \neq C$, then some entry $D_{ij} \neq 0$. For a random 0-1 vector $r$,
$$(Dr)_i = \sum_k D_{ik}r_k = D_{ij}r_j + \Big(\sum_{k\neq j} D_{ik}r_k\Big),$$
and for any fixed values of the other bits of $r$, at most one of the two choices of $r_j$ makes this sum zero. Hence $\Pr[Dr = 0] \le \frac{1}{2}$.
MultiplicationTest(A, B, C):
1. Select a random 0-1 vector $r$.
2. Return whether $A(Br) = Cr$.
When the true answer is yes, the algorithm always returns yes. When the true answer is no, the algorithm returns no with probability at least $\frac{1}{2}$. So this is a Monte Carlo algorithm with one-sided error. We can repeat the test multiple times to increase the success probability.
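This test is known as Freivalds’ test. A minimal Python sketch (names ours; matrices are lists of lists of numbers):

import random

def multiplication_test(A, B, C, trials=20):
    """Check whether A*B == C for n x n matrices.
    One-sided error: 'False' is always correct; 'True' may be wrong
    with probability at most 2**(-trials)."""
    n = len(A)
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        # compute B r, then A (B r), then C r -- each takes O(n^2) time
        br = [sum(B[i][k] * r[k] for k in range(n)) for i in range(n)]
        abr = [sum(A[i][k] * br[k] for k in range(n)) for i in range(n)]
        cr = [sum(C[i][k] * r[k] for k in range(n)) for i in range(n)]
        if abr != cr:
            return False   # a witness found: AB != C for certain
    return True            # probably AB == C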
Max-Cut
The Max-Cut problem asks for a cut with the maximum number of cut edges.
Max-Cut Lemma: Every graph $G = \langle V, E\rangle$ has a cut of size at least $\frac{|E|}{2}$.
Proof: For each vertex, randomly assign one of two different colors with equal probability. Now consider an edge $(u, v)$: its two vertices have different colors with probability $\frac{1}{2}$. Thus, the expected number of edges that are cut by this random assignment is $\frac{|E|}{2}$. Thus, the probability that a random assignment’s cut size is at least $\frac{|E|}{2}$ is positive. Thus, there is at least one cut with size at least $\frac{|E|}{2}$.
QED.
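A minimal Python sketch of this random assignment (names ours; the two colors are encoded as 0 and 1):

import random

def random_cut(n, edges):
    """Color each vertex 0/1 uniformly at random; return the cut size."""
    color = [random.randint(0, 1) for _ in range(n)]
    return sum(1 for (u, v) in edges if color[u] != color[v])

By the argument above, the expected return value is exactly $|E|/2$.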
The proof of this lemma demonstrates the so-called “probabilistic method”. This is a way to prove the existence of something: to show the existence, it suffices to show that its probability is greater than 0.
Do not confuse the probabilistic method with randomized algorithms. The former is a way to prove something and does not have to involve an algorithm, whereas the latter’s goal is the algorithm itself. But incidentally, the proof of the Max-Cut lemma provides a randomized “approximation” algorithm.
Theorem 6: The algorithm described in the proof of the Max-Cut lemma computes a cut with expected size at least $\frac{1}{2}$ of the Max-Cut.
Proof: The expected size of the computed cut is $\frac{|E|}{2}$, and the Max-Cut size cannot exceed the total number of edges $|E|$.
QED.
Derandomization
There are a few standard techniques that can sometimes convert a randomized algorithm to a deterministic one. These techniques do not always work, and derandomization is a big topic. We study only one such technique to get a sense of derandomization: “the method of conditional probabilities”.
Recall that each step of the randomized Max-Cut algorithm chooses a vertex $v$ and randomly assigns a color, red or black, to it. Let $X$ be the cut size of the algorithm’s output. Before assigning any color, the expected cut size is $\frac{|E|}{2}$. Since
$$E[X] = \frac{1}{2}E[X \mid v \text{ red}] + \frac{1}{2}E[X \mid v \text{ black}] \le \max\big(E[X \mid v \text{ red}],\; E[X \mid v \text{ black}]\big),$$
at least one of the two possibilities is such that $E[X \mid \text{color of } v] \ge \frac{|E|}{2}$.
It happens that it is easy to calculate $E[X]$ given that a subset of the vertices are colored: each edge with both endpoints colored contributes either 0 or 1, and each edge with at least one endpoint uncolored contributes $\frac{1}{2}$ to this expected value. So, we can simply calculate the two expected values obtained by assigning $v$ to be red and black, respectively, and then choose the color with the larger expected value. This converts the random choice into a deterministic calculation. This process can be repeated to color all the other vertices, while maintaining the invariant
$$E[X \mid \text{colors assigned so far}] \ge \frac{|E|}{2}.$$
In general, the method applies to a randomized algorithm with the following two properties:
1. The algorithm consists of a series of small random choices. Each choice may take several possible values.
2. The conditional probability that the algorithm succeeds after making a choice can be computed, or at least the values can be compared to each other (so we can take the largest).
With these two properties, the method of conditional probabilities can be applied to do derandomization. Suppose the algorithm succeeds with a positive probability before making a choice. Then $\Pr(\text{success} \mid \text{the choice takes value } v) \ge \Pr(\text{success})$ for at least one value $v$ of the choice. So, one can deterministically take the value that provides the highest conditional probability, and this process can be continued to make all the choices deterministically.
Note that in the Max-Cut example, the conditional expected value is used instead of the conditional probability.
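A Python sketch of the derandomized Max-Cut (names ours): for each vertex we compare the two conditional expectations and keep the better color.

def derandomized_max_cut(n, edges):
    """Greedy coloring by the method of conditional expectations.
    Returns a 0/1 coloring whose cut size is at least len(edges)/2."""
    color = [None] * n

    def expected_cut():
        """E[cut size | colors assigned so far]."""
        total = 0.0
        for (u, v) in edges:
            if color[u] is None or color[v] is None:
                total += 0.5    # an undecided edge is cut with probability 1/2
            elif color[u] != color[v]:
                total += 1.0    # a decided edge that is cut
        return total

    for v in range(n):
        color[v] = 0
        e0 = expected_cut()
        color[v] = 1
        e1 = expected_cut()
        color[v] = 0 if e0 >= e1 else 1   # keep the color with larger E[cut]
    return color

Each greedy choice keeps the conditional expectation at least $|E|/2$, so the final (deterministic) cut has size at least $|E|/2$.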
Unfortunately, not all randomized algorithms can be derandomized this way. For example, the randomized algorithm for Min-Cut cannot, because the conditional probability cannot be easily calculated.
Complexity Classes
Notice that these hardness results are for problems, not for specific algorithms. We will study more about lower bounds, complexity classes, and ways to deal with NP-hard problems later in this course. For now we study a few simple complexity classes for randomized algorithms.
First, P (polynomial time) is the class that contains the problems that can be solved in polynomial time on a Turing machine or a RAM. We delay the discussion of the computing model until later in this course, since these computing models are mostly equivalent to a general computer.
Most algorithms we’ve studied so far (in this course and in CS240, CS341) are for problems in P. For randomized algorithms, there is a similar class RP (randomized polynomial time).
Let us introduce a conventional way to define the complexity classes. A decision problem is a problem that asks whether an input satisfies a certain property. Let $L$ denote the language (the set of all the inputs that satisfy the property). Then a decision problem asks whether $x \in L$ for input $x$. People often refer to the problem by the language it defines. And when an algorithm outputs YES for an input $x$, we say the algorithm accepts $x$.
A decision problem $L$ belongs to RP if it has a randomized algorithm $A$ that runs in worst-case polynomial time such that for any input $x$:
$$x \in L \;\Rightarrow\; \Pr(A(x) \text{ accepts}) \ge \tfrac{1}{2},$$
$$x \notin L \;\Rightarrow\; \Pr(A(x) \text{ accepts}) = 0.$$
Thus, a problem in RP has a Monte Carlo algorithm with one-sided error: when the algorithm outputs Yes, the answer is always correct; but when the algorithm outputs No, there may be an error.
The threshold $\frac{1}{2}$ is arbitrary; any constant strictly between 0 and 1 would serve the purpose.
Similarly, we have co-RP (the complement of RP). Consider a decision problem $L$; its complement $\tilde{L}$ asks whether the input gives a No answer for $L$. Thus, the correct answers to $L$ and $\tilde{L}$ always complement each other. A problem belongs to co-RP if and only if its complement belongs to RP.
ZPP (zero-error probabilistic polynomial time) is the class of problems that have expected polynomial time Las Vegas algorithms.
PP (probabilistic polynomial time) contains all languages $L$ that have a randomized algorithm $A$ running in worst-case polynomial time such that for any input $x$:
$$x \in L \;\Rightarrow\; \Pr(A(x) \text{ accepts}) > \tfrac{1}{2},$$
$$x \notin L \;\Rightarrow\; \Pr(A(x) \text{ accepts}) < \tfrac{1}{2}.$$
PP consists of the problems that have Monte Carlo algorithms making two-sided errors. Unfortunately, this is a very weak condition: the two-sided error can be infinitely close to $\frac{1}{2}$, so that one cannot boost the probabilities by repeating the algorithm polynomially many times.
BPP (bounded-error probabilistic polynomial time) contains all languages $L$ that have a randomized algorithm $A$ running in worst-case polynomial time such that for any input $x$:
$$x \in L \;\Rightarrow\; \Pr(A(x) \text{ accepts}) \ge \tfrac{3}{4},$$
$$x \notin L \;\Rightarrow\; \Pr(A(x) \text{ accepts}) \le \tfrac{1}{4}.$$
Note that if a problem belongs to BPP, and such an algorithm is given, then we can repeat the algorithm and take the majority answer to make the (two-sided) error probability approach 0. So, BPP is a fairly strong condition.
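As an illustration, a minimal sketch of this majority-vote amplification (alg stands for any hypothetical two-sided-error decision routine; names ours):

def amplify(alg, x, repetitions=101):
    """Run a BPP-style algorithm an odd number of times and return the
    majority answer; by Chernoff bounds, the error probability drops
    exponentially in the number of repetitions."""
    yes = sum(1 for _ in range(repetitions) if alg(x))
    return yes > repetitions // 2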
The relations of these complexity classes with other classes are not completely known. Some
known relations include:
Theorem: $P \subseteq ZPP \subseteq RP \subseteq BPP \subseteq PP$.
But it is not known whether these inclusions are strict. In other words, people still do not know whether randomization indeed provides much power in algorithm design.
Theorem: $ZPP = RP \cap \text{co-}RP$.
Some related questions remain open. For example:
1. Is $RP = \text{co-}RP$?
2. Is $P = BPP$?
Primality Test
Primality Test: Given a positive odd number $n$, answer whether it is a prime.
Composite Test: Given a positive odd number $n$, answer whether it is not a prime (i.e., a composite).
Note that these two problems can be solved by trivial algorithms in time polynomial in $n$. However, the input size is actually $\log n$, the number of bits of $n$. Therefore, a polynomial in $n$ is exponential in the input size. We are looking for an efficient algorithm whose time complexity is a polynomial of $\log n$.
Let $p$ be a prime number. Then $Z_p$ is the field that has the $p$ elements $\{0, 1, \ldots, p-1\}$. The addition and multiplication operations are defined modulo $p$:
$$a + b \pmod p, \qquad a \times b \pmod p.$$
Two facts hold when $p$ is a prime. First, by Fermat’s little theorem, for every $a \in \{1, \ldots, p-1\}$,
$$a^{p-1} \equiv 1 \pmod p.$$
Second, 1 has no nontrivial square roots modulo $p$:
$$x^2 \equiv 1 \pmod p \;\Leftrightarrow\; (x-1)(x+1) \equiv 0 \pmod p \;\Leftrightarrow\; x \equiv \pm 1 \pmod p.$$
Given an odd number $n$, write $n - 1 = 2^s d$ with $d$ odd, pick a random $a \in \{1, \ldots, n-1\}$, and compute the sequence $a^d, a^{2d}, \ldots, a^{2^s d} = a^{n-1} \pmod n$ by repeated squaring.
Thus, if either $a^{n-1} \not\equiv 1 \pmod n$, or a nontrivial square root of 1 appears in this sequence, then $n$ is not a prime, and $a$ is called a witness for the compositeness of $n$.
It can be shown (not proved here) that for any composite $n$, at least $\frac{3}{4}$ of the numbers in $\{1, \ldots, n-1\}$ are witnesses. Therefore, by randomly choosing a number of them and testing, we have a high probability of answering “Composite” for a composite $n$. If all of the choices failed to prove $n$ composite, then we answer “Prime”.
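A Python sketch of the resulting test (this is the Miller–Rabin test; names ours):

import random

def is_probably_prime(n, trials=20):
    """One-sided-error Monte Carlo primality test.
    'False' (composite) is always correct; 'True' (prime) errs
    with probability at most 4**(-trials)."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # write n - 1 = 2^s * d with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)                  # a^d mod n
        if x == 1 or x == n - 1:
            continue                      # a is not a witness
        for _ in range(s - 1):
            x = pow(x, 2, n)              # square toward a^(n-1)
            if x == n - 1:
                break                     # hit -1: not a witness
        else:
            return False                  # a witnesses that n is composite
    return True                           # probably prime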
This is a one-sided error Monte Carlo algorithm. An error can happen only when the algorithm answers “Prime”. So, Composite Test belongs to RP, and Primality Test belongs to co-RP. Some other results also showed that Primality Test belongs to RP, so it belongs to $ZPP = RP \cap \text{co-}RP$.
Remark: In fact, Agrawal, Kayal, and Saxena showed in 2002 that Primality Test can be solved in deterministic polynomial time, so it belongs to P.