Probabilistic Analysis Overview
Key Concepts:
1. Define the Random Inputs: Specify the random variables or random inputs that
influence the behavior of the algorithm.
2. Determine the Probability Distribution: Describe how these inputs are distributed,
i.e., what the probabilities are for different inputs.
3. Analyze the Algorithm: Analyze the performance of the algorithm for different
possible outcomes.
4. Compute the Expected Performance: Using the probability distribution, compute the
expected value of the performance metric (e.g., running time) by averaging over all
possible outcomes, weighted by their probabilities.
Let’s consider a simple example: analyzing the expected time complexity of a randomized
quicksort algorithm.
The time complexity depends on how balanced the partitions are. In the best case, the
partitions are evenly split, while in the worst case, the partitioning is unbalanced.
However, in probabilistic analysis, we compute the expected time complexity over all
possible random choices of pivots.
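Here is a minimal Python sketch of randomized quicksort (the function name and the list-comprehension style are illustrative choices, not a definitive implementation):
python
import random

def randomized_quicksort(arr):
    """Quicksort with a uniformly random pivot: each element is chosen with probability 1/n."""
    if len(arr) <= 1:
        return arr                       # base case: already sorted
    pivot = random.choice(arr)           # the random input: pivot selection
    smaller = [x for x in arr if x < pivot]
    equal   = [x for x in arr if x == pivot]
    larger  = [x for x in arr if x > pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)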
• The random input here is the choice of the pivot. The pivot is chosen randomly from
the array.
• Each element has an equal probability of being selected as the pivot. So, the
probability of selecting any particular element is `1 / n`.
• The expected time complexity for sorting an array of size `n` satisfies the recurrence
`T(n) = T(k) + T(n - k - 1) + O(n)`, where `k` is the number of elements smaller than the pivot.
Conclusion:
Averaged over all random pivot choices, randomized quicksort runs in expected `O(n log n)` time.
Recurrence Relation
A recurrence relation expresses the running time of a recursive algorithm in terms of its running time on smaller inputs. A divide-and-conquer algorithm typically satisfies:
`T(n) = a * T(n/b) + f(n)`
Where:
• `a` is the number of subproblems,
• `n/b` is the size of each subproblem,
• `f(n)` is the cost of dividing the problem and combining the results.
Let’s take the Merge Sort algorithm as an example, where a list is recursively divided into
two halves until each subproblem contains a single element, and then the sorted sublists
are merged.
1. Dividing the Problem: The array of size `n` is divided into two subarrays, each of size
`n/2`.
2. Conquering the Subproblems: Merge sort is recursively applied to each subarray.
3. Combining the Results: The two sorted subarrays are merged in linear time, `O(n)`.
This gives the recurrence:
`T(n) = 2 * T(n/2) + O(n)`
Where:
• `2` is the number of subproblems,
• `n/2` is the size of each subproblem,
• `O(n)` is the cost of the merge step.
We can solve recurrence relations using several methods, including the substitution method,
the recursion tree method, and the Master Theorem.
The Master Theorem applies to recurrences of the form:
`T(n) = a * T(n/b) + O(n^d)`
The solution depends on the comparison between `a` and `b^d`. The possible solutions are:
• If `a < b^d`: `T(n) = O(n^d)`
• If `a = b^d`: `T(n) = O(n^d log n)`
• If `a > b^d`: `T(n) = O(n^(log_b a))`
For Merge Sort:
• `a = 2`
• `b = 2`
• `d = 1` (since the merging step takes `O(n)` time)
Since `a = b^d` (i.e., `2 = 2^1`), the time complexity of Merge Sort is:
`T(n) = O(n^1 log n) = O(n log n)`
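A compact Python sketch of Merge Sort (the helper name `merge` is an illustrative choice):
python
def merge_sort(arr):
    """Split the array in half, sort each half recursively, then merge."""
    if len(arr) <= 1:
        return arr                    # base case: a single element is sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])      # T(n/2)
    right = merge_sort(arr[mid:])     # T(n/2)
    return merge(left, right)         # O(n) merge step

def merge(left, right):
    """Merge two sorted lists in linear time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out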
• This breaks down the problem into two halves, creating two recursive calls.
• Each of these recursive calls works on a subproblem of size `n/2`, so we have `2 *
T(n/2)`.
• Merge Step: The merging step takes linear time, `O(n)`, to combine the two sorted
subarrays.
Thus, the recurrence `T(n) = 2 * T(n / 2) + O(n)` accurately represents the time complexity
of Merge Sort.
Summary:
A recurrence relation captures the cost of a recursive algorithm; solving it with the
substitution method, a recursion tree, or the Master Theorem yields a closed-form bound
such as `O(n log n)` for Merge Sort.
Probabilistic Analysis
• Randomized algorithms: Algorithms that use randomness in their logic (e.g., random
pivot selection in Quicksort).
• Average-case analysis: Analyzing an algorithm's performance on "average" inputs,
assuming a probability distribution over possible inputs.
Key Concepts:
• Random variables: Quantities that result from random events and have different
outcomes with specific probabilities.
• Expected value (mean): The average value of a random variable across all possible
outcomes.
• Probability distribution: Describes the likelihood of different outcomes for a random
variable.
1. Divide: Pick a random element as the pivot, partition the array around the pivot such
that all elements smaller than the pivot go to the left, and all elements larger go to the
right.
2. Conquer: Recursively apply QuickSort to the left and right subarrays.
3. Combine: Since the problem is solved by the recursive subproblems, there is no
explicit combining step.
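A sketch of the in-place randomized partition step in Python (Lomuto-style; the names are illustrative):
python
import random

def randomized_partition(arr, lo, hi):
    """Partition arr[lo..hi] around a uniformly random pivot; return its final index."""
    p = random.randint(lo, hi)            # every position in the subarray is equally likely
    arr[p], arr[hi] = arr[hi], arr[p]     # move the pivot to the end
    pivot, i = arr[hi], lo
    for j in range(lo, hi):
        if arr[j] < pivot:                # smaller elements go to the left side
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]     # place the pivot between the two sides
    return i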
1. Random Variables:
• The pivot selection is randomized. Any element can be chosen as the pivot with equal probability.
2. Probability Distribution:
• Each element has a probability of `1/n` to be chosen as the pivot, where `n` is the size
of the array.
3. Expected Time:
The expected running time satisfies:
`T(n) = T(k) + T(n - k - 1) + O(n)`
Where `k` is the number of elements smaller than the pivot, and `n - k - 1` is the number
of elements larger than the pivot. Since the pivot is chosen randomly, the expected value
of `k` is around `n/2`.
With `k ≈ n/2`, the recurrence behaves like:
`T(n) = 2 * T(n/2) + O(n) = O(n log n)`
Analysis Explanation:
• The recursion depth is proportional to `log n` because, on average, the array is split
into two halves.
• The partition step takes `O(n)` at each level of recursion.
• Thus, the total expected time complexity is `O(n log n)`.
Steps of Probabilistic Analysis:
• Determine which variables or inputs of the algorithm are random (e.g., pivot selection
in Randomized QuickSort).
• Analyze the algorithm’s behavior based on different possible outcomes of the random
variable (e.g., different pivot selections).
• Using the probability distribution, compute the expected value of the performance
metric (time, space, etc.). This is done by averaging over all possible outcomes weighted
by their probability.
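To make the "average over all outcomes" step concrete, here is a small Monte Carlo sketch in Python (the sampling approach and all names are illustrative; it estimates the expected number of comparisons empirically rather than deriving it analytically):
python
import random

def comparisons(arr):
    """Count comparisons made by one run of randomized quicksort."""
    if len(arr) <= 1:
        return 0
    pivot = random.choice(arr)           # random pivot, probability 1/n each
    smaller = [x for x in arr if x < pivot]
    larger  = [x for x in arr if x > pivot]
    # len(arr) - 1 comparisons against the pivot at this level
    return len(arr) - 1 + comparisons(smaller) + comparisons(larger)

trials = 10_000
data = list(range(100))
avg = sum(comparisons(data) for _ in range(trials)) / trials
print(avg)   # grows as O(n log n); roughly 650 for n = 100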
Example Walkthrough:
Let’s assume we are analyzing Randomized QuickSort using probabilistic analysis. The
pivot selection is the random variable, and each element is equally likely to be chosen as
the pivot. We are trying to compute the expected running time.
1. Random Variables: the pivot chosen at each recursive call.
2. Probability Distribution: each of the `n` elements is selected as the pivot with probability `1/n`.
3. Expected Time:
• We calculate the expected time complexity by considering every possible pivot and
averaging the running time across all possible pivot choices. The recurrence relation gives
us the expected time complexity `O(n log n)`.
Conclusion:
Probabilistic analysis shows that Randomized QuickSort runs in expected `O(n log n)` time,
even though its worst case is `O(n^2)`.
competitive analysis explain it with pseudo code and algorithm and example
Competitive Analysis
Key Concepts:
• Online algorithm: An algorithm that makes decisions based only on the information
available at the moment, without knowledge of future inputs.
• Offline algorithm: An algorithm that has full knowledge of the input sequence in
advance and can make optimal decisions.
• Competitive ratio: The worst-case ratio of the cost incurred by the online algorithm to
the cost incurred by the optimal offline algorithm:
`Competitive Ratio = max over all inputs of (Cost of Online Algorithm / Cost of Optimal Offline Algorithm)`
Example:
The Ski Rental Problem is a classic example used in competitive analysis. The problem is
as follows:
• You are renting skis for an unknown number of days. Renting the skis costs $10 per
day, and buying the skis costs $100.
• You don’t know in advance how many days you will ski, so you must decide each day
whether to continue renting or buy the skis.
Offline Algorithm:
The optimal offline algorithm knows exactly how many days you will ski: if that is fewer
than 10 days, it rents every day; otherwise it buys the skis on day one for $100.
Online Algorithm:
You don’t know how many days you will ski. You have to decide whether to rent or buy
each day.
Online Strategy:
One simple strategy is to rent the skis until the rental cost equals the purchase cost, and
then buy the skis. This means you will rent the skis for 10 days, then buy them on the 11th
day.
1. If you end up skiing for fewer than 10 days, you only incur the rental cost.
2. If you ski for 11 days or more, you pay $100 for rentals and then buy the skis for $100
on the 11th day, making the total cost $200.
Competitive Ratio:
• Best-case scenario: If you only ski for 10 days, you will pay exactly $100, which is
equal to the optimal cost. So, the competitive ratio is `1` in this case.
• Worst-case scenario: If you ski for an arbitrarily large number of days (say `n` days),
the online strategy will pay $200 (since you rent for 10 days and buy on the 11th day),
while the optimal offline strategy would only pay $100 (buy the skis immediately). Thus,
the competitive ratio in this worst-case scenario is `2`.
Hence, the competitive ratio for this strategy is `2`, meaning that the online algorithm's
cost is at most twice the cost of the optimal offline algorithm.
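A minimal Python sketch of this break-even strategy and the offline optimum (the constants come from the $10/$100 example above; function names are illustrative):
python
RENT_COST = 10
BUY_COST = 100

def ski_rental_online(days_skied: int) -> int:
    """Break-even strategy: rent until total rent equals the purchase price, then buy."""
    breakeven = BUY_COST // RENT_COST          # 10 days
    if days_skied <= breakeven:
        return days_skied * RENT_COST          # never reached the buying point
    return breakeven * RENT_COST + BUY_COST    # rented 10 days, bought on day 11

def ski_rental_offline(days_skied: int) -> int:
    """Optimal offline cost: the number of ski days is known in advance."""
    return min(days_skied * RENT_COST, BUY_COST)

For any horizon longer than 10 days the online cost is $200 against an offline cost of $100, which is exactly the competitive ratio of `2` derived above.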
Steps in Competitive Analysis:
• Define the cost incurred by the online algorithm and the optimal offline algorithm for
solving the problem.
• Analyze the behavior of the online algorithm over different input sequences, especially
focusing on worst-case scenarios.
• For each input sequence, compare the cost of the online algorithm to the cost of the
optimal offline algorithm.
• Calculate the worst-case ratio of the online algorithm's cost to the optimal offline
algorithm's cost.
Example: The Paging Problem
In the paging problem, you have a cache that can hold a limited number of pages (e.g., 3
pages), and you need to decide which page to evict when a new page is requested but the
cache is full. The goal is to minimize the number of page faults (cache misses).
• The optimal offline algorithm (OPT) knows the entire sequence of page requests in
advance and always evicts the page that will not be requested for the longest period of
time in the future.
Online Algorithms:
• Least Recently Used (LRU): Always evicts the least recently used page.
• First-In-First-Out (FIFO): Evicts the oldest page.
• Randomized: Evicts a randomly selected page.
• LRU and FIFO have a competitive ratio of `k`, where `k` is the number of pages in the
cache.
• The optimal offline algorithm has a competitive ratio of `1`.
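A minimal LRU cache sketch in Python (the class name and fault counter are illustrative; `capacity` plays the role of `k`):
python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()   # ordered from least to most recently used
        self.faults = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)    # hit: mark as most recently used
            return
        self.faults += 1                    # page fault (cache miss)
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = True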
• LRU evicts the least recently used page whenever a new page is requested and the
cache is full.
• The competitive ratio for LRU is `k`, where `k` is the cache size.
Conclusion:
Competitive analysis is a powerful tool for analyzing online algorithms, where decisions
must be made without knowledge of future inputs. The general workflow is:
1. Define the problem: Identify the online problem and its constraints (e.g., paging, ski
rental).
2. Define the optimal offline algorithm: Determine the best possible offline algorithm
with full knowledge of future inputs.
3. Analyze the online algorithm: Create an online strategy and calculate its cost for
different input sequences.
4. Compute the competitive ratio: Compare the online algorithm's performance with
the optimal offline algorithm and compute the worst-case ratio.
Competitive analysis helps design online algorithms that are robust even when future
inputs are unknown or adversarial.
competitive analysis explain it with pseudo code and algorithm and example and with
problem solving also
Competitive Analysis
Key Concepts
• Online algorithm: An algorithm that makes decisions based only on the information
available at the moment, without knowledge of future inputs.
• Offline algorithm: An algorithm that has full knowledge of the input sequence in
advance and can make optimal decisions.
• Competitive ratio: The worst-case ratio of the cost incurred by the online algorithm to
the cost incurred by the optimal offline algorithm.
`Competitive Ratio = max over all input sequences of (Cost of Online Algorithm / Cost of Optimal Offline Algorithm)`
1. Define the problem: Clearly state the problem and its constraints.
2. Determine the optimal offline algorithm: Find the best possible offline algorithm
that knows the future.
3. Design an online algorithm: Create a strategy to solve the problem online, making
decisions based on current knowledge.
4. Compute the cost of the online algorithm: Calculate the cost or performance metric
for the online algorithm for a given input sequence.
5. Compare with the optimal offline algorithm: Compute the ratio of the cost of the
online algorithm to the cost of the optimal offline algorithm for different inputs.
6. Find the competitive ratio: Maximize the ratio over all possible input sequences to
find the worst-case scenario.
Example 1: The Ski Rental Problem
The Ski Rental Problem is a simple example of competitive analysis. In this problem, you
need to decide whether to rent skis each day or buy them, but you don’t know how long
you will ski.
The optimal offline solution knows in advance how many days you will ski.
Since you don’t know how many days you will ski, the online strategy is:
• Rent skis until the rental cost equals the cost of buying, then buy the skis.
• Rent for 10 days, then buy on the 11th day if you continue skiing.
Competitive Ratio:
• Best case: If you ski exactly 10 days, both the online and offline algorithms pay $100,
so the competitive ratio is `1`.
• Worst case: If you ski for `n` days where `n > 10`, the online algorithm pays $200
(rents for 10 days at $10/day and buys the skis for $100), while the offline algorithm pays
$100 (buying skis immediately). Thus, the competitive ratio is:
`Competitive Ratio = 200 / 100 = 2`
So, the online algorithm is 2-competitive, meaning its cost is at most twice the cost of the
optimal offline solution.
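As a quick sanity check, this small self-contained Python snippet (illustrative) searches for the worst-case ratio of the break-even strategy:
python
def online_cost(days, rent=10, buy=100):
    """Break-even strategy: rent until rent paid equals the purchase price."""
    breakeven = buy // rent
    return days * rent if days <= breakeven else breakeven * rent + buy

def offline_cost(days, rent=10, buy=100):
    """Optimal offline cost with the horizon known in advance."""
    return min(days * rent, buy)

# The ratio peaks at 2.0 for any horizon beyond the break-even point.
print(max(online_cost(d) / offline_cost(d) for d in range(1, 1001)))  # 2.0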
Example 2: The Paging Problem
In the paging problem, you manage a cache of limited size, and the goal is to minimize
the number of page faults (cache misses) when a page is not in the cache.
The optimal offline solution, known as Belady's Algorithm, evicts the page that will not be
needed for the longest time in the future.
Online Solution:
There are several online strategies for the paging problem, including:
1. Least Recently Used (LRU): Evict the least recently used page.
2. First-In-First-Out (FIFO): Evict the page that was loaded the earliest.
3. Randomized: Evict a random page from the cache.
The LRU algorithm is k-competitive, meaning that in the worst case, it incurs `k` times
more page faults than the optimal offline algorithm, where `k` is the cache size.
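A self-contained functional variant of LRU as a fault counter, run on a sample request sequence (illustrative):
python
from collections import OrderedDict

def lru_faults(requests, capacity):
    """Count page faults incurred by LRU on a request sequence."""
    cache, faults = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)        # hit: refresh recency
            continue
        faults += 1                        # miss: page fault
        if len(cache) >= capacity:
            cache.popitem(last=False)      # evict the least recently used page
        cache[page] = True
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], capacity=3))  # 10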
Competitive Ratio:
• In the worst case, an adversary can construct a page request sequence that forces LRU
to fault on every request while the optimal offline algorithm faults far less often; this is
what makes LRU `k`-competitive.
Example 3: Online Bipartite Matching
In this problem, we are given a bipartite graph where one set of nodes (left side) is known
in advance, and nodes from the other set (right side) arrive one at a time. Each arriving
node reveals its edges to the left-side nodes, and we must immediately match it to an
unmatched node or leave it unmatched.
Problem Description:
• Goal: Maximize the number of matched pairs between the two sets of nodes.
• Online nature: Nodes from the right side arrive one at a time, and we must decide
their match without knowing future nodes.
The optimal offline solution knows the entire set of right-side nodes and can compute the
maximum matching in the graph.
Online Algorithm:
• When a new right-side node arrives, match it with any available unmatched left-side
node.
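A minimal greedy sketch in Python (the names and the input format are illustrative assumptions):
python
def greedy_online_matching(arrivals):
    """arrivals: iterable of (right_node, [left_neighbours]) pairs, in arrival order."""
    matched_left = set()
    matching = {}
    for right, neighbours in arrivals:
        for left in neighbours:
            if left not in matched_left:   # match to the first free neighbour
                matched_left.add(left)
                matching[right] = left
                break                      # the decision is irrevocable
    return matching

# Example: r1 greedily takes a, leaving r2 (whose only neighbour is a) unmatched,
# while the offline optimum matches both (r1-b, r2-a).
print(greedy_online_matching([("r1", ["a", "b"]), ("r2", ["a"])]))  # {'r1': 'a'}

This greedy strategy is known to be 1/2-competitive: it always produces a matching at least half the size of the offline maximum.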
Conclusion
By using competitive analysis, we ensure that the online algorithm performs reasonably
well in the worst-case scenario, even when compared to the best possible offline solution.
counting sort with example and pseudo code and algorithm with their problem solving
Counting Sort is a linear-time sorting algorithm that sorts elements by counting the
number of occurrences of each unique element. It works best when the range of input
values (i.e., the difference between the maximum and minimum values) is not significantly
larger than the number of elements. Counting Sort is particularly effective for sorting
integers or objects with integer keys.
• Time Complexity: O(n + k), where `n` is the number of elements to sort and `k` is the
range of the input (maximum value - minimum value).
• Space Complexity: O(n + k).
• Stable: Counting Sort maintains the relative order of equal elements.
• Non-comparative: Counting Sort does not rely on comparison between elements like
comparison-based algorithms (e.g., Merge Sort or Quick Sort).
1. Count the occurrences: For each unique element in the input array, Counting Sort
counts how many times that element appears.
2. Compute positions: Using the counts, the algorithm calculates the position of each
element in the sorted array.
3. Build the sorted array: Finally, elements are placed in their correct positions in the
output array.
Example
Input array: `[4, 2, 2, 8, 3, 3, 1]`
Step-by-Step Process:
1. Find the range: Determine the minimum and maximum values of the array. Here the
minimum is `1` and the maximum is `8`, so the count array needs indices `0` through `8`.
2. Count occurrences: Create a count array to count the occurrences of each value.
index:  0  1  2  3  4  5  6  7  8
count: [0, 1, 2, 2, 1, 0, 0, 0, 1]
• The count array represents the number of occurrences of each value. For example, the value `2` appears twice, so
`count[2] = 2`, and the value `1` appears once, so `count[1] = 1`.
3. Calculate cumulative counts: Modify the count array by computing cumulative sums
to determine the final position of each element.
index:  0  1  2  3  4  5  6  7  8
count: [0, 1, 3, 5, 6, 6, 6, 6, 7]
• The value at each position in the count array now represents the index at which the corresponding element should be placed in
the sorted array. For example, `count[2] = 3`, meaning the last occurrence of the value `2` will be at
index 2 (0-based index).
4. Build the sorted output array: Place each element in the sorted output array based
on the count array and decrement the count for each element.
output: [1, 2, 2, 3, 3, 4, 8]
Pseudo Code
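A runnable Python rendering of the steps described above (a sketch assuming non-negative integer keys, as in the example):
python
def counting_sort(arr):
    """Stable counting sort for non-negative integers."""
    if not arr:
        return []
    k = max(arr)                       # largest key determines the count array size
    count = [0] * (k + 1)
    for x in arr:                      # 1. count occurrences
        count[x] += 1
    for i in range(1, k + 1):          # 2. cumulative counts give final positions
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for x in reversed(arr):            # 3. place elements; reverse scan keeps it stable
        count[x] -= 1
        output[count[x]] = x
    return output

print(counting_sort([4, 2, 2, 8, 3, 3, 1]))  # [1, 2, 2, 3, 3, 4, 8]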
Example Walkthrough:
1. Input array: `[4, 2, 2, 8, 3, 3, 1]`
2. Count occurrences:
After processing the array [4, 2, 2, 8, 3, 3, 1], the count array becomes:
count = [0, 1, 2, 2, 1, 0, 0, 0, 1]   (indices 0..8)
3. Cumulative count:
Modify the count array to store cumulative counts:
count = [0, 1, 3, 5, 6, 6, 6, 6, 7]
4. Build the output array:
output = [1, 2, 2, 3, 3, 4, 8]
Time Complexity
• Time complexity: O(n + k), linear in the number of elements `n` and the range of the values `k`.
• Space complexity: O(n + k) due to the count array and the output array.
Advantages:
1. Linear Time: Counting Sort runs in O(n + k), making it faster than comparison-based
sorts for small ranges.
Disadvantages:
1. Limited to small ranges: Counting Sort is not suitable when the range of input values
is large relative to the number of elements, as this increases space complexity.
2. Only works for integers: Counting Sort is not suitable for floating-point numbers or
strings without modification.
Conclusion
Counting Sort is a highly efficient algorithm for sorting when the range of input values is
not significantly larger than the number of elements. Its linear time complexity makes it
advantageous in cases where comparison-based sorts like Quick Sort and Merge Sort
would take longer. The algorithm's simplicity and stability make it a good choice for
specific problems, especially when sorting integers. However, it may not be suitable for
cases where the range of values is too large, as it incurs significant space overhead in such
scenarios.
Let:
• `n` = the number of elements in the input array
• `k` = the range of the input values (maximum value - minimum value)
The time complexity of Counting Sort is analyzed based on the following operations:
1. Count Array Creation: Takes O(k) to initialize a count array of size `k`.
2. Counting Occurrences: Takes O(n) to count the occurrences of each element in the
input array.
3. Cumulative Sum Calculation: Takes O(k) to modify the count array for cumulative
sums.
4. Building the Output Array: Takes O(n) to place elements in the output array based on
their counts.
• Best Case Time Complexity: O(n + k)
The best case occurs when all the elements are already sorted or nearly sorted. However,
Counting Sort always processes the entire input array and the count array in O(n + k),
regardless of how sorted the array is. Therefore, the time complexity remains O(n + k).
• Average Case Time Complexity: O(n + k)
The average case follows the same process as the best case. Since Counting Sort is not a
comparison-based algorithm, it doesn't depend on the order of the input values. The steps
involve scanning the array and the count array, resulting in O(n + k) complexity.
• Worst Case Time Complexity: O(n + k)
The worst case occurs when the input array has a wide range of values (i.e., `k` is large).
Even in this scenario, the time complexity remains O(n + k), because Counting Sort always
processes the input array in O(n) time and the count array in O(k) time. The main factor
affecting performance in the worst case is the size of `k`, which can make Counting Sort
inefficient when the range of values is large compared to the number of elements.
• Space Complexity: O(n + k)
Counting Sort uses extra space for both the count array (of size `k`) and the output array
(of size `n`). Hence, the total space complexity is O(n + k).
Counting Sort performs well when:
• The range `k` is small relative to `n` (e.g., when sorting integers with values between 0 and 100).
It performs poorly when:
• The range `k` is significantly larger than `n`, as the algorithm requires O(k) space and
time to process the count array, which increases the overall time and space complexity.
Key Takeaway:
Counting Sort is the right choice when the range of input values (`k`) is not significantly
larger than the number of elements (`n`); under that condition it guarantees linear
O(n + k) performance in all cases (best, average, and worst).