Analysis of Algorithms
Given two algorithms for a task, how do we find out which one is better?
One naive way of doing this is to implement both the algorithms, run the two
programs on your computer for different inputs, and see which one takes less time.
There are many problems with this approach for analysis of algorithms.
1) It might be possible that for some inputs the first algorithm performs better than the
second, and for some inputs the second performs better.
2) It might also be possible that for some inputs the first algorithm performs better on one
machine, while for some other inputs the second works better on another machine.
Asymptotic Analysis is the big idea that handles the above issues in analyzing
algorithms. In Asymptotic Analysis, we evaluate the performance of an algorithm in
terms of input size (we don’t measure the actual running time). We calculate how
the time (or space) taken by an algorithm increases with the input size.
For example, let us consider the search problem (searching a given item) in a sorted
array. One way to search is Linear Search (order of growth is linear) and the other way is
Binary Search (order of growth is logarithmic). To understand how Asymptotic
Analysis solves the above mentioned problems in analyzing algorithms, let us say we
run the Linear Search on a fast computer and Binary Search on a slow computer. For
small values of input array size n, the fast computer may take less time. But after a
certain value of input array size, the Binary Search will definitely start taking less
time compared to the Linear Search even though the Binary Search is being run on a
slow machine. The reason is that the order of growth of Binary Search with respect to
input size is logarithmic while the order of growth of Linear Search is linear. So the
machine-dependent constants can always be ignored after a certain value of input size.
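The crossover can be seen with a tiny experiment. The sketch below is illustrative only; the constants 1000 and 1 are made-up stand-ins for the cost per basic operation on the slow and the fast machine, not measurements from the original text.

#include <stdio.h>
#include <math.h>

/* Illustrative sketch: assume the slow machine spends 1000 steps per basic
   operation of Binary Search (about log2(n) operations in total) and the fast
   machine spends 1 step per basic operation of Linear Search (about n operations). */
int main(void)
{
    for (long n = 10; n <= 10000000; n *= 10) {
        double binary_on_slow = 1000.0 * (log((double) n) / log(2.0));
        double linear_on_fast = 1.0 * (double) n;
        printf("n = %8ld   Binary Search on slow machine = %12.0f   Linear Search on fast machine = %12.0f\n",
               n, binary_on_slow, linear_on_fast);
    }
    return 0;
}

For small n the linear algorithm on the fast machine wins, but somewhere in the tens of thousands the logarithmic algorithm on the slow machine overtakes it and the gap only widens after that.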
Does Asymptotic Analysis always work?
Asymptotic Analysis is not perfect, but that’s the best way available for analyzing
algorithms. For example, say there are two sorting algorithms that take 1000nLogn
and 2nLogn time respectively on a machine. Both of these algorithms are
asymptotically the same (order of growth is nLogn). So, with Asymptotic Analysis, we
can’t judge which one is better, as we ignore constants in Asymptotic Analysis.
Also, in Asymptotic analysis, we always talk about input sizes larger than a constant
value. It might be possible that those large inputs are never given to your software
and an algorithm which is asymptotically slower always performs better for your
particular situation. So, you may end up choosing an algorithm that is asymptotically
slower but faster for your software.
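To get a feel for how large the ignored constants can be (illustrative numbers, not from the original text): at n = 2^20 (about a million), 1000nLogn is roughly 1000 * 2^20 * 20 ≈ 2.1 * 10^10 operations, while 2nLogn is roughly 4.2 * 10^7 operations, a factor of 500 apart, even though both are Θ(nLogn).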
In the previous post, we discussed how Asymptotic analysis overcomes the problems
of the naive way of analyzing algorithms. In this post, we will take an example of Linear
Search and analyze it using Asymptotic analysis.
We can have three cases to analyze an algorithm:
1) Worst Case
2) Average Case
3) Best Case
Let us consider the following implementation of Linear Search.
#include <stdio.h>

// Linearly search x in arr[] of size n. If present, return the index; otherwise return -1.
int search(int arr[], int n, int x)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

/* Driver program to test the above function */
int main()
{
    int arr[] = {1, 10, 30, 15};
    int x = 30;
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("%d is present at index %d", x, search(arr, n, x));
    getchar();
    return 0;
}
Worst Case Analysis (Usually Done)
In the worst case analysis, we calculate the upper bound on the running time of an algorithm.
We must know the case that causes the maximum number of operations to be executed.
For Linear Search, the worst case happens when the element to be searched (x in the
above code) is not present in the array. When x is not present, the search() function
compares it with all the elements of arr[] one by one. Therefore, the worst case time
complexity of linear search would be Θ(n).
Average Case Analysis (Sometimes done)
In average case analysis, we take all possible inputs and calculate the computing time for
all of the inputs. We sum all the calculated values and divide the sum by the total number of
inputs. We must know (or predict) the distribution of cases. For the linear search
problem, let us assume that all cases are uniformly distributed (including the case of x
not being present in the array). So we sum all the cases and divide the sum by (n+1).
Following is the value of average case time complexity.
Average Case Time = (θ(1) + θ(2) + ... + θ(n) + θ(n+1)) / (n+1)
                  = θ((n+1)(n+2)/2) / (n+1)
                  = Θ(n)
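To make the average-case figure concrete, here is a small sketch (not part of the original post; the array contents and its size N are arbitrary choices) that counts the comparisons Linear Search makes for x at every possible position, plus the not-present case, and averages over the n+1 cases.

#include <stdio.h>

/* Count how many elements of arr[] Linear Search looks at before stopping. */
int comparisons(int arr[], int n, int x)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;
        if (arr[i] == x)
            break;
    }
    return count;
}

int main(void)
{
    enum { N = 8 };
    int arr[N];
    for (int i = 0; i < N; i++)
        arr[i] = i;                       /* arr[] = {0, 1, ..., N-1} */

    long total = 0;
    for (int i = 0; i < N; i++)           /* x present at each of the N positions */
        total += comparisons(arr, N, i);
    total += comparisons(arr, N, -1);     /* x not present at all */

    /* (1 + 2 + ... + N + N) / (N + 1), which grows linearly with N. */
    printf("average comparisons = %.2f\n", (double) total / (N + 1));
    return 0;
}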
Best Case Analysis (Bogus)
In the best case analysis, we calculate the lower bound on the running time of an algorithm.
We must know the case that causes the minimum number of operations to be executed. For
Linear Search, the best case occurs when x is present at the first location, so the best case
time complexity would be Θ(1).
The following are the main asymptotic notations used to represent the time complexity of
algorithms.
1) Θ Notation: The theta notation bounds a function from above and below, so it defines
the exact asymptotic behaviour.
Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0 such that
            0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0 }
The above definition means, if f(n) is theta of g(n), then the value of f(n) is always
between c1*g(n) and c2*g(n) for large values of n (n >= n0). The definition of theta
also requires that f(n) must be non-negative for values of n greater than n0. For example,
3n^2 + 2n is Θ(n^2): taking c1 = 3, c2 = 4 and n0 = 2 gives 3n^2 <= 3n^2 + 2n <= 4n^2
for all n >= 2.
2) Big O Notation: The Big O notation defines an upper bound of an algorithm; it bounds
a function only from above. For example, consider
the case of Insertion Sort. It takes linear time in best case and quadratic time in worst
case. We can safely say that the time complexity of Insertion sort is O(n^2). Note that
O(n^2) also covers linear time.
If we use Θ notation to represent time complexity of Insertion sort, we have to use
two statements for best and worst cases:
1. The worst case time complexity of Insertion Sort is Θ(n^2).
2. The best case time complexity of Insertion Sort is Θ(n).
The Big O notation is useful when we only have an upper bound on the time complexity of
an algorithm. Many times we can easily find an upper bound by simply looking at the
algorithm.
O(g(n)) = { f(n): there exist positive constants c and n0 such that
            0 <= f(n) <= c*g(n) for all n >= n0 }
3) Ω Notation: Just as Big O notation provides an asymptotic upper bound on a function,
Ω notation provides an asymptotic lower bound.
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that
            0 <= c*g(n) <= f(n) for all n >= n0 }
Let us consider the same Insertion sort example here. The time complexity of
Insertion Sort can be written as Ω(n), but that is not very useful information about
insertion sort, as we are generally interested in the worst case and sometimes in the average
case.
Exercise:
Which of the following statements is/are valid?
1. Time Complexity of QuickSort is Θ(n^2)
2. Time Complexity of QuickSort is O(n^2)
3. For any two functions f(n) and g(n), we have f(n) = Θ(g(n)) if and only if f(n) =
O(g(n)) and f(n) = Ω(g(n)).
4. Time complexity of all computer algorithms can be written as Ω(1)
In the previous post, we discussed analysis of loops. Many algorithms are recursive in
nature. When we analyze them, we get a recurrence relation for time complexity. We
get running time on an input of size n as a function of n and the running time on
inputs of smaller sizes. For example, in Merge Sort, to sort a given array, we divide it
into two halves and recursively repeat the process for the two halves. Finally, we merge
the results. The time complexity of Merge Sort can be written as T(n) = 2T(n/2) + cn.
There are many other algorithms like Binary Search, Tower of Hanoi, etc.
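To see where a recurrence like T(n) = 2T(n/2) + cn comes from, here is a compact Merge Sort sketch (a minimal illustrative implementation, not code from the original post): the two recursive calls account for the 2T(n/2) term and the linear-time merge accounts for the cn term.

#include <stdio.h>
#include <stdlib.h>

/* Merge the sorted halves arr[lo..mid] and arr[mid+1..hi]; linear work, the "+ cn" term. */
static void merge(int arr[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i <= mid) tmp[k++] = arr[i++];
    while (j <= hi)  tmp[k++] = arr[j++];
    for (k = lo; k <= hi; k++)
        arr[k] = tmp[k];
}

/* T(n) = 2T(n/2) + cn: two recursive calls on halves, then a linear merge. */
static void merge_sort(int arr[], int tmp[], int lo, int hi)
{
    if (lo >= hi)
        return;
    int mid = lo + (hi - lo) / 2;
    merge_sort(arr, tmp, lo, mid);      /* T(n/2) */
    merge_sort(arr, tmp, mid + 1, hi);  /* T(n/2) */
    merge(arr, tmp, lo, mid, hi);       /* cn */
}

int main(void)
{
    int arr[] = {5, 2, 9, 1, 7, 3};
    int n = sizeof(arr) / sizeof(arr[0]);
    int *tmp = malloc(n * sizeof(int));
    merge_sort(arr, tmp, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
    free(tmp);
    return 0;
}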
There are mainly three ways for solving recurrences.
1) Substitution Method: We make a guess for the solution and then we use
mathematical induction to prove that the guess is correct or incorrect.
For example consider the recurrence T(n) = 2T(n/2) + n
We guess the solution as T(n) = O(nLogn), so we need to prove that
T(n) <= cnLogn for some constant c. We can assume that it is true
for values smaller than n.
T(n) = 2T(n/2) + n
    <= 2*(c(n/2)Log(n/2)) + n
     = cnLog(n/2) + n
     = cnLogn - cnLog2 + n
     = cnLogn - cn + n
    <= cnLogn (for c >= 1)
2) Recurrence Tree Method: In this method, we draw a recurrence tree and calculate
the time taken by every level of the tree. Finally, we sum the work done at all levels. To
draw the recurrence tree, we start from the given recurrence and keep drawing till we
find a pattern among levels. The pattern is typically an arithmetic or geometric series.
For example consider the recurrence relation
T(n) = T(n/4) + T(n/2) + cn^2

              cn^2
             /    \
        T(n/4)   T(n/2)

If we further break down the expressions T(n/4) and T(n/2), we get the following
recursion tree.

              cn^2
             /    \
     c(n^2)/16   c(n^2)/4
       /    \      /    \
  T(n/16) T(n/8) T(n/8) T(n/4)
Breaking down further gives us the following.

              cn^2
             /    \
     c(n^2)/16   c(n^2)/4
       /    \      /    \
c(n^2)/256 c(n^2)/64 c(n^2)/64 c(n^2)/16
   /  \      /  \      /  \      /  \

To know the value of T(n), we need to calculate the sum of the tree nodes level by level.
If we sum the above tree level by level, we get the following series:
T(n) = c(n^2 + 5(n^2)/16 + 25(n^2)/256 + ....)
The above series is a geometric progression with ratio 5/16. To get an upper bound, we
can sum the infinite series. We get the sum as (n^2)/(1 - 5/16), which is O(n^2).
3) Master Method:
Master Method is a direct way to get the solution. The master method works only for the
following type of recurrences or for recurrences that can be transformed into the following
type.
T(n) = aT(n/b) + f(n) where a >= 1 and b > 1
There are following three cases:
1. If f(n) = Θ(n^c) where c < Log_b(a) then T(n) = Θ(n^(Log_b(a)))
2. If f(n) = Θ(n^c) where c = Log_b(a) then T(n) = Θ(n^c Logn)
3. If f(n) = Θ(n^c) where c > Log_b(a) then T(n) = Θ(f(n))
In the recurrence tree method, we calculate the total work done. If the work done at the
leaves is polynomially more, then the leaves are the dominant part, and our result becomes
the work done at the leaves (Case 1). If the work done at the leaves and at the root is
asymptotically the same, then our result becomes the height multiplied by the work done at
any level (Case 2). If the work done at the root is asymptotically more, then our result
becomes the work done at the root (Case 3).
Examples of some standard algorithms whose time complexity can be evaluated
using Master Method
Merge Sort: T(n) = 2T(n/2) + Θ(n). It falls in case 2 as c is 1 and Log_b(a) is also 1. So
the solution is Θ(nLogn).
Binary Search: T(n) = T(n/2) + Θ(1). It also falls in case 2 as c is 0 and Log_b(a) is also
0. So the solution is Θ(Logn).
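As one more quick check (an illustrative example, not taken from the original text), consider T(n) = 3T(n/4) + cn^2. Here a = 3, b = 4 and f(n) = Θ(n^2), so c = 2 > Log_4(3) ≈ 0.79 and case 3 applies, giving T(n) = Θ(n^2).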
Notes:
1) It is not necessary that a recurrence of the form T(n) = aT(n/b) + f(n) can be solved
using Master Theorem. The given three cases have some gaps between them. For
example, the recurrence T(n) = 2T(n/2) + n/Logn cannot be solved using master
method.
3) O(n^c): Time complexity of nested loops is equal to the number of times the
innermost statement is executed. For example, the following sample loops have O(n^2)
time complexity.
for (int i = 1; i <=n; i += c) {
for (int j = 1; j <=n; j += c) {
// some O(1) expressions
}
}
For example, Selection Sort and Insertion Sort have O(n^2) time complexity.
4) O(Logn): Time complexity of a loop is considered as O(Logn) if the loop variable
is divided or multiplied by a constant amount.
for (int i = 1; i <=n; i *= c) {
// some O(1) expressions
}
for (int i = n; i > 0; i /= c) {
// some O(1) expressions
}
How to calculate time complexity when there are many if, else statements inside
loops?
As discussed here, the worst case time complexity is the most useful among best, average
and worst. Therefore we need to consider the worst case. We evaluate the situation when
the values in if-else conditions cause the maximum number of statements to be executed.
For example, consider the linear search function where we consider the case when the
element is present at the end or not present at all.
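As a small illustration (a made-up function, not from the original text), in the sketch below the if branch does O(n) work and the else branch does O(1) work; for worst case analysis we assume the input that makes the expensive branch run on every iteration, giving O(n^2) overall.

#include <stdio.h>

/* Illustrative only: the if branch contains an inner O(n) loop, the else branch is O(1).
   Worst case: the condition is true for every i, so the loop costs O(n^2);
   if it were never true, the cost would only be O(n). */
long process(int arr[], int n)
{
    long work = 0;
    for (int i = 0; i < n; i++) {
        if (arr[i] % 2 == 0) {           /* worst case: true on every iteration */
            for (int j = 0; j < n; j++)  /* O(n) work inside this branch */
                work += arr[j];
        } else {
            work += arr[i];              /* O(1) work */
        }
    }
    return work;
}

int main(void)
{
    int all_even[] = {2, 4, 6, 8};       /* worst case input: every element is even */
    printf("%ld\n", process(all_even, 4));
    return 0;
}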
When the code is too complex to consider all if-else cases, we can get an upper bound
by ignoring if else and other complex control statements.
How to calculate time complexity of recursive functions?
Time complexity of a recursive function can be written as a mathematical recurrence
relation. To calculate time complexity, we must know how to solve recurrences. We
will soon be discussing recurrence solving techniques as a separate post.
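For instance (an illustrative example, not from the original post), the recursive factorial function below does a constant amount of work plus one recursive call on n-1, so its running time satisfies the recurrence T(n) = T(n-1) + c, which solves to Θ(n).

#include <stdio.h>

/* T(n) = T(n-1) + c: one recursive call on a problem of size n-1
   plus a constant amount of work, so the total time is Θ(n). */
unsigned long long factorial(unsigned int n)
{
    if (n <= 1)                 /* base case: T(1) = c */
        return 1;
    return n * factorial(n - 1);
}

int main(void)
{
    printf("%llu\n", factorial(10));   /* prints 3628800 */
    return 0;
}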