Design & Analysis of Algorithms: Anwar Ghani
Reference Texts
The course material will be based on, or adapted from, the following sources:
Introduction to Algorithm Analysis
Topics
• Computer Algorithms
• Application Domains
• Algorithm notation
• Algorithm Analysis
- time efficiency
- space efficiency
- correctness
• Case Study
Computer Algorithms
Definition
An algorithm is an orderly step-by-step procedure to solve a problem. A computer
algorithm has the following essential characteristics:
• The term algorithm is derived from al-Khowrizmi, the name of the ninth-century Persian
mathematician Abu Musa al-Khowrizmi, who is credited with the systematic study and
development of important algebraic procedures.
Algorithm Applications
Problem Domains
A large variety of problems in computer science, mathematics, and other disciplines depend
on algorithms for their solution. The broad categories of application types are:
Algorithm Applications
Hard Problems
• There are several classic problems in computer science for which efficient algorithms
are not known. Such problems are referred to as hard or intractable.
• Currently, over a thousand such problems have been identified. Two celebrated
examples are the Hamiltonian Problem and the Traveling Salesperson Problem.
Hamiltonian Problem
A Hamiltonian circuit, also called a tour, is a path in a graph (network) that
starts and ends at the same vertex, and passes through all other vertices
exactly once. The Hamiltonian problem is to determine whether or not a given
graph has a Hamiltonian circuit.
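The definition above can be turned into a brute-force check, sketched here in Python. The function name and the edge-list representation are illustrative assumptions, the graph is assumed to be undirected and simple, and the approach tries every ordering of the vertices, so it is only practical for very small graphs.

```python
from itertools import permutations

def has_hamiltonian_circuit(vertices, edges):
    """Brute-force test: try every ordering of the vertices (hypothetical helper)."""
    vertices = list(vertices)
    # A circuit in a simple graph needs at least three vertices.
    if len(vertices) < 3:
        return False
    # Assumption: the graph is undirected, so store each edge in both directions.
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    start, rest = vertices[0], vertices[1:]
    # Fix the start vertex; any circuit can be rotated to begin there.
    for order in permutations(rest):
        tour = [start, *order, start]
        if all(b in adjacent[a] for a, b in zip(tour, tour[1:])):
            return True
    return False
```

The number of orderings grows as (n-1)!, which illustrates why no efficient algorithm is obtained this way.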
Algorithm Notation
Specifying Algorithm Steps
Conventions
For the purposes of design and analysis, algorithms are usually specified in one of
two ways: in natural language or in pseudo code.
Natural Language Specification
Example
This example illustrates the natural language description of an algorithm for preorder
traversal of a binary tree.
Step #2: If the stack is empty, exit; otherwise pop the stack and process the node
Step #3: Traverse down the tree following the left-most path, pushing
each right child onto the stack
• The phrases and words used in natural language are not formalized. The style and
choice of words vary with the algorithm design. However, all important steps and
branching points must be specified unambiguously.
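As a rough illustration, the steps above might be coded in Python as follows. The Node class and the function name are hypothetical, and the initial step (placing the root on the stack) is an assumption, since it is not shown in the listed steps.

```python
class Node:
    """Minimal binary tree node (illustrative, not from the course material)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder(root):
    """Return the keys of the tree in preorder, using an explicit stack."""
    visited = []
    # Assumed first step: start with the root on the stack.
    stack = [root] if root is not None else []
    while stack:                      # Step 2: exit when the stack is empty
        node = stack.pop()
        while node is not None:       # Step 3: follow the left-most path
            visited.append(node.key)  # process the node
            if node.right is not None:
                stack.append(node.right)
            node = node.left
    return visited
```

For a root 1 with left child 2 (children 4 and 5) and right child 3, the call `preorder(root)` visits the keys in the order 1, 2, 4, 5, 3.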
Pseudo Code
Notation
There is no standardized notation for pseudo code. The following notation, based on
T. Cormen et al., will be followed.
• Procedures: Name followed by input parameters enclosed in parentheses, e.g. FIND-MAX(A, n)
• Assignments to variables: left arrows (←), e.g. j ← k ← p
• Comparisons of variables: symbols x ≤ y, x ≥ y, x ≠ y, x = y, x > y, x < y
• Logical expressions: connectives exp1 AND exp2, exp1 OR exp2, NOT exp
• Computations: arithmetic symbols +, -, *, /
• Exchanges (swapping): symbol ↔, e.g. A[i] ↔ A[k]
• Loops and iterations: Words for, do, repeat, while, until are used to describe loops, such as
for ---- do ----, for ---- downto ---- do ----
while ---- do ----
do ---- until ----
• Conditionals: Words if, then, and else are used to specify conditional statements, such as
if ---- then ---- else ----
• Block structure: Indentation is used, i.e., an inner block is displaced with respect to the outer
block, as depicted below
do ---- (start of outermost block)
    do ---- (start of next block)
        if ---- else ---- (start of a new block)
for j ← 2 to n
    do key ← A[j]
       ► Insert A[j] into sorted sequence A[1..j-1]
       i ← j - 1
       while i > 0 AND A[i] > key
           do A[i+1] ← A[i]
              i ← i - 1
       A[i+1] ← key
(i) Pseudo code for insertion sort
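For comparison, a minimal Python sketch of the same insertion sort is given below. It uses 0-based indexing, so the loop bounds are shifted by one relative to the pseudo code, and it sorts a copy of the input rather than sorting in place; both are choices made for this illustration.

```python
def insertion_sort(a):
    """Return a sorted copy of the sequence a using insertion sort."""
    a = list(a)  # work on a copy so the caller's data is unchanged
    for j in range(1, len(a)):
        key = a[j]
        # Insert a[j] into the sorted prefix a[0..j-1].
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]  # shift the larger element one cell to the right
            i -= 1
        a[i + 1] = key
    return a
```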
Algorithm Analysis
Objectives
• The purpose of algorithm analysis is, in general, to determine the performance of
the algorithm in terms of the time taken and the storage required to solve a given
problem.
• Another objective can be to check whether the algorithm produces consistent,
reliable, and accurate outputs for all instances of the problem. It may also be ensured
that the algorithm is robust and would prove to be failsafe under all circumstances.
• The common metrics that are used to gauge the performance are referred to as time
efficiency, space efficiency, and correctness.
Algorithm Analysis
Input Size
• The performance is often expressed in terms of problem size or, more precisely, by the
number of data items processed by an algorithm.
• The key parameters used in the analysis of algorithms for some common application
types are:
Analysis of Algorithm
Time Efficiency
• The time efficiency determines how fast an algorithm solves a given problem.
Quantitatively, it is a measure of the time taken by the algorithm to produce the output
for a given input. The time efficiency is expressed as a mathematical function of the input
size.
• Assume that an algorithm takes T(n) time to process n data items. The function T(n) is
referred to as the running time of the algorithm. It is also known as the time complexity. The
time complexity can depend on more than one parameter. For example, the running time of
a graph algorithm can depend on both the number of vertices and the number of edges in the graph.
• The running time is an important indicator of the behavior of an algorithm for varying
inputs. It tells, for example, whether the time taken would increase in direct
proportion to the input size, would increase fourfold if the size is doubled, or would increase
exponentially.
• Time efficiency is generally the most significant metric in algorithm
analysis. For this reason, the main focus of algorithm analysis will be on determining
the time complexity.
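The three growth behaviors mentioned above can be compared with a small Python sketch. The three running-time functions below are hypothetical examples chosen for illustration, not measurements of any particular algorithm.

```python
def growth_ratio(T, n):
    """Factor by which the running time T grows when the input size n is doubled."""
    return T(2 * n) / T(n)

def linear(n):
    return n            # time in direct proportion to input size

def quadratic(n):
    return n * n        # time proportional to the square of input size

def exponential(n):
    return 2 ** n       # time doubles with every additional data item
```

With n = 10, doubling the input multiplies the linear time by 2 and the quadratic time by 4, while the exponential time grows by a factor of 1024.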
Time Efficiency
Approaches
The time efficiency can be determined in several ways. The common methods are
categorized as empirical, analytical, and visualization.
• The analytical method uses mathematical and statistical techniques to examine the
time efficiency. The running time is expressed as a mathematical function of the input size.
Empirical Analysis
Methodology
The empirical methodology broadly consists of the following steps:
1) The algorithm is coded and run on a computer. The running times are measured,
using some timer routine, with inputs of different sizes.
2) The output is logged and analyzed with the help of graphical and statistical tools to
determine the behavior and growth rate of the running time in terms of input size.
3) A best-fit curve is drawn to depict the trend of the running time in terms of input size.
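The steps above could be sketched in Python roughly as follows. The function names, the use of random floating-point inputs, the choice of keeping the smallest of several timings, and the log-log least-squares fit are all assumptions made for this illustration; other timer routines and curve-fitting methods would serve equally well.

```python
import math
import random
import time

def measure(algorithm, sizes, repeats=3):
    """Return (n, seconds) pairs, timing the algorithm on random inputs of each size."""
    results = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            data = [random.random() for _ in range(n)]
            start = time.perf_counter()
            algorithm(data)
            # Keep the smallest timing to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results.append((n, best))
    return results

def fit_exponent(points):
    """Least-squares slope of log(time) against log(n), i.e. p in time ~ c * n**p."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]  # requires strictly positive timings
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A call such as `fit_exponent(measure(sorted, [1000, 2000, 4000, 8000]))` gives an estimate of the growth exponent; the value obtained varies from run to run and from machine to machine.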
Empirical Analysis
Example
• The graph illustrates the results of an empirical analysis. The running times of a sorting
algorithm are plotted against the number of input sort keys. The measured values are shown as
red dots. The graph shows the best-fit curve through the scattered points.
[Figure: Sorting Time, running time plotted against number of sort keys]
• The analysis indicates that time increases roughly in proportion to the square of the input
size, which means that doubling the input size increases the running time fourfold.
Empirical Analysis
Limitations
The empirical analysis provides estimates of running times in a real-life situation. It has,
however, several limitations, because the running time depends crucially on several
factors. Some key factors that influence the time measurements are:
• Composition of data sets (choice of data values and the range of input)
Analytical Analysis
Methodology
In the analytic approach, the running time is estimated by studying and analyzing the basic or
primitive operations involved in an algorithm. Broadly, the following methodology is
adopted:
• The number of times each basic operation is executed, for a given input,
is determined.
Best Case: In this case the algorithm has the minimum running time:
Tbest(n) = minimum(T1, T2, …, Tk)
This is also called the optimistic time.
Worst Case: In this case the algorithm has the maximum running time:
Tworst(n) = maximum(T1, T2, …, Tk)
This is also known as the pessimistic time.
Average Case: The average running time is the average of the running times for all
possible orderings of inputs of the same size:
Taverage(n) = (T1 + T2 + … + Tk) / k
Here T1, T2, …, Tk are the running times for the k possible orderings of an input of size n.
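The three cases can be illustrated by enumerating every ordering of a small input, as in the Python sketch below. As an assumption for this example, the running time of each ordering is represented by the number of key comparisons made by insertion sort; the function names are hypothetical.

```python
from itertools import permutations

def count_comparisons(a):
    """Number of key comparisons insertion sort makes on the sequence a."""
    a = list(a)
    count = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            count += 1            # one comparison of a[i] with key
            if a[i] <= key:
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return count

def case_analysis(n):
    """Return (best, worst, average) cost over all orderings of n distinct keys."""
    costs = [count_comparisons(p) for p in permutations(range(n))]
    return min(costs), max(costs), sum(costs) / len(costs)
```

For n = 4 there are k = 24 orderings; the minimum cost is 3 comparisons (already sorted input) and the maximum is 6 (reverse-sorted input), with the average lying in between.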
Best, Worst, Average Cases
Example
• The graph shows the best, worst, and average running times of an algorithm. In this example,
times are shown for 20 inputs of the same size but in different order. The algorithm takes
the minimum time to process Input #6 and the maximum time to process Input #17, which are
referred to as the best and worst times. The average of all the running times for the 20 inputs is also
shown.
Analysis of Algorithm
Space Efficiency
• The space requirement consists of the amount of real storage needed to execute the
algorithm code and the storage to hold the application data. The algorithm code occupies a
fixed amount of space, which is independent of the input size. The storage requirement for
the application depends on the nature of the data structure used to provide faster and flexible
access to stored information. For arrays and linked lists, for example, the space
requirement is directly proportional to the input size.
• For most algorithms, the space efficiency is not of much concern. However, some
algorithms require extra storage to hold the results of intermediate computations. This is the
case, for example, with merge sort and dynamic programming techniques.
Algorithm Correctness
Loop Invariant Method
There are no standard methods for proving the correctness of a given algorithm. However,
many useful algorithms consist of one or more iterative computations, using loop
structures. The correctness of such algorithms can be formally established by a technique
called the Loop Invariant Method.
A loop invariant is a set of conditions and relationships that remain true prior to, during,
and after the execution of a loop.
The loop invariant condition / statement depends on the nature of the problem being analyzed.
In a sorting problem, for example, the condition might be the order of keys in a sub-array,
which should remain in ascending/descending order prior to, and after, the execution of each
iteration.
Algorithm Correctness
Formal Proof
The loop invariant method establishes the correctness of an algorithm in three steps, which
are known as initialization, maintenance, and termination (Reference: T. Cormen et al.).
At each step, the loop invariant is examined. If the loop invariant holds true at each of the steps,
then the algorithm is said to be correct.
Initialization: The loop invariant is true prior to the execution of the first iteration of the loop
Maintenance: If the loop invariant is true before an iteration of the loop, it remains true after the iteration
Termination: After the termination of the loop, the invariant holds true for the full problem
size
Algorithm Analysis
Visualization
Algorithm visualization techniques are used to study
• Problem statement
• Algorithm design
• Space complexity
• Correctness of algorithm
• Visualization of Analysis
Case Study
Problem Statement
• Design an algorithm to find the maximum element in an array of size n
• Time efficiency
• Space efficiency
• Correctness
Case Study
Algorithm Design
The design features are expressed in plain language. The algorithm for the solution of
the problem consists of the following steps:
Step #3: Replace max with a larger element, when found during the scan
FIND-MAX ( A, n )
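With k denoting the number of times the statement max ← A[j] is executed, the expressions below take the
running time to have the form T(n) = A + B.k + C.n, which is the form the average-case expression follows.
Best Case: The best case occurs when the statement is not executed at all. This happens when
the maximum element of the array occurs in the first cell. In this case k = 0, and the best (minimum) running time
is given by
Tbest(n) = A + C.n
Average Case: In this case, the statement is taken to be executed n / 2 times on average, so that k = n / 2. Thus, the
average running time is given by
Taverage(n) = A + (B / 2 + C).n
Worst Case: In this case, the statement is executed n-1 times, so k = n-1. This happens
when the array is sorted in ascending order. Thus, the worst (maximum) running time is given by
Tworst(n) = A + B.(n-1) + C.n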
FIND-MAX(A, n)
max ← A[1]
for j ← 2 to n
    do if A[j] > max
          then max ← A[j]
return max
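A minimal Python sketch of FIND-MAX is shown below. As an addition for illustration only, it also returns k, the number of times the assignment inside the loop is executed, so that the best case (k = 0) and the worst case (k = n-1) can be observed directly. The function name and the returned pair are assumptions of this sketch, and a non-empty array is assumed.

```python
def find_max(a):
    """Return (maximum, k) for a non-empty sequence a.

    k counts how often the assignment inside the loop is executed.
    """
    largest = a[0]
    k = 0
    for j in range(1, len(a)):
        if a[j] > largest:
            largest = a[j]
            k += 1
    return largest, k
```

For example, `find_max([9, 3, 5, 1])` gives k = 0 because the maximum is in the first cell, while `find_max([1, 3, 5, 9])` gives k = 3 because the array is in ascending order.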
Loop invariant S: Variable max holds the largest of the array elements scanned so far, at all stages of loop execution
• The Maintenance condition requires that if S is true before an iteration of the loop, it should
remain true after the iteration. It can be easily verified that if max holds the largest of the first k
elements before an iteration, then it holds the largest of the first k+1 elements after that
iteration.
• The Termination condition requires that the post-condition should be true for the full problem size,
i.e., max should hold the maximum array element. The loop terminates when index j exceeds
n. This implies that just after the last iteration max holds the largest of the first n elements
of the array. Since the array has size n, it means that max holds the largest of the array elements.
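The invariant can also be checked at run time, as in the following Python sketch. The assertions use the built-in max on a prefix of the array purely as an independent reference; this is a testing aid assumed for the illustration and does not replace the formal argument. The function name is hypothetical and a non-empty array is assumed.

```python
def find_max_checked(a):
    """FIND-MAX with the loop invariant asserted at each stage (non-empty a)."""
    largest = a[0]
    # Initialization: the invariant holds for the first element alone.
    assert largest == max(a[:1])
    for j in range(1, len(a)):
        if a[j] > largest:
            largest = a[j]
        # Maintenance: largest is the maximum of the elements scanned so far.
        assert largest == max(a[:j + 1])
    # Termination: the invariant now covers the whole array.
    assert largest == max(a)
    return largest
```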
Analysis of Algorithm
Space Efficiency
The space analysis of the algorithm for finding the maximum element is simple and
straightforward. It amounts to determining space utilization as a function of data structure size.
• The total space requirement consists of the memory used by the program statements
and the array elements. The former is fixed and does not depend on the array size.
• The amount of storage required for the array depends on the nature of the data
type (integer, floating point, strings). It increases in direct proportion to the array size:
S(n) = A + B.n
where A is the fixed space used by the program statements and B is the storage per array element.
Visualization