Data Structure Part 3
Algorithm
Algorithm writing is a process that is executed after the problem domain is well-defined. That is, we should know the problem domain for which we are designing a solution.
Example
Let's try to learn algorithm-writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm
can be written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
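As a minimal illustrative sketch (not part of the original text; the function name add and the sample inputs are assumptions), the ADD algorithm above translates directly into Python:

# A direct Python rendering of the ADD algorithm.
def add(a, b):
    c = a + b        # Step 3: c <- a + b
    return c

print(add(2, 3))     # Step 4: display c; prints 5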
In the design and analysis of algorithms, usually the second method is used to describe an algorithm. It makes it easy for the analyst to analyze the algorithm ignoring all unwanted definitions, and to observe which operations are being used and how the process flows.
Writing step numbers is optional.
We design an algorithm to get a solution to a given problem. A problem can be solved in more than one way.
Hence, many solution algorithms can be derived for a given problem. The next step is to analyze those proposed solution algorithms and implement the most suitable one.
1.8 ALGORITHM COMPLEXITY
Suppose X is an algorithm and n is the size of the input data. The time and space used by algorithm X are the two main factors that decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations, such
as comparisons in a sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space
required by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data.
1.8.1 Space Complexity
Space complexity of an algorithm represents the amount of memory space required by
the algorithm in its life cycle. The space required by an algorithm is equal to the sum of
the following two components −
A fixed part, which is the space required to store certain data and variables that are
independent of the size of the problem. For example, simple variables and
constants used, program size, etc.
A variable part, which is the space required by variables whose size depends on the
size of the problem. For example, dynamic memory allocation, recursion stack
space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part
and SP(I) is the variable part of the algorithm, which depends on instance characteristic I.
Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. The actual space also depends on the data types of the given variables and constants, and it will be multiplied accordingly.
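The two components can be seen in code. The following Python sketch (illustrative, not from the original text) contrasts a fixed-space iterative sum with a recursive sum whose stack space grows with the input:

# Iterative sum: only a fixed number of simple variables are used,
# so the variable part of the space is constant for any input size.
def sum_iterative(values):
    total = 0                    # fixed part: one accumulator
    for v in values:
        total += v
    return total

# Recursive sum: each call adds a stack frame, so the variable part
# (recursion stack space) grows linearly with the number of elements.
def sum_recursive(values, i=0):
    if i == len(values):         # stopping condition
        return 0
    return values[i] + sum_recursive(values, i + 1)

print(sum_iterative([1, 2, 3]), sum_recursive([1, 2, 3]))  # 6 6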
The algorithm analysis framework involves finding the time taken and the memory space required by a program to execute. It also determines how the input size of a program influences its running time.
The efficiency of some algorithms may vary for inputs of the same size. For such algorithms, we need to differentiate between the worst-case, average-case and best-case efficiencies.
If an algorithm takes the maximum amount of time to execute for a specific input of size n, that input yields the worst-case time complexity. The worst-case efficiency of an algorithm is its efficiency for the worst-case input of size n: the input on which the algorithm runs the longest among all possible inputs of that size.
Algorithms are widely used in various areas of study, and we can solve different problems using the same algorithm. Therefore, all algorithms must be described in a standard way. The mathematical notations use symbols or symbolic expressions that have a precise semantic meaning.
A problem may have various algorithmic solutions. In order to choose the best algorithm
for a particular process, you must be able to judge the time taken to run a particular
solution. More accurately, you must be able to judge the time taken to run two solutions,
and choose the better among the two.
To select the best algorithm, it is necessary to check the efficiency of each algorithm. The efficiency of each algorithm can be checked by computing its time complexity. The asymptotic notations help to represent the time complexity in a shorthand way; the running time can generally be characterized as the fastest possible, slowest possible, or average possible.
The notations O (Big-O), Ω (Omega), and θ (Theta) are called asymptotic notations. These are the mathematical notations that are used in three different cases of time complexity.
‘O’ is the representation for Big-O notation. Big-O is the method used to express the
upper bound of the running time of an algorithm. It is used to describe the performance or
time complexity of the algorithm. Big-O specifically describes the worst-case scenario
and can be used to describe the execution time required or the space used by the
algorithm.
Table 2.1 gives some names and examples of the common orders used to describe functions. These orders are ranked from top to bottom.

Table 2.1: Common orders
Name           Order        Example
Constant       O(1)         Accessing one array element
Logarithmic    O(log n)     Binary search
Linear         O(n)         Linear search
Linearithmic   O(n log n)   Merge sort
Quadratic      O(n²)        Bubble sort
Cubic          O(n³)        Naive matrix multiplication
Exponential    O(2ⁿ)        Generating all subsets
Formally, f(n) = O(g(n)) if there exist a constant c > 0 and a non-negative integer n0 such that
f(n) ≤ c ∗ g(n) for all n ≥ n0
where n is the input size and f(n) and g(n) are two non-negative functions. In other words, f(n) grows no faster than g(n) multiplied by some constant c.
The graphical representation of f(n) = O(g(n)) is shown in figure 2.1: once n exceeds n0, the curve c ∗ g(n) stays above f(n) as n increases.
Example: Consider f(n) = 15n³ + 40n² + 2n log n + 2n. As the value of n increases, n³ becomes much larger than n², n log n, and n. Hence, it dominates the function f(n) and we can consider the running time to grow by the order of n³. Therefore, it can be written as f(n) = O(n³).
The inequality f(n) ≤ c ∗ g(n) is only required to hold for values of n that are at least n0. Therefore, values less than n0 are not considered relevant.
Example:
Consider the functions f(n) = 2n + 2 and g(n) = n².
Let n = 1, then
f(n) = 2n + 2 = 2(1) + 2 = 4
g(n) = n² = 1² = 1
Here, f(n) > g(n)
Let n = 2, then
f(n) = 2n + 2 = 2(2) + 2 = 6
g(n) = n² = 2² = 4
Here, f(n) > g(n)
Let n = 3, then
f(n) = 2n + 2 = 2(3) + 2 = 8
g(n) = n² = 3² = 9
Here, f(n) < g(n)
Thus, when n is greater than 2, we get f(n) < g(n); g(n) bounds f(n) from above for every larger n. This concludes that Big-O helps to determine the ‘upper bound’ of the algorithm’s run-time.
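The definition can also be checked numerically. The helper below is a hypothetical sketch (the name holds_big_o and the finite limit are assumptions, not part of the original text) that tests f(n) ≤ c ∗ g(n) over a range of n:

# Check f(n) <= c*g(n) for every n from n0 up to a finite limit.
def holds_big_o(f, g, c, n0, limit=1000):
    return all(f(n) <= c * g(n) for n in range(n0, limit))

f = lambda n: 2 * n + 2      # f(n) = 2n + 2
g = lambda n: n * n          # g(n) = n^2
print(holds_big_o(f, g, c=1, n0=3))   # True: f(n) = O(n^2)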
‘Ω’ is the representation for Omega notation. Omega describes the manner in which an algorithm performs in the best case. This notation provides the minimum amount of time taken by an algorithm to compute a problem. Thus, omega gives the "lower bound" of the algorithm's run-time. Omega is defined as:
f(n) ≥ c ∗ g(n) for all n ≥ n0
where n is the input size and f(n) and g(n) are two non-negative functions. The inequality must hold for some constant c > 0 and some non-negative integer n0.
Example:
Consider the functions f(n) = 2n² + 5 and g(n) = 7n.
We need to find a constant c such that f(n) ≥ c ∗ g(n).
Let n = 0, then
f(n) = 2n² + 5 = 2(0)² + 5 = 5
g(n) = 7n = 7(0) = 0
Here, f(n) > g(n)
Let n = 1, then
f(n) = 2n² + 5 = 2(1)² + 5 = 7
g(n) = 7n = 7(1) = 7
Here, f(n) = g(n)
Taking c = 1, the inequality briefly fails at n = 2 (13 < 14) but holds for every n ≥ 3, so we may choose n0 = 3 and get f(n) ≥ c ∗ g(n). This concludes that Omega helps to determine the "lower bound" of the algorithm's run-time.
'θ' is the representation for Theta notation. Theta notation is used when the
upper bound and lower bound of an algorithm are in the same order of
magnitude. Theta can be defined as:
c1 ∗ g(n) ≤ f(n) ≤ c2 ∗ g(n) for all n ≥ n0
where n is the input size and f(n) and g(n) are two non-negative functions. The inequalities must hold for some positive constants c1 and c2 and a non-negative integer n0.
Example: Consider f(n) = 4n + 3 and g(n) = n. Then 4n ≤ 4n + 3 ≤ 5n for all n ≥ 3. At n = 3:
c1 ∗ g(n) = 4(3) = 12
f(n) = 4(3) + 3 = 15
c2 ∗ g(n) = 5(3) = 15
Here, c1 is 4, c2 is 5 and n0 is 3.
Thus, from the above we get c1 ∗ g(n) ≤ f(n) ≤ c2 ∗ g(n). This concludes that Theta notation depicts a running time sandwiched between a lower bound and an upper bound of the same order.
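The Theta bounds from this example can be verified the same way. This sketch (the helper name holds_theta and the finite limit are assumptions) checks c1 ∗ g(n) ≤ f(n) ≤ c2 ∗ g(n) numerically:

# Check c1*g(n) <= f(n) <= c2*g(n) for every n from n0 up to a limit.
def holds_theta(f, g, c1, c2, n0, limit=1000):
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, limit))

f = lambda n: 4 * n + 3      # f(n) = 4n + 3
g = lambda n: n              # g(n) = n
print(holds_theta(f, g, c1=4, c2=5, n0=3))   # True: f(n) = Theta(n)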
1.11.1 Divide and Conquer
Introduction
The Divide and Conquer approach works by breaking the problem into subproblems that are similar to the original problem but smaller in size and simpler to solve. Once divided, the subproblems are solved recursively, and the solutions of the subproblems are then combined to create a solution to the original problem.
At each level of the recursion, the divide and conquer approach follows three steps:
Divide: In this step, the whole problem is divided into several subproblems.
Conquer: The subproblems are conquered by solving them recursively; if they are small enough, they are solved directly, otherwise step 1 is executed again.
Combine: In this final step, the solutions obtained for the subproblems are combined to create a solution to the original problem.
Generally, we can follow the divide-and-conquer approach in a three-step process, as the sketch below illustrates.
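Merge sort is a standard divide-and-conquer algorithm; the following Python sketch (illustrative, not from the original text) labels each of the three steps:

def merge_sort(items):
    if len(items) <= 1:                # stopping condition: trivially sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])     # Divide and Conquer: solve each half
    right = merge_sort(items[mid:])
    return merge(left, right)          # Combine: merge the two sorted halves

def merge(left, right):
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # append whatever remains
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 7, 1, 3])) # [1, 2, 3, 4, 5, 7]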
There are two fundamentals of the Divide & Conquer strategy:
1. Relational Formula
2. Stopping Condition
1. Relational Formula: It is the formula that we generate from the given technique. After generating the formula, we apply the D&C strategy, i.e., we break the problem recursively and solve the broken subproblems.
2. Stopping Condition: When we break the problem using the Divide & Conquer strategy, we need to know how long we must keep applying it. The condition at which we stop our recursion steps of D&C is called the stopping condition.
The following algorithms are based on the concept of the Divide and Conquer technique: Binary Search, Merge Sort, Quick Sort, and the Tower of Hanoi.
1.11.2 Backtracking
Introduction
Backtracking explores a tree of candidate choices in search of a solution. Recall the structure of a tree:
o Each non-leaf node in a tree is a parent of one or more other nodes (its children)
o Each node in the tree, other than the root, has exactly one parent
Although real trees grow upward, we generally draw our trees downward, with the root at the top.
To "explore" node N:
1. If N is a goal node, return "success"
2. If N is a leaf node, return "failure"
3. For each child C of N,
Explore C
If C was successful, return "success"
4. Return "failure"
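The procedure above maps directly to code. In this Python sketch, the tree representation (a dictionary of children and a set of goal nodes) is an assumption made for illustration:

# Recursive exploration of a tree, mirroring steps 1-4 above.
def explore(node, children, goals):
    if node in goals:                        # 1. goal node -> success
        return True
    kids = children.get(node, [])
    if not kids:                             # 2. leaf node -> failure
        return False
    for child in kids:                       # 3. explore each child
        if explore(child, children, goals):
            return True                      #    successful child -> success
    return False                             # 4. no child succeeded -> failure

children = {"root": ["a", "b"], "a": ["c"], "b": ["d", "e"]}
print(explore("root", children, goals={"e"}))   # True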
An explicit constraint is a rule that restricts each vector element to be chosen from a given set.
An implicit constraint is a rule that determines which of the tuples in the solution space actually satisfy the criterion function.
1.11.3 Dynamic Programming
For any problem to be solved through the dynamic programming approach, it must satisfy the following conditions:
Principle of Optimality: It states that to solve the master problem optimally, its subproblems should be solved optimally. It should be noted that a subproblem is not always solved optimally; in that case, we should go for the optimal majority.
Polynomial Breakup: To solve the main problem, it is divided into several subproblems, and for dynamic programming to perform efficiently, the total number of subproblems to be solved should be at most a polynomial number.
Various algorithms that make use of the dynamic programming technique are as follows (a memoized Fibonacci sketch follows the list):
1. Knapsack problem.
2. Chain matrix multiplication.
3. All-pairs shortest path.
4. Travelling salesman problem.
5. Tower of Hanoi.
6. Checker board.
7. Fibonacci sequence.
8. Assembly line scheduling.
9. Optimal binary search trees.
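As promised above, here is a memoized Fibonacci sketch (illustrative, not from the original text; the shared memo dictionary is a deliberate design choice) showing the core dynamic-programming idea: solve each subproblem once and reuse its result:

# Memoized Fibonacci: each fib(k) is computed once and stored.
def fib(n, memo={0: 0, 1: 1}):       # memo persists across calls
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(40))   # 102334155, in linear rather than exponential time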
1.12 SUMMARY
A data structure is a particular way of storing and organizing data either in computer’s
memory or on the disk storage so that it can be used efficiently.
There are two types of data structures: primitive and non-primitive data structures.
Primitive data structures are the fundamental data types which are supported by a
programming language. Nonprimitive data structures are those data structures which are
created using primitive data structures.
Non-primitive data structures can further be classified into two categories: linear and
non-linear data structures.
If the elements of a data structure are stored in a linear or sequential order, then it is a
linear data structure. However, if the elements of a data structure are not stored in
sequential order, then it is a non-linear data structure.
An array is a collection of similar data elements which are stored in consecutive memory
locations.
A linked list is a linear data structure consisting of a group of elements (called nodes)
which together represent a sequence.
A stack is a last-in, first-out (LIFO) data structure in which insertion and deletion of
elements are done at only one end, which is known as the top of the stack.
A queue is a first-in, first-out (FIFO) data structure in which the element that is inserted
first is the first to be taken out. The elements in a queue are added at one end called the
rear and removed from the other end called the front.
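As a small illustrative sketch (not from the original text), Python's built-in list and collections.deque exhibit exactly these two disciplines:

from collections import deque

stack = []                 # LIFO: push and pop at the top
stack.append(1)
stack.append(2)
stack.append(3)
print(stack.pop())         # 3: the last element in is the first out

queue = deque()            # FIFO: add at the rear, remove from the front
queue.append(1)
queue.append(2)
queue.append(3)
print(queue.popleft())     # 1: the first element in is the first out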
A tree is a non-linear data structure which consists of a collection of nodes arranged in a
hierarchical tree structure.
A graph is often viewed as a generalization of the tree structure, where instead of a purely
parent-to-child relationship between tree nodes, any kind of complex relationships can
exist between the nodes.
An abstract data type (ADT) is the way we look at a data structure, focusing on what it
does and ignoring how it does its job.
An algorithm is basically a set of instructions that solves a problem.
The time complexity of an algorithm is the running time of the program as a function of the input size.
The space complexity of an algorithm is the amount of computer memory required during
the program execution as a function of the input size.
The worst-case running time of an algorithm is an upper bound on the running time for
any input.
The average-case running time specifies the expected behaviour of the algorithm when
the input is randomly drawn from a given distribution.
The efficiency of an algorithm is expressed in terms of the number of elements that have to be processed and the type of loop that is used.