Design and Analysis of Algorithm
Disjoint Set (Union-Find Algorithm)
Two sets are called disjoint if they have no element in common, that is, their
intersection is the empty set.
A data structure that stores non-overlapping (disjoint) subsets of elements is
called a disjoint set data structure. The disjoint set data structure supports
the following operations:
Adding new sets to the disjoint set.
Merging disjoint sets to a single disjoint set using Union operation.
Finding representative of a disjoint set using Find operation.
Checking whether two elements belong to the same set or to disjoint sets.
Operations on Disjoint Set Data Structures:
1. Find
2. Union
1. Find:
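Find can be implemented by recursively following the parent array from an
element until it reaches a node that is its own parent. That node is the
representative (root) of the set.
Time complexity: Without optimizations this can take O(n) time in the worst
case, because the tree may degenerate into a chain.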
2. Union:
Union takes two elements as input, finds the representatives of their sets using
the Find operation, and then attaches the root of one tree (representing a set)
under the root of the other tree.
Time complexity: Without optimizations this can produce a tree of height O(n)
in the worst case, which in turn makes Find take O(n) time.
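The Find and Union operations described above can be sketched as follows. This
is a minimal illustration of the unoptimized version only; the class name
DisjointSet and the helper same_set are hypothetical names chosen for this
example, and the number of elements is assumed to be fixed up front.

```python
class DisjointSet:
    """Naive disjoint set forest stored in a parent array (illustrative sketch)."""

    def __init__(self, n):
        # Every element starts as the root of its own one-element set.
        self.parent = list(range(n))

    def find(self, x):
        # Follow parent links until reaching a node that is its own parent.
        if self.parent[x] == x:
            return x
        return self.find(self.parent[x])

    def union(self, x, y):
        # Find both representatives, then hang one root under the other.
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y

    def same_set(self, x, y):
        # Two elements are in the same set when they share a representative.
        return self.find(x) == self.find(y)


ds = DisjointSet(5)
ds.union(0, 1)
ds.union(1, 2)
print(ds.same_set(0, 2))  # True
print(ds.same_set(0, 3))  # False
```

Because union always attaches the first root under the second, a sequence such
as union(0, 1), union(1, 2), union(2, 3), ... builds a chain, which is the O(n)
worst case mentioned above.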
String Matching Introduction
A string matching algorithm is also called a "string searching algorithm." This
is a vital class of string algorithms, described as "a method to find a place
where one or several strings are found within a larger string."
Applications of String Matching Algorithms:
Plagiarism Detection: The documents to be compared are decomposed into
string tokens and compared using string matching algorithms. Thus, these
algorithms are used to detect similarities between them and declare if the work is
plagiarized or original.
Bioinformatics and DNA Sequencing: Bioinformatics applies
information technology and computer science to problems involving genetic
sequences, such as finding DNA patterns. String matching algorithms are used in
DNA analysis to find occurrences of a set of patterns.
Algorithms for String Matching
Naive Algorithm: It slides the pattern over the text one position at a
time and checks for a match. If a match is found, it slides by 1 again
to check for subsequent matches.
Pros: • Very simple to understand and implement. • Works well when
the patterns are short and the text is not too large.
Cons: • Can be very slow: for a text of length n and a pattern of
length m, the worst case is O(n·m) character comparisons, for example
when the pattern almost matches at many positions before a mismatch.
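The sliding procedure above can be sketched in a few lines. The function name
naive_search and the sample strings are illustrative choices, not part of the
original notes.

```python
def naive_search(text, pattern):
    """Return every index at which pattern starts in text."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every possible alignment of the pattern against the text.
    for shift in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and text[shift + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(shift)
    return matches


print(naive_search("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```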
Rabin-Karp Algorithm: It uses hashing to find occurrences of a pattern,
or of any one of a set of patterns. Instead of comparing all characters
of the pattern at every position (like the naive algorithm), it compares
the hash of the pattern with the hash of the current window of text.
• Think of it like looking for a specific page in a book by a short
code instead of by reading every word. If the code (hash) matches,
you then check to make sure it's really the page you're looking for.
Pros: • Faster than the naive approach on average. • Very efficient
for multiple pattern searches at once.
Cons: • Requires a good hash function to avoid frequent spurious
hits. • The worst-case time complexity can be as bad as the Naive
algorithm if many hash collisions occur.
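A minimal sketch of the hashing idea, assuming a polynomial rolling hash. The
function name rabin_karp and the default values of base and modulus are
assumptions made for this example; other hash functions work as well.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_000_007):
    """Return every index at which pattern starts in text, using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character of a window: base^(m-1) mod modulus.
    high = pow(base, m - 1, modulus)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    matches = []
    for shift in range(n - m + 1):
        # Equal hashes may be a spurious hit, so confirm with a real comparison.
        if p_hash == t_hash and text[shift:shift + m] == pattern:
            matches.append(shift)
        if shift < n - m:
            # Roll the hash: drop text[shift], append text[shift + m].
            t_hash = ((t_hash - ord(text[shift]) * high) * base
                      + ord(text[shift + m])) % modulus
    return matches


print(rabin_karp("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```

Updating the hash takes constant time per shift, so the full character
comparison is only paid when the hashes agree.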
KMP (Knuth Morris Pratt) Algorithm:
The KMP algorithm pre-processes the pattern to understand its
structure and uses that information to skip unnecessary comparisons
when a mismatch occurs.
Pros: • The position in the text never moves backward, so the search
runs in O(n + m) time for a text of length n and a pattern of length
m. • No backtracking over the text is needed, so it's more efficient
than the naive approach in the worst case.
Cons: • The pre-processing step requires additional time and
memory. • The algorithm is more complex to understand and
implement.
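The pre-processing step is commonly implemented as a table of "longest proper
prefix that is also a suffix" lengths. The sketch below assumes that
formulation; the names build_lps and kmp_search are illustrative.

```python
def build_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next character fits.
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_search(text, pattern):
    """Return every index at which pattern starts in text."""
    if not pattern:
        return []
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, reuse the table instead of moving back in the text.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]
    return matches


print(build_lps("AABA"))                          # [0, 1, 0, 1]
print(kmp_search("AABAACAADAABAABA", "AABA"))     # [0, 9, 12]
```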
NP Complete Problem
NP Problem:
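NP is the set of decision problems whose solutions can be verified in polynomial
time, even though no polynomial-time method for finding a solution may be known.
Equivalently, these problems can be solved by a non-deterministic machine in
polynomial time.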
NP-Hard Problem:
A problem X is NP-Hard if there is an NP-Complete problem Y such that Y is
reducible to X in polynomial time. NP-Hard problems are at least as hard as
NP-Complete problems. An NP-Hard problem need not be in the class NP.
Equivalently, a problem is NP-Hard if every problem in NP can be reduced to it
in polynomial time.
In practice, NP-hardness is usually shown by taking a problem already known to
be NP-Hard and reducing it to the new problem.
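Examples:
1. Hamiltonian cycle.
2. Optimization versions of NP-Complete problems, such as the travelling
salesman problem.
3. Shortest path with additional constraints, for example the shortest simple
path in a graph with negative cycles. (The ordinary shortest path problem is
solvable in polynomial time and is not known to be NP-Hard.)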
NP-Complete Problem:
A problem X is NP-Complete if X is in NP and every problem Y in NP is reducible
to X in polynomial time. NP-Complete problems are the hardest problems in NP. In
other words, a problem is NP-Complete if it belongs to both NP and NP-Hard. A
non-deterministic Turing machine can solve an NP-Complete problem in polynomial
time.
Because every NP-Complete problem is in NP, a proposed solution to it can be
verified in polynomial time.
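To illustrate what "verified in polynomial time" means, here is a sketch of a
verifier for the Hamiltonian cycle problem. It does not find a cycle; it only
checks a proposed one. The function name and the adjacency-set representation of
the graph are assumptions made for this example.

```python
def is_hamiltonian_cycle(graph, cycle):
    """Check a proposed Hamiltonian cycle.

    graph: dict mapping each vertex to the set of its neighbours (undirected).
    cycle: proposed ordering of the vertices (the certificate).
    """
    # The certificate must list every vertex exactly once.
    if len(cycle) != len(graph) or set(cycle) != set(graph):
        return False
    # Consecutive vertices, including last back to first, must be adjacent.
    for i, u in enumerate(cycle):
        v = cycle[(i + 1) % len(cycle)]
        if v not in graph[u]:
            return False
    return True


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(is_hamiltonian_cycle(square, [0, 1, 2, 3]))  # True
print(is_hamiltonian_cycle(square, [0, 2, 1, 3]))  # False
```

The check makes one pass over the certificate, so verification is fast even
though no polynomial-time algorithm for finding such a cycle is known.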
Difference between NP-Hard and NP-Complete:
NP-Hard | NP-Complete
A problem X is NP-Hard if some NP-Complete problem Y can be reduced to X in polynomial time. | NP-Complete problems can be solved by a non-deterministic algorithm (Turing machine) in polynomial time.
The problem does not have to be in NP. | The problem must be both in NP and NP-Hard.
No polynomial bound on verification time is guaranteed. | A proposed solution can be verified in polynomial time.
Need not be a decision problem. | Is exclusively a decision problem.
Not all NP-Hard problems are NP-Complete. | All NP-Complete problems are NP-Hard.
Often stated as an optimization problem. | Stated as a decision problem.
Approximation Algorithm
An approximation algorithm is a way of dealing with NP-completeness for
optimization problems. This technique does not guarantee the best solution. The
goal of an approximation algorithm is to come as close as possible to the
optimum value in a reasonable amount of time, which is at most polynomial
time. Such algorithms are called approximation algorithms (the term heuristic
algorithm is also used loosely, although a heuristic need not come with a
guarantee on solution quality).
Performance Ratios
The main idea behind calculating the performance ratio of an approximation
algorithm, also called the approximation ratio, is to measure how close the
approximate solution is to the optimal solution.
The approximation ratio is written ρ(n), where n is the input size of the
algorithm, C is the cost of the solution produced by the algorithm, and C* is
the cost of the optimal solution. The algorithm has an approximation ratio of
ρ(n) if and only if, for every input of size n:
max{C/C*, C*/C} ≤ ρ(n)
For a minimization problem the relevant term is C/C*; for a maximization problem
it is C*/C. In both cases ρ(n) ≥ 1.
A few popular examples of approximation algorithms are:
Vertex Cover Algorithm
Set Cover Problem
Travelling Salesman Problem (Approximation Approach)
The Subset Sum Problem
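As an illustration of the first example, one common approximation for vertex
cover repeatedly picks an uncovered edge and adds both of its endpoints; this
approach is known to give a cover at most twice the optimal size. The sketch
below assumes that approach, and the function name and sample graph are
hypothetical.

```python
def approx_vertex_cover(edges):
    """Return a vertex cover built by taking both endpoints of uncovered edges."""
    cover = set()
    for u, v in edges:
        # Skip edges already covered by a previously chosen vertex.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
cover = approx_vertex_cover(edges)
print(sorted(cover))  # [0, 1, 3, 4]
```

For this graph an optimal cover is {0, 3}, so C = 4 and C* = 2, giving a
performance ratio of C/C* = 2.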