Unit 4

The document discusses string matching algorithms, particularly focusing on the naive string-matching algorithm and the Rabin-Karp algorithm, which utilizes hash values for efficient pattern searching. It also covers the classification of problems into P and NP categories, explaining the characteristics of NP complete and NP hard problems. Applications of these algorithms in fields like bioinformatics and the complexity of solving such problems are highlighted.

Uploaded by

saiscount01

UNIT 4

~Ravi Sheth
String matching
Introduction
• Finding all occurrences of a pattern in a text is a
problem that arises frequently in text-editing
programs.
• Typically, the text is a document being edited,
and the pattern searched for is a particular word
supplied by the user.
• Efficient algorithms for this problem can greatly
aid the responsiveness of the text-editing
program. String-matching algorithms are also
used, for example, to search for particular
patterns in DNA sequences.
• We formalize the string-matching problem as
follows. We assume that the text is an array
T[1 . . n] of length n and that the pattern is an
array P[1 . . m] of length m.
• We further assume that the elements of P
and T are characters drawn from a finite
alphabet Σ. For example, we may have Σ = {0, 1}
or Σ = {a, b, . . . , z}. The character
arrays P and T are often called strings of
characters.
Naive string-matching algorithm
• NAIVE-STRING-MATCHER(T, P)
• 1 n ← length[T]
• 2 m ← length[P]
• 3 for s ← 0 to n - m
• 4 do if P[1 . . m] = T[s + 1 . . s + m]
• 5 then print "Pattern occurs with shift" s
• The naive string-matching procedure can be interpreted
graphically as sliding a "template" containing the pattern
over the text, noting for which shifts all of the characters on
the template equal the corresponding characters in the
text, as illustrated in Figure.

• The for loop beginning on line 3 considers each possible
shift explicitly. The test on line 4 determines whether the
current shift is valid or not; this test involves an implicit
loop to check corresponding character positions until all
positions match successfully or a mismatch is found.

• Line 5 prints out each valid shift s.
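
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name and the sample strings are assumptions, and shifts are reported 0-based, so the slice text[s:s + m] plays the role of T[s + 1 . . s + m].

```python
def naive_string_matcher(text, pattern):
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # line 3: try every candidate shift
        # line 4: the slice comparison hides the character-by-character loop
        if text[s:s + m] == pattern:
            shifts.append(s)            # line 5: record the valid shift
    return shifts


print(naive_string_matcher("acaabc", "aab"))  # prints [2]
```

Each of the n - m + 1 shifts may compare up to m characters, which is where the (n - m + 1)m worst-case cost of the naive method comes from.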


Exercises
• Show the comparisons the naive string
matcher makes for the pattern P = 0001 in the
text T = 000010001010001.
Definition of Rabin-Karp
• A string search algorithm which compares a
string's hash values, rather than the strings
themselves. For efficiency, the hash value of
the next position in the text is easily
computed from the hash value of the current
position.
How Rabin-Karp works
• Let characters in both arrays T and P be digits in
radix-Σ notation, with Σ = {0, 1, ..., 9}.
• Let p be the decimal value of the m-digit number
formed by the characters in P.
• Choose a prime number q such that 10q fits within a
computer word to speed computations.
• Compute (p mod q)
– The value of p mod q is what we will be using to find all
candidate matches of the pattern P in T.
How Rabin-Karp works (continued)
• Compute (T[s+1, .., s+m] mod q) for s = 0 .. n-m
• Test against P only those sequences in T
having the same (mod q) value
• (T[s+1, .., s+m] mod q) can be incrementally
computed by subtracting the high-order digit,
shifting, adding the low-order digit, all in
modulo q arithmetic.
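
The incremental update described above can be sketched as below. The helper names horner_mod and roll are hypothetical, chosen only for this illustration; the sample values (T = 31415926535, m = 2, q = 11) are the ones used in the example that follows, and the radix d = 10 is assumed.

```python
def horner_mod(digits, q, d=10):
    """Value of a digit string in radix d, reduced modulo q (Horner's rule)."""
    value = 0
    for ch in digits:
        value = (d * value + int(ch)) % q
    return value


def roll(t_s, old_digit, new_digit, h, q, d=10):
    """Hash of the next window, given the hash t_s of the current one.

    h must equal d**(m-1) % q, the weight of the high-order digit.
    """
    # subtract the high-order digit, shift by one place, add the new low-order digit
    return (d * (t_s - old_digit * h) + new_digit) % q


T, m, q = "31415926535", 2, 11
h = pow(10, m - 1, q)
t = horner_mod(T[:m], q)                # hash of the first window, "31"
hashes = [t]
for s in range(len(T) - m):
    t = roll(t, int(T[s]), int(T[s + m]), h, q)
    hashes.append(t)

print(hashes)  # prints [9, 3, 8, 4, 4, 4, 4, 10, 9, 2]
```

Each update costs a constant number of arithmetic operations, so all n - m + 1 window hashes are obtained in O(n) time after the first window is hashed.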
A Rabin-Karp example
• Given T = 31415926535 and P = 26
• We choose q = 11
• P mod q = 26 mod 11 = 4

Sliding a window of m = 2 digits over T = 3 1 4 1 5 9 2 6 5 3 5:

s = 0: 31 mod 11 = 9 not equal to 4
s = 1: 14 mod 11 = 3 not equal to 4
s = 2: 41 mod 11 = 8 not equal to 4
s = 3: 15 mod 11 = 4 equal to 4 -> spurious hit
s = 4: 59 mod 11 = 4 equal to 4 -> spurious hit
s = 5: 92 mod 11 = 4 equal to 4 -> spurious hit
s = 6: 26 mod 11 = 4 equal to 4 -> an exact match!!
s = 7: 65 mod 11 = 10 not equal to 4
s = 8: 53 mod 11 = 9 not equal to 4
s = 9: 35 mod 11 = 2 not equal to 4

As we can see, whenever the hash values are equal, the window is
compared with P character by character to ensure that it is a true
match and not a spurious hit.
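
Putting the pieces together, the whole procedure might look like the sketch below. It is an illustration under stated assumptions, not the slides' own code: the function name rabin_karp and its return format are hypothetical, the input is assumed to consist of decimal digits (radix d = 10), and shifts are 0-based.

```python
def rabin_karp(text, pattern, q, d=10):
    """Return (matches, spurious_hits) as lists of 0-based shifts."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)                # weight of the high-order digit
    p = t = 0
    for i in range(m):                  # hash the pattern and the first window
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    matches, spurious = [], []
    for s in range(n - m + 1):
        if p == t:                      # hash hit: confirm with a real comparison
            if text[s:s + m] == pattern:
                matches.append(s)
            else:
                spurious.append(s)
        if s < n - m:                   # roll the hash to the next window
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return matches, spurious


print(rabin_karp("31415926535", "26", q=11))  # prints ([6], [3, 4, 5])
```

On the example above this reports the exact match at shift 6 (window 26) and the three spurious hits at shifts 3, 4 and 5 (windows 15, 59 and 92).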
Complexity
• The worst-case running time of the Rabin-Karp algorithm
is O((n-m+1)m), but it has a good
average-case running time.
• If the expected number of valid shifts is small, i.e. O(1),
and the prime q is chosen to be quite large, then the
Rabin-Karp algorithm can be expected to run in time
O(n+m) plus the time required to process spurious
hits.
Applications
• Bioinformatics
– Used to look for similarities between two or more proteins;
high sequence similarity usually implies significant
structural or functional similarity.

Example:
Hb A_human
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
G+ +VK+HGKKV A++++++AH+ D++ ++ +++LS+LH KL
Hb B_human
GNPKVKAHGKKVLGAFSDGLAH LDNLKGTF ATLSELH CDKL
+ similar amino acids
The Class P and NP problem
• P problem
• NP problem
P - Problem
• P problems are the set of problems that can be
solved in polynomial time by a deterministic algorithm.
• P problems are easy to solve and easy to
verify.
• Most of the problems we have discussed so
far are P problems.
• P excludes all problems that cannot be
solved in polynomial time.
• Examples of P problems are searching for an
element in an array, inserting an element, sorting
data, and finding the height of a tree.
NP- Problem
• NP problems are decision problems that can be
solved in polynomial time by a nondeterministic algorithm.
• NP stands for nondeterministic polynomial
time.
• Nondeterministic stage: guessing a candidate solution
• Deterministic stage: verifying the guess in polynomial time
• For many NP problems no polynomial-time algorithm
is known, but given a candidate solution, it
can be verified in polynomial time.
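
The deterministic verification stage can be illustrated with the decision version of the travelling salesman problem: given a proposed tour, checking that it visits every city once and stays within a cost budget takes polynomial time. The sketch below is only an illustration; the function name verify_tour, the distance matrix and the budget are all made-up assumptions.

```python
def verify_tour(dist, tour, budget):
    """Check a proposed certificate for the decision version of TSP.

    dist   -- square matrix, dist[i][j] is the cost of travelling from i to j
    tour   -- proposed order of cities (the "guess")
    budget -- the round trip must cost at most this much
    """
    n = len(dist)
    if sorted(tour) != list(range(n)):  # every city exactly once
        return False
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return cost <= budget


# Hypothetical 4-city instance with symmetric distances.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(verify_tour(dist, [0, 1, 3, 2], budget=18))  # prints True  (cost 18)
print(verify_tour(dist, [0, 2, 1, 3], budget=18))  # prints False (cost 29)
```

Verifying one guess is cheap; the difficulty lies in the number of possible guesses, which grows factorially with the number of cities.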
NP- Problem
• NP includes all problems of P, i.e. P ⊆ NP
(P is a subset of NP)
• Examples (decision versions)
– Knapsack problem
– Travelling salesman problem
• The hardest problems in NP are called NP complete.
NP hard problems are at least as hard as these but
need not belong to NP.
[Figure: problems divided into the P class and the NP class, with NP complete and NP hard shown under the NP class]
NP Complete problem
• A decision problem p is called NP complete if it
has the following properties:

– It belongs to class NP
– Every other problem in NP can be transformed (reduced) to p
in polynomial time
NP complete
• A solution of any NP complete problem can be
verified in polynomial time, but no polynomial-time
algorithm for finding one is known
• NP complete problems are often tackled using
randomized algorithms, heuristic approaches
and approximation algorithms
• Examples (decision versions)
– Knapsack problem
– Travelling salesman problem
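
As one illustration of the heuristic approach, the sketch below builds a travelling salesman tour with the nearest-neighbour rule: always move to the closest city not yet visited. This technique is swapped in here as an example and is not taken from the slides; the function name and the distance matrix are assumptions. The rule runs in polynomial time but gives no guarantee of finding the cheapest tour.

```python
def nearest_neighbour_tour(dist, start=0):
    """Greedy TSP heuristic: repeatedly visit the closest unvisited city."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        here = tour[-1]
        # ties are broken by city index so the result is deterministic
        nxt = min(unvisited, key=lambda c: (dist[here][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, cost


# Hypothetical 4-city instance with symmetric distances.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(nearest_neighbour_tour(dist))  # prints ([0, 1, 3, 2], 18)
```

On this small instance the greedy tour happens to be the cheapest one; on larger instances it can be noticeably worse than the optimum, which is the price paid for the polynomial running time.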
NP Hard
• NP hard problems are at least as hard as the
hardest problems in NP.
• An NP hard problem need not be a decision
problem
• NP hard problems may not be in NP
Example
• Halting problem:
– “given an algorithm and an input, will it run
forever?”
– The answer to this question is YES or NO
– No algorithm can decide the answer for every
given input, in polynomial time or otherwise: the
problem is undecidable, so it is not in NP.
– Every problem in NP can nevertheless be reduced to it in
polynomial time, so it is considered an NP hard problem
Conclusion
