Here's a **detailed explanation** of the **Rabin-Karp String Matching Algorithm**:
---
## 🧠 **Rabin-Karp String Matching Algorithm – Detailed Explanation**
---
### ✅ **Purpose:**
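The **Rabin-Karp algorithm** is used to **find occurrences of a pattern string `P`
in a larger text string `T`**. It is especially useful when several patterns must be
searched for at once, since each window's hash can be checked against many pattern
hashes in a single pass over the text.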
---
### 🔍 **Key Idea:**
It uses **hashing** to **compare substrings of text** with the pattern, instead of
comparing every window character by character.
- If the hash of the substring matches the hash of the pattern, do an exact
character-by-character check, because two different strings can share a hash.
- If the hashes differ, the strings cannot be equal, so the window is skipped without
comparing characters.

This greatly reduces the number of comparisons in practice.
---
### 🔧 **How It Works:**
Let:
- `n` = length of text `T`
- `m` = length of pattern `P`
- `q` = a large prime number used as the modulus (keeps hash values bounded and makes collisions rare)
- `d` = number of characters in the alphabet (e.g., 256 for ASCII)
---
### 🔁 **Steps:**
1. **Preprocess**:
   - Calculate the **hash value of the pattern `P`**.
   - Calculate the **hash value of the first substring of `T` of length `m`**.
2. **Sliding Window**:
   - Slide the pattern over the text one character at a time.
   - For each window, calculate the **hash of the new substring** in constant time using a **rolling hash**.
3. **Match Check**:
   - If hash values **match**, then compare the strings character by character to **confirm the match**.
   - If they don’t match, skip to the next window.
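The three steps can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `rabin_karp` and the default values `d = 256` and `q = 1_000_000_007` are assumptions chosen for the example, and it assumes the common convention in which the window's first character carries the highest power of `d`.

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(d, m - 1, q)  # weight of the window's leading character
    p_hash = t_hash = 0

    # Step 1: hash the pattern and the first window of the text.
    for i in range(m):
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q

    matches = []
    for s in range(n - m + 1):
        # Step 3: on a hash hit, confirm with a direct comparison.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        # Step 2: roll the hash forward to the window starting at s + 1.
        if s < n - m:
            t_hash = (d * (t_hash - ord(text[s]) * high) + ord(text[s + m])) % q
    return matches
```

For instance, `rabin_karp("aaaa", "aa")` reports overlapping occurrences at indices 0, 1 and 2, because the loop keeps sliding after each confirmed match.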
---
### 🧮 **Hash Function:**
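The hash function treats the substring as a number in base `d`, reduced modulo `q`.
The first character is the most significant digit, which is what makes the rolling
update cheap. For a string `S[0..m-1]`, the hash value is:
```
hash(S) = (dᵐ⁻¹*S[0] + dᵐ⁻²*S[1] + ... + d⁰*S[m-1]) % q
```
Consecutive windows share `m - 1` characters, so the rolling hash updates the value in
**O(1)** time as the window slides: subtract the outgoing character's term, multiply
by `d`, and add the incoming character, i.e.
`hash(S[1..m]) = (d * (hash(S[0..m-1]) - dᵐ⁻¹*S[0]) + S[m]) % q`.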
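To make the O(1) update concrete, here is a small sketch that compares a rolled hash against one recomputed from scratch. The helper names `poly_hash` and `roll` are made up for this example, and it assumes the usual weighting in which the leading character is the most significant digit. The modulus `q = 101` is deliberately small so the numbers stay readable.

```python
def poly_hash(s: str, d: int, q: int) -> int:
    """Hash s as a base-d number modulo q, leading character most significant."""
    h = 0
    for ch in s:
        h = (h * d + ord(ch)) % q
    return h


def roll(old_hash: int, outgoing: str, incoming: str, high: int, d: int, q: int) -> int:
    """Slide the window one position: drop `outgoing`, append `incoming`.

    `high` must equal d**(m-1) % q for window length m.
    """
    return ((old_hash - ord(outgoing) * high) * d + ord(incoming)) % q


d, q = 256, 101            # small prime, chosen only for illustration
text, m = "ABCCDD", 3
high = pow(d, m - 1, q)

h = poly_hash(text[:m], d, q)
for s in range(1, len(text) - m + 1):
    h = roll(h, text[s - 1], text[s + m - 1], high, d, q)
    # The rolled value equals a hash recomputed over the whole window.
    assert h == poly_hash(text[s:s + m], d, q)
```

Python's `%` always returns a non-negative result, so no extra correction is needed after the subtraction. In C or C++ the intermediate value can go negative and would need `q` added back.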
---
### **Time Complexity:**
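- **Best/Average Case:** `O(n + m)`, assuming a good hash so that spurious hash matches are rare
- **Worst Case:** `O(n * m)`, when almost every window's hash matches the pattern's (through collisions, or because the pattern really occurs almost everywhere, e.g. `P = "aa"` in `T = "aaaa…a"`), so each window costs a full `m`-character check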
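The gap between the two cases can be seen by counting how many windows need a character-level check. The sketch below is illustrative only: `count_verifications` is a hypothetical helper, and the modulus `q = 1` is not prime. It is used purely to force every window to collide.

```python
def count_verifications(text: str, pattern: str, q: int, d: int = 256) -> tuple[int, int]:
    """Return (true matches, windows that needed a character-level check)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0, 0
    high = pow(d, m - 1, q)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * d + ord(pattern[i])) % q
        t_hash = (t_hash * d + ord(text[i])) % q

    matches = checks = 0
    for s in range(n - m + 1):
        if t_hash == p_hash:
            checks += 1                      # costs up to m character comparisons
            if text[s:s + m] == pattern:
                matches += 1
        if s < n - m:
            t_hash = ((t_hash - ord(text[s]) * high) * d + ord(text[s + m])) % q
    return matches, checks


text = "ab" * 500                            # 1000 characters, 999 windows of length 2
print(count_verifications(text, "ba", q=1))              # (499, 999): every window is checked
print(count_verifications(text, "zz", q=1_000_000_007))  # (0, 0): no window is checked
```

With the degenerate modulus, all 999 windows trigger a comparison. With a large prime and a pattern that never occurs, none do, which is the behaviour the average-case bound relies on.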
---
### ✅ **Advantages:**
- Very efficient for **multiple pattern matching**.
- Hashing avoids character-by-character comparison in most cases.
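One way the multiple-pattern advantage can be realised is to store all pattern hashes in a dictionary and look up each window's hash once. The sketch below assumes every pattern has the same length (a simplification made for this example). The function name `find_any` and the default `d` and `q` are likewise illustrative choices.

```python
def find_any(text: str, patterns: list[str], d: int = 256,
             q: int = 1_000_000_007) -> list[tuple[int, str]]:
    """Return (start index, pattern) for every occurrence of any pattern.

    Assumes all patterns are non-empty and share one length m.
    """
    if not patterns:
        return []
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(text)
    if m > n:
        return []

    def full_hash(s: str) -> int:
        h = 0
        for ch in s:
            h = (h * d + ord(ch)) % q
        return h

    # Map each pattern hash to the patterns that produce it.
    by_hash: dict[int, set[str]] = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), set()).add(p)

    high = pow(d, m - 1, q)
    t_hash = full_hash(text[:m])
    hits = []
    for s in range(n - m + 1):
        candidates = by_hash.get(t_hash)
        if candidates:
            window = text[s:s + m]
            if window in candidates:         # confirm: rules out hash collisions
                hits.append((s, window))
        if s < n - m:
            t_hash = ((t_hash - ord(text[s]) * high) * d + ord(text[s + m])) % q
    return hits
```

The text is still scanned once regardless of how many patterns there are. Only the dictionary grows with the pattern count.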
---
### ❌ **Disadvantages:**
- Can suffer from **hash collisions**, leading to unnecessary comparisons.
- Needs a good hash function and a large prime `q`.
---
### 📌 **Example:**
- Text: `T = "ABCCDDAEFG"`
- Pattern: `P = "CDD"`

1. Compute hash of `"CDD"`
2. Slide through substrings of length 3 in text:
   - `"ABC"` → hash doesn’t match
   - `"BCC"` → no match
   - `"CCD"` → no match
   - `"CDD"` → hash match → compare → ✅ match found at index 3!
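The walk-through above can be reproduced with a short trace. This is a sketch under assumed parameters: `d = 256` and a small prime `q = 101` are picked only so the printed hashes are easy to read, and `trace` is a hypothetical helper name. Unlike the hand-worked example, it continues past the first match and visits every window.

```python
def trace(text: str, pattern: str, d: int = 256, q: int = 101):
    """Yield (start, window, window hash, verified match) for each window."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return
    high = pow(d, m - 1, q)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * d + ord(pattern[i])) % q
        t_hash = (t_hash * d + ord(text[i])) % q
    for s in range(n - m + 1):
        window = text[s:s + m]
        # A match is reported only when the hash hit is confirmed by comparison.
        yield s, window, t_hash, (t_hash == p_hash and window == pattern)
        if s < n - m:
            t_hash = ((t_hash - ord(text[s]) * high) * d + ord(text[s + m])) % q


for s, window, h, matched in trace("ABCCDDAEFG", "CDD"):
    print(s, window, h, "match" if matched else "")
```

With these parameters the pattern hashes to 41, and the first four windows hash to 59, 100, 88 and 41, so only the window at index 3 reaches the character comparison.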
---
### ✨ Summary Table:
| Feature | Rabin-Karp Algorithm |
|--------------------|----------------------------------------|
| Uses | Pattern matching in strings |
| Technique | Hashing + Sliding Window |
| Time Complexity | Best: O(n + m), Worst: O(n * m) |
| Suitable For | Large texts, Multiple pattern matching |
| Checks | First hash, then actual characters |
---