Notes 05 Parallel String Matching
String Matching
Sebastian Wild
2 March 2020
This unit:
• How well can we parallelize string matching?
• What new ideas can help?
Here: string matching = find all occurrences of 𝑃 in 𝑇 (the more natural problem for parallel computation)
Always assume 𝑚 ≤ 𝑛, where 𝑚 = |𝑃| and 𝑛 = |𝑇|.
5.1 Elementary Tricks
Embarrassingly Parallel
• A problem is called “embarrassingly parallel” if it can immediately be split into many small subtasks that can be solved completely independently of each other.
• Typical example: sum of two large matrices (all entries independent)
→ best case for parallel computation (simply assign each processor one subtask)
Elementary parallel string matching
Subproblems in string matching:
• string matching = check all guesses 𝑖 = 0, . . . , 𝑛 − 𝑚
• checking one guess is a subtask!
Approach 1:
• Check all guesses in parallel
• Time: Θ(𝑚)
• Work: Θ((𝑛 − 𝑚 + 1)𝑚) → not great . . .
Approach 2:
• Divide 𝑇 into overlapping blocks of 2𝑚 characters:
  𝑇[0..2𝑚), 𝑇[𝑚..3𝑚), 𝑇[2𝑚..4𝑚), 𝑇[3𝑚..5𝑚), . . .
  (blocks overlap by 𝑚 characters, so every occurrence of 𝑃 lies entirely inside some block)
• Find matches inside blocks in parallel, using an efficient sequential method
  → Θ(2𝑚 + 𝑚) = Θ(𝑚) each
• Time: Θ(𝑚)   Work: Θ((𝑛/𝑚) · 𝑚) = Θ(𝑛)
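Approach 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `find_in_block` and `block_parallel_match` are made up here, `str.find` stands in for "an efficient sequential method", and a thread pool is used merely to show that the blocks are independent (in CPython, threads will not give a real speedup for this kind of work). The pattern is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_block(T, P, start):
    """Report matches i with start <= i < start + m, looking only at T[start : start + 2m)."""
    m = len(P)
    block = T[start:start + 2 * m]
    hits = []
    pos = block.find(P)            # stand-in for an efficient sequential matcher
    while pos != -1 and pos < m:   # offsets >= m are reported by the next block
        hits.append(start + pos)
        pos = block.find(P, pos + 1)
    return hits


def block_parallel_match(T, P, workers=4):
    """Find all occurrences of P in T by handling overlapping blocks independently."""
    n, m = len(T), len(P)
    starts = range(0, n - m + 1, m)   # block starts 0, m, 2m, ... (assumes m >= 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_block = list(pool.map(lambda s: find_in_block(T, P, s), starts))
    return sorted(i for hits in per_block for i in hits)
```

Each block reports only matches that start in its first 𝑚 positions, so no occurrence is reported twice even though neighbouring blocks overlap.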
Clicker Question
B No
C Only for 𝑛 ≫ 𝑚 ✓
D Only for 𝑛 ≈ 𝑚
Elementary parallel matching – Discussion
very simple methods
Goal:
• methods with better parallel time! → higher speedup
• must genuinely parallelize the matching process! (and the preprocessing of the pattern)
5.2 Periodicity
Periodicity of Strings
• 𝑆 = 𝑆[0..𝑛 − 1] has period 𝑝 iff ∀𝑖 ∈ [0..𝑛 − 𝑝) : 𝑆[𝑖] = 𝑆[𝑖 + 𝑝]
• 𝑝 = 0 and any 𝑝 ≥ 𝑛 are trivial periods, but these are not very interesting . . .
Examples:
• 𝑆 = baaababaaab has period 6:
  [Figure: 𝑆 aligned with a copy of itself shifted by 𝑝 = 6; the 5 overlapping characters baaab agree.]
• 𝑆 = abaabaabaaba has period 3:
  [Figure: 𝑆 aligned with a copy of itself shifted by 𝑝 = 3; the 9 overlapping characters abaabaaba agree.]
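The definition translates directly into a check. The sketch below is an illustration only; the names `has_period` and `periods` are chosen here and do not come from the notes.

```python
def has_period(S, p):
    """True iff S[i] == S[i + p] for all i in [0..n - p); any p >= n is a trivial period."""
    n = len(S)
    return all(S[i] == S[i + p] for i in range(max(0, n - p)))


def periods(S):
    """All periods p in [1..n] of S, including the trivial period p = n."""
    return [p for p in range(1, len(S) + 1) if has_period(S, p)]
```

For the two example strings, `periods("baaababaaab")` starts with 6 and `periods("abaabaabaaba")` starts with 3.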
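Periodicity and KMP
Lemma 5.1 (Periodicity = Longest Overlap)
𝑝 ∈ [1..𝑛] is the shortest period of 𝑆 = 𝑆[0..𝑛 − 1]
iff 𝑆[0..𝑛 − 𝑝) is the longest proper prefix of 𝑆 that is also a suffix of 𝑆, i.e., 𝑆[0..𝑛 − 𝑝) = 𝑆[𝑝..𝑛).
Example: 𝑆 = abaabaabaaba (𝑛 = 12) has shortest period 𝑝 = 3; the longest overlap is 𝑆[0..9) = 𝑆[3..12), so the KMP failure link is fail[𝑛] = 9 = 𝑛 − 𝑝.
[Figure: 𝑆, with positions 0..11 labelled, aligned with a copy of itself shifted by 𝑝 = 3.]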
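Lemma 5.1 means the shortest period can be read off the KMP failure array. The following is a minimal sketch, assuming the convention that fail[𝑘] is the length of the longest proper prefix of 𝑆[0..𝑘) that is also a suffix of 𝑆[0..𝑘) (this matches fail[𝑛] = 9 in the example); the function names are illustrative.

```python
def failure_array(S):
    """fail[k] = length of the longest proper border of the prefix S[0..k), for k in [0..n]."""
    n = len(S)
    fail = [0] * (n + 1)
    k = 0                              # length of the current border
    for i in range(1, n):
        while k > 0 and S[i] != S[k]:  # border cannot be extended: fall back to a shorter one
            k = fail[k]
        if S[i] == S[k]:
            k += 1
        fail[i + 1] = k
    return fail


def shortest_period(S):
    """Shortest period p in [1..n] of a non-empty string S, via p = n - fail[n]."""
    return len(S) - failure_array(S)[len(S)]
```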
Periodicity Lemma
Lemma 5.2 (Periodicity Lemma)
If string 𝑆 = 𝑆[0..𝑛 − 1] has periods 𝑝 and 𝑞 with 𝑝 + 𝑞 ≤ 𝑛,
then it also has period gcd(𝑝, 𝑞), the greatest common divisor of 𝑝 and 𝑞.
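The lemma can be sanity-checked by brute force on short strings. The sketch below is not a proof, only an exhaustive test over a two-letter alphabet; the helper names and the length bound are arbitrary choices made for this illustration.

```python
from itertools import product
from math import gcd


def has_period(S, p):
    """True iff S[i] == S[i + p] for all i in [0..n - p)."""
    return all(S[i] == S[i + p] for i in range(max(0, len(S) - p)))


def lemma_holds(S):
    """Check Lemma 5.2 for one string: periods p, q with p + q <= n imply period gcd(p, q)."""
    n = len(S)
    ps = [p for p in range(1, n + 1) if has_period(S, p)]
    return all(has_period(S, gcd(p, q)) for p in ps for q in ps if p + q <= n)


def check_all(max_len):
    """Check the lemma for every string over {a, b} of length 1..max_len."""
    return all(lemma_holds("".join(w))
               for n in range(1, max_len + 1)
               for w in product("ab", repeat=n))
```

The length condition matters: `"aabaa"` has periods 3 and 4, but 3 + 4 > 5 and gcd(3, 4) = 1 is not a period.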
Periodic strings
• What does the smallest period 𝑝 tell us about a string 𝑆[0..𝑛 − 1]?
• Two distinct regimes:
  1. 𝑆 is periodic: 𝑝 ≤ 𝑛/2
     More precisely: 𝑆 is totally determined by a string 𝐹 = 𝐹[0..𝑝 − 1] = 𝑆[0..𝑝 − 1];
     𝑆 keeps repeating 𝐹 until 𝑛 characters are filled
     → 𝑆 is highly repetitive!
  2. 𝑆 is aperiodic: 𝑝 > 𝑛/2
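As a small illustration of the periodic regime, the sketch below decides whether a string is periodic and rebuilds it from 𝐹. It assumes a non-empty string and uses a naive quadratic search for the smallest period; all function names are made up for this example.

```python
def smallest_period(S):
    """Smallest p in [1..n] with S[i] == S[i + p] for all i in [0..n - p) (S non-empty)."""
    n = len(S)
    return next(p for p in range(1, n + 1)
                if all(S[i] == S[i + p] for i in range(n - p)))


def is_periodic(S):
    """Periodic in the sense used here: smallest period p <= n / 2."""
    return 2 * smallest_period(S) <= len(S)


def rebuild(F, n):
    """Repeat F until n characters are filled."""
    return (F * (n // len(F) + 1))[:n]
```

With these definitions, abaabaabaaba is periodic (𝑝 = 3 ≤ 6) and equals `rebuild("aba", 12)`, whereas baaababaaab has smallest period 6 > 11/2 and is therefore not periodic.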
Clicker Question
A Yes
B No ✓
→ “looking repetitive” is not enough for periodic!
5.3 String Matching by Duels
Periods and Matching
Witnesses for non-periodicity:
• Assume 𝑃[0..𝑚 − 1] does not have period 𝑝.
• Then, by the definition of a period, there is a position 𝜔 ∈ [0..𝑚 − 𝑝) with 𝑃[𝜔] ≠ 𝑃[𝜔 + 𝑝]; such an 𝜔 is a witness against period 𝑝.
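The slides are truncated at this point, so the following is a sketch of the standard dueling idea rather than the author's exact formulation. Given two guesses 𝑖 < 𝑗 with 𝑗 − 𝑖 = 𝑝 and a witness 𝜔 against period 𝑝, the single text character 𝑇[𝑗 + 𝜔] would have to equal 𝑃[𝜔 + 𝑝] for a match at 𝑖 and 𝑃[𝜔] for a match at 𝑗; since these two pattern characters differ, one comparison rules out at least one guess. The names `witness` and `duel` are illustrative.

```python
def witness(P, p):
    """Return some w with P[w] != P[w + p], or None if p is a period of P."""
    for w in range(len(P) - p):
        if P[w] != P[w + p]:
            return w
    return None


def duel(T, P, i, j, w):
    """Duel between guesses i < j, where w is a witness against period j - i.

    Returns the guess that is NOT ruled out; the survivor still has to be verified.
    Assumes i <= len(T) - len(P), so that T[j + w] exists.
    """
    if T[j + w] != P[w]:
        return i   # a match at j would need T[j + w] == P[w]
    return j       # T[j + w] == P[w] != P[w + p], so there is no match at i
```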
String Matching by Duels – Sequential
Assume that pattern 𝑃 is aperiodic. (The periodic case can be dealt with separately; details omitted.)
Algorithm (cost of each step in parentheses):
1. Set 𝜇 := ⌊𝑚/2⌋ (𝑂(1))
String Matching by Duels – Parallel
Assume that pattern 𝑃 is aperiodic. (The periodic case can be dealt with separately; details omitted.)
Algorithm (analysis of each step in parentheses):
1. Set 𝜇 := ⌊𝑚/2⌋
3. For each block of 𝜇 consecutive indices [0..𝜇), [𝜇..2𝜇), [2𝜇..3𝜇), . . . :
   run 𝜇 − 1 duels to eliminate all but one guess in the block
   (blocks in parallel, independently; within a block, a tournament of ⌈lg 𝜇⌉ rounds)
→ Matching part can be done in 𝑂(log 𝑚) parallel time and 𝑂(𝑛) work!
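The block tournament can be simulated sequentially as below. Several parts are assumptions, because the corresponding steps are missing from the slides: witnesses are computed naively (quadratic time, not the efficient procedures mentioned in the notes), 𝜇 is taken as ⌊𝑚/2⌋, and each surviving guess is verified by a direct comparison at the end. The function name `duel_match` is made up. The loops over blocks and over the duels of one round are the parts a PRAM would run in parallel.

```python
def duel_match(T, P):
    """All occurrences of an aperiodic pattern P in T, via block-wise duel tournaments."""
    n, m = len(T), len(P)
    mu = m // 2
    if mu == 0:                                   # m <= 1: nothing to duel about
        return [i for i in range(n - m + 1) if T[i:i + m] == P]

    # wit[p] = witness against period p, for p in 1..mu (naive computation, an assumption)
    wit = [None] * (mu + 1)
    for p in range(1, mu + 1):
        wit[p] = next((w for w in range(m - p) if P[w] != P[w + p]), None)
        if wit[p] is None:
            raise ValueError("pattern is periodic; this sketch handles aperiodic patterns only")

    survivors = []
    for start in range(0, n - m + 1, mu):         # blocks are independent of each other
        cands = list(range(start, min(start + mu, n - m + 1)))
        while len(cands) > 1:                     # one tournament round
            nxt = []
            for a in range(0, len(cands) - 1, 2): # duels within a round are independent
                i, j = cands[a], cands[a + 1]
                w = wit[j - i]
                nxt.append(i if T[j + w] != P[w] else j)
            if len(cands) % 2:                    # odd one out advances without a duel
                nxt.append(cands[-1])
            cands = nxt
        survivors.extend(cands)

    # assumed final step: verify the one remaining guess per block directly
    return [i for i in survivors if T[i:i + m] == P]
```

A true occurrence never loses a duel, so it is always among the survivors; only about 𝑛/𝜇 guesses remain, each verified in 𝑂(𝑚) time, which is consistent with the 𝑂(𝑛) work bound stated above.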
Computing witnesses
It remains to find the witnesses 𝜔[1..𝜇].
Sequentially:
• an elementary procedure, similar in spirit to computing the KMP failure array
In parallel:
• much more complicated → beyond the scope of this module
• first: 𝑂(log² 𝑚) time on CREW-PRAM
• later: 𝑂(log 𝑚) time and 𝑂(𝑚) work using the pseudoperiod method
Parallel Matching – State of the art
• 𝑂(log 𝑚) time & work-efficient parallel string matching
• this is optimal for CREW-PRAM