Regular expression
Introduction
• Instead of focusing on the power of a computing
device, let's look at the task that we need to
perform.
• Let's consider problems in which our goal is to match
finite or repeating patterns.
• Lexical analysis in a compiler.
• Filtering email for spam.
• Sorting email into appropriate mailboxes based on
sender and/or content words and phrases.
• Searching a complex directory structure by specifying
patterns that are known to occur in the file we want to find.
[^ \t\n] matches any character except space, tab and newline character.
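A minimal sketch using Python's `re` module shows this negated character class in action. The slides do not name a tool, so Python is an assumption, and `non_whitespace_chars` is a hypothetical helper name. Other whitespace, such as a carriage return, is not excluded by this class.

```python
import re

# Negated character class: any single character other than
# a space, a tab, or a newline.
NON_BLANK = re.compile(r"[^ \t\n]")

def non_whitespace_chars(text):
    """Return, in order, every character of text matched by [^ \\t\\n]."""
    return NON_BLANK.findall(text)

print(non_whitespace_chars("a b\tc\nd"))  # ['a', 'b', 'c', 'd']
```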
• The regular expression language that we are about to
describe is built on an alphabet that contains two
kinds of symbols:
Strings of a’s and b’s of even length: L = {w ∈ {a, b}* : |w| is even}
Regular expression = ((a U b)(a U b))*
Strings of a’s and b’s of odd length: L = {w ∈ {a, b}* : |w| is odd}
Regular expression = (a U b)((a U b)(a U b))*
Strings of a’s of even length
Regular expression = (aa)*
Strings of a’s of odd length
Regular expression = a(aa)*
L = {w ∈ {a, b}* : w contains an odd number of a’s}
Regular expression = b*(ab*ab*)*ab*
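The example expressions can be spot-checked by brute force: enumerate every string up to a small length and compare the regular expression against a direct test of the language's defining property. This sketch uses Python's `re` syntax, where union is written `|`; `agrees` and `EXAMPLES` are illustrative names. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

# name -> (pattern in Python syntax, alphabet, membership test)
EXAMPLES = {
    "even length over {a, b}": (r"((a|b)(a|b))*", "ab", lambda w: len(w) % 2 == 0),
    "odd length over {a, b}": (r"(a|b)((a|b)(a|b))*", "ab", lambda w: len(w) % 2 == 1),
    "even length over {a}": (r"(aa)*", "a", lambda w: len(w) % 2 == 0),
    "odd length over {a}": (r"a(aa)*", "a", lambda w: len(w) % 2 == 1),
    "odd number of a's": (r"b*(ab*ab*)*ab*", "ab", lambda w: w.count("a") % 2 == 1),
}

def agrees(pattern, alphabet, member, max_len=8):
    """Compare a whole-string match against the membership test on every
    string over alphabet of length 0..max_len."""
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (regex.fullmatch(w) is not None) != member(w):
                return False
    return True

for name, (pattern, alphabet, member) in EXAMPLES.items():
    print(f"{name}: {agrees(pattern, alphabet, member)}")
```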
The regular expression operators, from highest to lowest precedence:
1. Kleene star
2. Concatenation
3. Union
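Python's `re` module orders these three operators the same way (star binds tightest, then concatenation, then union, written `|`), so a few assertions can illustrate how an unparenthesized expression is read. `matches` is a hypothetical helper.

```python
import re

def matches(pattern, text):
    """True if the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None

# Star binds tightest: ab* means a(b*), not (ab)*.
assert matches(r"ab*", "abbb") and not matches(r"ab*", "abab")
# Concatenation binds tighter than union: ab|c means (ab)|c, not a(b|c).
assert matches(r"ab|c", "c") and not matches(r"ab|c", "ac")
# Parentheses override the default precedence.
assert matches(r"(ab)*", "abab") and matches(r"a(b|c)", "ac")
```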
(α U ε) → matches either a string matched by α or the empty string.
(a U b)* → Describes the set of all strings composed of the characters a and b.
a* U b* ≠ (a U b)* Every string in the language on the left contains only a’s or only b’s, while the language on the right also contains mixed strings such as ab.
(ab)* ≠ a*b* The language on the left contains the string abab, while the
language on the right does not. The language on the right
contains the string aaabbbb, while the language on the left does
not.
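To show that two expressions denote different languages, one string that belongs to one language but not the other is enough. This sketch in Python's `re` syntax (union written `|`, `in_lang` a hypothetical helper) checks such witnesses for a* U b* versus (a U b)* and for (ab)* versus a*b*. Failing to find a witness among short strings would not, by itself, prove two expressions equal.

```python
import re

def in_lang(pattern, w):
    """True if w is in the language of pattern (whole-string match)."""
    return re.fullmatch(pattern, w) is not None

# a* U b* versus (a U b)*: "ab" is in the right-hand language only.
assert not in_lang(r"a*|b*", "ab") and in_lang(r"(a|b)*", "ab")

# (ab)* versus a*b*: witnesses in both directions.
assert in_lang(r"(ab)*", "abab") and not in_lang(r"a*b*", "abab")
assert in_lang(r"a*b*", "aaabbbb") and not in_lang(r"(ab)*", "aaabbbb")
```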
Kleene's Theorem
L(β U γ) = L(β) U L(γ).
Concatenation Operation:
FSM for a
FSM for ab
FSM for (b U ab)
FSM for (b U ab)*
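The figures build the machine for (b U ab)* bottom-up, one operator at a time. The sketch below follows the same idea with Thompson-style ε-NFA fragments in Python; the state numbering and the helper names (`symbol`, `concat`, `union`, `star`, `accepts`) are assumptions, and the slides' figures may lay the states out differently. Each fragment has one start and one accepting state, and `accepts` simulates the ε-NFA by tracking the ε-closure of the current set of states.

```python
from itertools import count

EPS = None        # label for ε-transitions
_fresh = count()  # supplies new state numbers

class NFA:
    """ε-NFA fragment with exactly one start and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(c):
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, [(s, c, f)])

def concat(m1, m2):
    # Link m1's accepting state to m2's start state with an ε-transition.
    return NFA(m1.start, m2.accept,
               m1.edges + m2.edges + [(m1.accept, EPS, m2.start)])

def union(m1, m2):
    # New start state branches into either machine; both exit to a new accept.
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m1.edges + m2.edges + [
        (s, EPS, m1.start), (s, EPS, m2.start),
        (m1.accept, EPS, f), (m2.accept, EPS, f)])

def star(m):
    # Zero iterations (s -> f), or loop from m's accept back to m's start.
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m.edges + [
        (s, EPS, m.start), (s, EPS, f),
        (m.accept, EPS, m.start), (m.accept, EPS, f)])

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, c, r in edges:
            if p == q and c is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(m, w):
    current = eps_closure({m.start}, m.edges)
    for ch in w:
        moved = {r for p, c, r in m.edges if p in current and c == ch}
        current = eps_closure(moved, m.edges)
    return m.accept in current

# Same order as the figures: a, ab, (b U ab), (b U ab)*
ab = concat(symbol("a"), symbol("b"))
machine = star(union(symbol("b"), ab))
for w in ["", "b", "ab", "bab", "abb", "a", "ba", "aab"]:
    print(repr(w), accepts(machine, w))
```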
Convert the regular expression 0* + 1* + 2* to an ε-NFA or an
FSM.
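One possible answer, sketched in Python with assumed state names: the union construction would ε-link a new start state to machines for 0*, 1* and 2*, and removing those ε-transitions gives the small deterministic FSM below. The start state accepts because ε is in the language, and any string that mixes digits falls into a dead state.

```python
# Hypothetical state names for a hand-built DFA accepting 0* + 1* + 2*.
# "only0"/"only1"/"only2" remember which digit has been repeated so far.
DELTA = {
    "start": {"0": "only0", "1": "only1", "2": "only2"},
    "only0": {"0": "only0"},
    "only1": {"1": "only1"},
    "only2": {"2": "only2"},
}
ACCEPTING = {"start", "only0", "only1", "only2"}

def accepts(w):
    state = "start"
    for ch in w:
        # Any transition not listed in DELTA goes to the dead state,
        # and the dead state has no way out.
        state = DELTA.get(state, {}).get(ch, "dead")
    return state in ACCEPTING

for w in ["", "000", "22", "01", "120"]:
    print(repr(w), accepts(w))
```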
FSM to Regular Expression (State Elimination)
• Goal: build a regular expression that describes the language accepted by an FSM.
• Instead of limiting the labels on the transitions of an
FSM to a single character or ε, we will allow entire regular
expressions as labels.
• For a given input FSM M, we will construct a machine M’ such
that M and M’ are equivalent and M’ has only two states,
a start state and a single accepting state.
• M’ will also have just one transition, which will go from its
start state to its accepting state.
• The label on that transition will be a regular expression that
describes all the strings that could have driven the original
machine M from its start state to some accepting state.
Consider the following FSM M:
Equivalently: obtain the regular expression for the above finite
automaton using the state elimination method.
• We can build an equivalent machine M' by
eliminating state q2 and replacing it by a transition
from q1 to q3 labeled with the regular expression
ab*a.
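Eliminating q2 applies one rule to each pair of remaining states p, q: the new label is R(p, q) U R(p, rip) R(rip, rip)* R(rip, q). Since the figure is not reproduced here, the sketch below assumes the labels implied by the result: a from q1 to q2, a self-loop b on q2, a from q2 to q3, and no direct transition from q1 to q3 (Ø). Ø and ε are simplified with the identities ØR = Ø, Ø + R = R and Ø* = ε; all names are hypothetical.

```python
EMPTY = None   # Ø: no transition
EPS = ""       # ε: concatenating it changes nothing

def show(r):
    return "ε" if r == EPS else r

def star(r):
    if r is EMPTY or r == EPS:
        return EPS                      # Ø* = ε and ε* = ε
    return f"{r}*" if len(r) == 1 else f"({r})*"

def cat(*parts):
    if any(p is EMPTY for p in parts):
        return EMPTY                    # ØR = RØ = Ø
    return "".join(parts)               # εR = Rε = R

def union(r1, r2):
    if r1 is EMPTY:
        return r2                       # Ø + R = R
    if r2 is EMPTY:
        return r1                       # R + Ø = R
    return f"({show(r1)} U {show(r2)})"

def rip_label(r_pq, r_p_rip, r_rip_rip, r_rip_q):
    """Label on p -> q after state rip is removed."""
    return union(r_pq, cat(r_p_rip, star(r_rip_rip), r_rip_q))

# Assumed labels: q1 -a-> q2, q2 -b-> q2, q2 -a-> q3, nothing q1 -> q3.
print(rip_label(EMPTY, "a", "b", "a"))  # ab*a
```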
So M' is:
6.1. Select some state rip of M. Any state except the start state or the accepting state may be chosen.
Equivalently: obtain the regular expression for the above finite automaton
using the state elimination method.
• Create a new start state and a new accepting state
and link them to M:
Remove state 3:
Remove state 2:
Remove state 1:
2. If the start state of M is part of a loop (i.e., it has any transitions coming
into it), then create a new start state s and connect it to M’s start state via
an ε-transition. This new start state s will have no transitions into it.
3. If there is more than one accepting state of M, or if there is just one but
it has transitions out of it, create a new accepting state and
connect each of M’s accepting states to it via an ε-transition. Remove
the old accepting states from the set of accepting states. Note that the
new accepting state will have no transitions out of it.
• If there is more than one transition between states p and q,
collapse them into a single transition.
• If there is a pair of states p, q and there is no transition
between them and p is not the accepting state and q is not
the start state, then create a transition from p to q labeled Ø.
• At this point, if M has only one state, then that state is both the
start state and the accepting state, and M has no transitions. So
L(M) = {ε}. Halt and return the regular expression ε.
• If M has no accepting states then halt and return the simple
regular expression Ø.
• Until only the start state and the accepting state remain do:
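The construction steps and the rip rule can be combined into one routine. This Python sketch makes two simplifying assumptions: it always adds a fresh start and accepting state (steps 2 and 3 require this only when needed), and it treats a missing transition as Ø instead of storing explicit Ø labels. It eliminates states in the order given, using R(p, q) U R(p, rip) R(rip, rip)* R(rip, q) for every remaining pair; the function and state names are hypothetical, and a different elimination order can produce a different but equivalent expression.

```python
EMPTY = None   # Ø: no transition between two states
EPS = ""       # ε: the identity for concatenation

def show(r):
    return "ε" if r == EPS else r

def star(r):
    if r is EMPTY or r == EPS:
        return EPS                           # Ø* = ε* = ε
    return f"{r}*" if len(r) == 1 else f"({r})*"

def cat(*parts):
    if any(p is EMPTY for p in parts):
        return EMPTY                         # ØR = RØ = Ø
    return "".join(parts)

def union(r1, r2):
    if r1 is EMPTY:
        return r2                            # Ø + R = R
    if r2 is EMPTY or r2 == r1:
        return r1                            # R + Ø = R, R + R = R
    return f"({show(r1)} U {show(r2)})"

def fsm_to_regex(states, start, accepting, transitions):
    """State elimination. transitions: (p, label, q) triples whose labels
    are single symbols or EPS. Returns a regex string, "ε", or "Ø"."""
    if not accepting:
        return "Ø"                           # no accepting states
    s, f = "_start", "_accept"               # hypothetical fresh state names
    R = {}

    def add(p, label, q):
        # Collapse parallel transitions p -> q into one union label.
        R[(p, q)] = union(R.get((p, q), EMPTY), label)

    for p, label, q in transitions:
        add(p, label, q)
    add(s, EPS, start)                       # new start state, no edges into it
    for a in accepting:
        add(a, EPS, f)                       # new accepting state, no edges out

    remaining = list(states)
    while remaining:                         # until only s and f remain
        rip = remaining.pop(0)
        loop = star(R.get((rip, rip), EMPTY))
        nodes = remaining + [s, f]
        for p in nodes:
            for q in nodes:
                through = cat(R.get((p, rip), EMPTY), loop, R.get((rip, q), EMPTY))
                R[(p, q)] = union(R.get((p, q), EMPTY), through)
        R = {k: v for k, v in R.items() if rip not in k}

    result = R.get((s, f), EMPTY)
    return "Ø" if result is EMPTY else show(result)

# The q1/q2/q3 machine from the earlier example, with assumed labels.
print(fsm_to_regex(["q1", "q2", "q3"], "q1", {"q3"},
                   [("q1", "a", "q2"), ("q2", "b", "q2"), ("q2", "a", "q3")]))  # ab*a
```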
• Text editors: programs used for processing text.
Example: UNIX text editors use regular expressions to search for and substitute strings.
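As an illustration in Python (an assumption, since the slides refer to UNIX editors), `re.sub` plays the role of an editor's global substitute command of the form s/pattern/replacement/g. The helper names are hypothetical; the second helper uses the back-references \1, \2, \3 to reorder captured groups.

```python
import re

def collapse_blanks(text):
    # Replace each run of spaces and tabs with a single space.
    return re.sub(r"[ \t]+", " ", text)

def iso_to_dmy(text):
    # Capture year, month and day, then reorder them with back-references.
    return re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(collapse_blanks("a  \t b"))      # a b
print(iso_to_dmy("due 2024-01-31"))   # due 31/01/2024
```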
Example
1ɛ =ɛ1=1
ɛR = Rɛ=R
1Ø = Ø1 = Ø
ØR = RØ = Ø
ɛ* =ɛ
(Ø)* = ɛ
Ø +1 =1
Ø + R = R+Ø = R
1 + 1 = 1
R +R = R
00* = 0+
RR* =R*R = R+
(1*)* = 1*
(R*)* = R*
Example
R* R* = R*
ɛ + 1+ = 1 *
ɛ + RR* = R*
(P+Q)R = PR +QR
(P+Q)* = (P*Q*)* = (P*+Q*)*
R*(ɛ + R) = (ɛ + R) R* = R*
(ɛ + R)* = R*
ɛ + R* = R*
(PQ)* P = P(QP)*
R*R + R = R*R =R+
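These identities can be spot-checked by instantiating the variables with concrete expressions (here P = a, Q = b, R = ab, an arbitrary choice) and comparing the sets of short strings each side matches. This Python sketch writes union as `|` and ε + R as `(|R)`; equal sets up to a fixed length support an identity but do not prove it.

```python
import re
from itertools import product

def lang(pattern, alphabet="ab", max_len=6):
    """Set of strings over alphabet, up to max_len long, matched in full."""
    regex = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if regex.fullmatch(w)}

# Instances with P = a, Q = b, R = ab (union "+" is written "|").
IDENTITIES = [
    (r"(ab)*(ab)*", r"(ab)*"),          # R*R* = R*
    (r"|(ab)(ab)*", r"(ab)*"),          # ε + RR* = R*
    (r"(a|b)(ab)", r"a(ab)|b(ab)"),     # (P+Q)R = PR + QR
    (r"(a|b)*", r"(a*b*)*"),            # (P+Q)* = (P*Q*)*
    (r"(a|b)*", r"(a*|b*)*"),           # (P+Q)* = (P*+Q*)*
    (r"(|ab)*", r"(ab)*"),              # (ε + R)* = R*
    (r"(ab)*a", r"a(ba)*"),             # (PQ)*P = P(QP)*
    (r"(ab)*(ab)|(ab)", r"(ab)+"),      # R*R + R = R+
]

for left, right in IDENTITIES:
    print(f"{left} == {right}: {lang(left) == lang(right)}")
```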
Kleene’s Theorem
$R_{22}^{(0)} = 0 + 1 + \varepsilon$
$R_{12}^{(0)} = 0$
$R_{12}^{(1)} = R_{12}^{(0)} + R_{11}^{(0)}\,(R_{11}^{(0)})^*\,R_{12}^{(0)}$
We know that ØR = Ø and Ø + R = R.
$R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}\,(R_{22}^{(1)})^*\,R_{22}^{(1)}$
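These equations are instances of the recurrence $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}(R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}$, where $R_{ij}^{(k)}$ describes the paths from state i to state j whose intermediate states are all numbered at most k. The Python sketch below computes the whole table for a hypothetical two-state DFA. The original machine is not shown, so this DFA was chosen only to reproduce $R_{12}^{(0)} = 0$ and $R_{22}^{(0)} = 0 + 1 + \varepsilon$. It writes union as `|` and ε as an empty alternative so the result can be tested with Python's `re`, and it applies ØR = Ø and Ø + R = R while building.

```python
import re
from itertools import product

EMPTY = None   # Ø
EPS = ""       # ε, rendered as an empty alternative in Python's re syntax

def union(r1, r2):
    if r1 is EMPTY:
        return r2                       # Ø + R = R
    if r2 is EMPTY or r1 == r2:
        return r1                       # R + Ø = R, R + R = R
    return f"({r1}|{r2})"

def cat(*parts):
    if any(p is EMPTY for p in parts):
        return EMPTY                    # ØR = RØ = Ø
    return "".join(parts)

def star(r):
    if r is EMPTY or r == EPS:
        return EPS                      # Ø* = ε* = ε
    return f"({r})*"

def kleene(n, delta, k_max=None):
    """Table R[(i, j)] = R_ij^(k) for k = k_max (default n).
    States are 1..n; delta maps (state, symbol) -> state."""
    k_max = n if k_max is None else k_max
    states = range(1, n + 1)
    # Base case R_ij^(0): direct symbols, plus ε when i == j.
    R = {(i, j): (EPS if i == j else EMPTY) for i in states for j in states}
    for (i, a), j in delta.items():
        R[(i, j)] = union(R[(i, j)], a)
    for k in range(1, k_max + 1):
        # Build R^(k) entirely from R^(k-1).
        R = {(i, j): union(R[(i, j)], cat(R[(i, k)], star(R[(k, k)]), R[(k, j)]))
             for i in states for j in states}
    return R

# Hypothetical DFA: start state 1, accepting state 2.
DELTA = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex_12 = kleene(2, DELTA)[(1, 2)]

def dfa_accepts(w):
    state = 1
    for ch in w:
        state = DELTA[(state, ch)]
    return state == 2

# Compare the computed expression with the DFA on all binary strings up to length 6.
agree = all(
    (re.fullmatch(regex_12, w) is not None) == dfa_accepts(w)
    for n in range(7)
    for w in map("".join, product("01", repeat=n))
)
print(regex_12)
print(agree)
```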