Unit-3 Regular Expressions

3.1. Regular Expressions, Operators of Regular Expressions (Union, Concatenation, Kleene Closure), Regular Languages and their Applications, Algebraic Rules for Regular Expressions
3.2. Equivalence of Regular Expressions and Finite Automata, Reduction of a Regular Expression to an ε-NFA, Conversion of a DFA to a Regular Expression, Arden's Theorem
3.3. Properties of Regular Languages, Pumping Lemma for Regular Languages, Application of the Pumping Lemma, Closure Properties of Regular Languages (Union, Intersection, Complement), Minimization of Finite State Machines: Table Filling Algorithm
Regular Languages
• A language L is regular if it is the language accepted by some DFA.
– A language is regular if it can be described by a regular expression.
• Some languages are not regular.
– If a language is not regular, there is no DFA for that language.
Example 1:
L1 = {0^n 1^n | n ≥ 1} is not regular.
- The set of strings consisting of n 0's followed by n 1's, such that n is at least 1.
Thus, L1 = {01, 0011, 000111, …}
Example 2:
L2 = {w | w ∈ {(, )}* and w is balanced} is not regular.
- Balanced parentheses are those that can appear in an arithmetic expression.
L2 = {(), ()(), (()), (()()), …}
Regular Expression and Grammar
REGULAR LANGUAGE => Basic languages + Regular operators
The basic languages: the simple language {a}, where a ∈ Σ, the language {ε} containing only the empty string, and the empty language ∅.
Regular operators: there are three regular operators (union, concatenation and Kleene closure) used to generate languages; positive closure is a common shorthand, L+ = L.L*.
1. Union (∪): L1 ∪ L2 = {s | s ∈ L1 or s ∈ L2}
2. Concatenation (.): L1.L2 = {st | s ∈ L1 and t ∈ L2}
3. Kleene closure (*): L* = concatenations of 0 or more strings of L
4. Positive closure (+): L+ = concatenations of 1 or more strings of L
NOTE: Precedence of regular operators: the star operator has the highest precedence, i.e. it applies to the smallest well-formed RE on its left. Concatenation comes next, and union is applied last.
Example:- If L1 = {11, 00} and L2 = {01, 10} over Σ = {0, 1}, then
L1 ∪ L2 = {11, 00, 01, 10}
L1.L2 = {1101, 1110, 0001, 0010}
L1* = {ε, 11, 00, 1111, 1100, 0011, 0000, …}
L1+ = {11, 00, 1111, 1100, 0011, 0000, …}
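To make the operators concrete, here is a minimal Python sketch (our own illustration; the function names are hypothetical) that treats finite languages as sets of strings. Because L* is infinite whenever L contains a nonempty string, `star` and `plus` only list strings up to an assumed length bound `max_len`.

```python
from itertools import product

def union(l1, l2):
    """L1 ∪ L2 for finite languages given as Python sets of strings."""
    return l1 | l2

def concat(l1, l2):
    """L1.L2: every string of L1 followed by every string of L2."""
    return {s + t for s, t in product(l1, l2)}

def star(lang, max_len):
    """Kleene closure L*, cut off at length max_len (L* itself is usually infinite)."""
    result, frontier = {""}, {""}          # ε is always in L*
    while frontier:
        # extend the newest strings by one more string of L
        frontier = {s + t for s in frontier for t in lang
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result

def plus(lang, max_len):
    """Positive closure L+ = L.L*, cut off at length max_len."""
    return {w for w in concat(lang, star(lang, max_len)) if len(w) <= max_len}

L1 = {"11", "00"}
L2 = {"01", "10"}
print(sorted(union(L1, L2)))                           # ['00', '01', '10', '11']
print(sorted(concat(L1, L2)))                          # ['0001', '0010', '1101', '1110']
print(sorted(star(L1, 4), key=lambda w: (len(w), w)))  # ['', '00', '11', '0000', '0011', '1100', '1111']
```

Since every string of L1 has length 2, every string of L1* has even length, which is why a string such as 11011 cannot belong to L1*.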
Regular Sets
Any set that represents the value of the Regular Expression is called a Regular Set.
Properties of Regular Sets
Property 1. The union of two regular sets is regular.

Proof −
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)
Hence, proved.
Property 2. The intersection of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Hence, proved.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
The complement of L is the set of all strings over {a} that are not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.
Property 4. The difference of two regular sets is regular.
Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.
Hence, proved.
Property 5. The reversal of a regular set is regular.
Proof −
We have to prove that L^R is also regular if L is a regular set.
Let, L = {01, 10, 11}
RE (L) = 01 + 10 + 11
L^R = {10, 01, 11}
RE (L^R) = 10 + 01 + 11, which is regular
Hence, proved.
Property 6. The closure of a regular set is regular.
Proof −
If L = {a, aaa, aaaaa, …} (Strings of odd length excluding Null)
i.e., RE (L) = a(aa)*
L* = {ε, a, aa, aaa, aaaa, aaaaa, …} (Strings of all lengths including Null, since a ∈ L and ε is always in L*)
RE (L*) = a*
Hence, proved.
Property 7. The concatenation of two regular sets is regular.
Proof −
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented by an
RE − (0 + 1)*001(0 + 1)*
Hence, proved.
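Properties 2 to 4 are only illustrated above with particular expressions. The general argument is usually made with automata: run two DFAs in parallel (the product construction) for intersection, and swap final and non-final states for complement; difference is then L1 ∩ L2′. The Python sketch below assumes complete DFAs over a common alphabet written as tuples `(states, alphabet, delta, start, finals)`; this representation and the names are our own choice.

```python
from itertools import product

def accepts(dfa, w):
    """Run a complete DFA on w and report whether it ends in a final state."""
    _states, _alphabet, delta, q, finals = dfa
    for c in w:
        q = delta[(q, c)]
    return q in finals

def intersect(d1, d2):
    """Product construction: a pair state is final iff both components are final."""
    s1, alphabet, t1, q1, f1 = d1
    s2, _, t2, q2, f2 = d2
    states = set(product(s1, s2))
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return (states, alphabet, delta, (q1, q2), finals)

def complement(d):
    """Swap final and non-final states (this requires a complete DFA)."""
    states, alphabet, delta, start, finals = d
    return (states, alphabet, delta, start, set(states) - set(finals))

# DFAs over {a} for RE1 = a(a*) and RE2 = (aa)* from Properties 2 to 4
A_PLUS = ({0, 1}, ("a",), {(0, "a"): 1, (1, "a"): 1}, 0, {1})
EVEN = ({0, 1}, ("a",), {(0, "a"): 1, (1, "a"): 0}, 0, {0})

for n in range(6):
    w = "a" * n
    print(n, accepts(intersect(A_PLUS, EVEN), w),              # aa(aa)*
          accepts(complement(EVEN), w),                        # a(aa)*
          accepts(intersect(A_PLUS, complement(EVEN)), w))     # L1 - L2
```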
Σ = {a, b} and r is a regular expression of a language made using these symbols:
Regular expression r → Regular set L(r)
∅ → {}
ε → {ε}
a* → {ε, a, aa, aaa, …}
a + b → {a, b}
a.b → {ab}
a* + ba → {ε, a, aa, aaa, …, ba}

Regular expression
Regular expressions are used to denote regular languages. They can represent regular
languages and operations on them succinctly.
NOTE: A regular expression is not unique for a language.
The set of regular expressions over an alphabet is defined recursively. Any element of
that set is a regular expression.
Examples of regular expressions and the regular languages corresponding to them
• (a + b)^2 corresponds to the language {aa, ab, ba, bb}, that is, the set of strings of length 2 over the alphabet {a, b}. In general, (a + b)^k corresponds to the set of strings of length k over the alphabet {a, b}.
• (a + b )* corresponds to the set of all strings over the alphabet {a, b}.
• a*b* corresponds to the set of strings consisting of zero or more a's followed by zero or
more b’s.
• a*b+a* corresponds to the set of strings consisting of zero or more a's followed
by one or more b's followed by zero or more a's.
• ( ab )+ corresponds to the language {ab, abab, ababab, ... }, that is, the set of
strings of repeated ab's.
Examples: Write a RE for the set of strings that consist of alternating 0's and 1's over {0,1} (here taken to mean repeated 01 or repeated 10).
SOLUTION:
First part: we have to generate the language {01, 0101, 010101, …}
Second part: we have to generate the language {10, 1010, 101010, …}
So let's start with the first part.
Here we start with the basic regular expressions 0 and 1 that represent the languages {0} and {1} respectively.
Now if we concatenate these two REs, we get the RE 01 that represents the language {01}.
Then, to generate the language of one or more occurrences of 01, we take the positive closure, i.e. the RE (01)+ represents the language {01, 0101, …}.
Similarly, the RE for the second part is (10)+.
Finally, we take the union of the first part and the second part to get the required RE, i.e. the RE (01)+ + (10)+, OR (01)+|(10)+, represents the given language.
• Note: A regular expression is not unique for a language. That is, a
regular language, in general, corresponds to more than one regular
expression.
• For example, (a + b)* and (a*b*)* correspond to the set of all strings over the alphabet {a, b}.
Definition of Equality of Regular Expressions
Regular expressions are equal if and only if they correspond to the same
language.
Thus, for example, (a + b)* = (a*b*)*, because they both represent the language of all strings over the alphabet {a, b}.
In general, it is not easy to see by inspection whether or not two regular expressions are equal.
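One practical, if incomplete, aid is to compare two expressions on every string up to some length. Agreement up to a length bound is only evidence, not a proof of equality (a proof needs the algebraic identities or an automaton-based equivalence check), but a disagreement is a genuine counterexample. The sketch below uses Python's `re.fullmatch`, writing the union + as `|`; the helper name and the bound are our own choices.

```python
import re
from itertools import product

def first_disagreement(re1, re2, alphabet, max_len):
    """Return the shortest string (lexicographic within a length) on which
    the two Python regexes disagree, or None if they agree on every string
    of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(re1, w)) != bool(re.fullmatch(re2, w)):
                return w
    return None

# (a + b)* versus (a*b*)*: no difference on strings up to length 10
print(first_disagreement(r"(a|b)*", r"(a*b*)*", "ab", 10))   # None
# (a + b)* versus a*b*, which are not equal
print(first_disagreement(r"(a|b)*", r"a*b*", "ab", 10))      # ba
```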
Examples and Exercises related to R.E
Ex. 1: Find the shortest string that is not in the language represented by the regular expression a*(ab)*b*.
Solution: It can easily be seen that ε, a and b are in the language, so every string of length 1 or less is in it.
Of the strings of length 2, aa, bb and ab are in the language. However, ba is not in it. Thus the answer is ba.
Ex. 2: Find a regular expression corresponding to the language of all strings over
the alphabet {a, b } that contain exactly two a's.
Solution: A string in this language must have exactly two a's. Since any string of b's can be placed in front of the first a, behind the second a and between the two a's, and since an arbitrary string of b's can be represented by the regular expression b*, b*ab*ab* is a regular expression for this language.
Ex. 3: Let r1 and r2 be arbitrary regular expressions over some alphabet. Find a simple (the
shortest and with the smallest nesting of * and +) regular expression which is equal to each of
the following regular expressions.
(a) (r1 + r2 + r1r2 + r2r1)*
(b) (r1(r1 + r2)*)+
Solution: One general strategy to approach this type of question is to try to see whether or not they are equal to simple regular expressions that are familiar to us, such as a, a*, a+, (a + b)*, (a + b)+ etc.
(a) Since (r1 + r2)* represents all strings consisting of strings of r1 and/or r2 , r1r2 + r2r1 in the
given regular expression is redundant, that is, they do not produce any strings that are not
represented by (r1 + r2)*. Thus (r1 + r2 + r1r2 + r2r1)* is reduced to (r1 + r2)*.
(b) (r1(r1 + r2)*)+ means that all the strings represented by it must consist of one or more
strings of (r1(r1 + r2)*). However, the strings of (r1(r1 + r2)*) start with a string of r1 followed
by any number of strings taken arbitrarily from r1 and/or r2. Thus anything that comes after the
first r1 in (r1(r1 + r2)*)+ is represented by (r1 + r2)*. Hence (r1(r1 + r2)*) also represents the
strings of (r1(r1 + r2)*)+, and conversely (r1(r1 + r2)*)+ represents the strings represented by
(r1(r1 + r2)*). Hence (r1(r1 + r2)*)+ is reduced to (r1(r1 + r2)*).
Ex. 4: For the two regular expressions given below,
(a) find a string corresponding to r2 but not to r1 and
(b) find a string corresponding to both r1 and r2.
r1 = a* + b*    r2 = ab* + ba* + b*a + (a*b)*
Solution:
(a) Any string consisting of only a's or only b's, and the empty string, is in r1. So we need to find strings of r2 which contain at least one a and at least one b. For example, ab and ba are such strings.
(b) A string corresponding to r1 consists of only a's or only b's, or is the empty string. The only strings corresponding to r2 which consist of only a's or only b's are a (from ab* or b*a), b (from ba*) and the strings consisting of only b's, including ε (from (a*b)*). So, for example, ε, a, b and bb correspond to both r1 and r2.
Ex. : Find a regular expression corresponding to the language of all strings over the alphabet {a, b} that do not end with ab.
Solution:
Any nonempty string over {a, b} must end in a or b.
Hence, if a string does not end with ab, then it is empty, or it ends with a, or it ends with b and that last b is either the whole string or is preceded by another b.
Since any string can appear in front of the last a or bb, ε + b + (a + b)*(a + bb) is a regular expression for the language.
ε + b + (a+b)*(a+bb)
OR
ε | b | (a|b)*(a|bb)
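A brute-force check makes the role of the short strings visible. The Python sketch below (our own illustration; the names are hypothetical) compares a pattern against the direct test "does not end with ab" on all strings up to length 10. It finds no counterexample for ε + b + (a + b)*(a + bb), while (a + b)*(a + bb) alone misses ε (and also b). The empty alternative in the pattern encodes ε.

```python
import re
from itertools import product

CORRECTED = r"(?:|b|(?:a|b)*(?:a|bb))"   # ε + b + (a + b)*(a + bb)
WITHOUT_SHORT = r"(?:a|b)*(?:a|bb)"       # (a + b)*(a + bb) alone

def counterexample(pattern, max_len=10):
    """Shortest string where pattern disagrees with 'does not end with ab'."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(pattern, w)) != (not w.endswith("ab")):
                return w
    return None

print(counterexample(CORRECTED))            # None
print(repr(counterexample(WITHOUT_SHORT)))  # '' (the empty string)
```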
Ex.: Find a regular expression corresponding to the language of strings of even
lengths over the alphabet of { a, b }.
Solution:
Since any string of even length can be expressed as the concatenation of strings
of length 2 and since the strings of length 2 are aa, ab, ba, bb, a regular
expression corresponding to the language is ( aa + ab + ba + bb )*.
Note that 0 is an even number. Hence the empty string ε is in this language.
Ex.: Describe as simply as possible in English the language corresponding to the
regular expression a*b(a*ba*b)*a* .
Solution:
A string in the language can start and end with a or b, it has at least one b, and
after the first b all the b's in the string appear in pairs. Any number of a's can
appear any place in the string.
Thus, simply put, it is the set of strings over the alphabet {a, b} that contain an odd number of b's.
Ex. : Describe as simply as possible in English the language corresponding to the regular expression ((a + b)^3)*(ε + a + b).
Solution:
((a + b)^3) represents the strings of length 3. Hence ((a + b)^3)* represents the strings whose length is a multiple of 3.
Since ((a + b)^3)*(a + b) represents the strings of length 3n + 1, where n is a natural number (n ≥ 0), the given regular expression represents the strings over {a, b} whose length is 3n or 3n + 1, i.e. the strings whose length leaves a remainder of 0 or 1 when divided by 3.
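As a quick sanity check of this description, the following Python sketch (our own; it writes ε as an empty alternative) confirms on all strings up to length 12 that the expression matches exactly when the length is not of the form 3n + 2. This is evidence for the description, not a proof.

```python
import re
from itertools import product

PATTERN = r"(?:(?:a|b){3})*(?:|a|b)"     # ((a + b)^3)*(ε + a + b)

def matches_description(max_len=12):
    """True if PATTERN matches w exactly when len(w) is 3n or 3n + 1."""
    return all(bool(re.fullmatch(PATTERN, "".join(chars))) == (n % 3 != 2)
               for n in range(max_len + 1)
               for chars in product("ab", repeat=n))

print(matches_description())   # True
```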
Ex. : Describe as simply as possible in English the language corresponding to the
regular expression ( b + ab )*( a + ab )*.
Solution:
( b + ab )* represents strings which do not contain any substring aa and which are empty or end in b, and ( a + ab )* represents strings which do not contain any substring
bb.
Hence altogether it represents any string consisting of a substring with no aa
followed by one b followed by a substring with no bb.
Some RE Examples
Regular Expression → Regular Set
• (0 + 10*) → L = {0, 1, 10, 100, 1000, 10000, …}
• (0*10*) → L = {1, 01, 10, 010, 0010, …}
• (0 + ε)(1 + ε) → L = {ε, 0, 1, 01}
• (a + b)* → Set of strings of a's and b's of any length, including the null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …}
• (a + b)*abb → Set of strings of a's and b's ending with the string abb. So L = {abb, aabb, babb, aaabb, ababb, …}
• (11)* → Set consisting of an even number of 1's, including the empty string. So L = {ε, 11, 1111, 111111, …}
• (aa)*(bb)*b → Set of strings consisting of an even number of a's followed by an odd number of b's, so L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …}
• (aa + ab + ba + bb)* → Strings of a's and b's of even length, obtained by concatenating any combination of the strings aa, ab, ba and bb, including null, so L = {ε, aa, ab, ba, bb, aaab, aaba, …}
Properties of Regular Expressions (R.E)
1. Commutative:
The union of regular expressions is commutative. Let L and R be regular expressions; then L + R = R + L.
2. Associativity:
The union and concatenation operations of R.E are associative. Let L, R and S be R.E's; then
L + (R + S) = (L + R) + S
L(RS) = (LR)S
3. Identity:
∅ is the identity for union, i.e. ∅ + R = R + ∅ = R (NOTE: L(∅) = {})
ε is the identity for concatenation, i.e. εR = Rε = R (NOTE: L(ε) = {ε})
4. Annihilator:
An annihilator for an operation is a value such that, when the operation is applied to that value and any other value, the result is the annihilator.
For the concatenation operator, R.X = X when X = ∅, i.e. R.∅ = ∅; therefore ∅ is the annihilator for the (.) operator.
For example {a, aa, ab}.{ } = { }
5. Idempotent law:
If R is a R.E then R + R = R
6. Law of closure:
If R is a R.E then (R*)* = R*
Closure of ∅: ∅* = ε
Closure of ε: ε* = ε
7. Identities for regular expressions
• There are many identities for regular expressions. Let p, q and r be regular expressions.
∅ + r = r
∅.r = r.∅ = ∅
ε.r = r.ε = r
ε* = ε and ∅* = ε
r + r = r
r*.r* = r*
r.r* = r*.r = r+
(r*)* = r*
ε + r.r* = r* = ε + r*.r
(p.q)*.p = p.(q.p)*
(p + q)* = (p*.q*)* = (p* + q*)*
(p + q).r = p.r + q.r and r.(p + q) = r.p + r.q
Examples
Consider Σ = {0, 1}, then some regular expressions over Σ are ;
• 0*10* is RE that represents language {w|w contains a single 1}
• Σ*1Σ* is RE for language{w|w contains at least single 1}
• Σ*001 Σ* = {w|w contains the string 001 as substring}
• (Σ Σ)* or ((0+1).(0+1))* is RE for {w|w is string of even length}
• 1*(01*01*)* is RE for {w|w is string containing even number of zeros}
• 0*10*10*10* is RE for {w|w is a string with exactly three 1’s}
• For string that have substring either 001 or 100, the regular expression is
(1+0)*.001.(1+0)*+(1+0)*.(100).(1+0)*
• For strings that have at most two 0's within them, the regular expression is
1*.(0+Є).1*.(0+Є).1*
• For the strings ending with 11, the regular expression is
(1+0)*.(11)+
• Regular expression that denotes the C language identifiers:
(letter + _)(letter + digit + _)*
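The REs in this list can be checked mechanically. The Python sketch below (our own illustration) rewrites each one in Python `re` syntax (+ becomes `|`, ε becomes an empty alternative) and compares it with a direct test of the property it should describe, on every string over {0, 1} up to length 8. For C identifiers it assumes ASCII letters; note that this RE describes only the lexical shape and does not exclude keywords such as `int`.

```python
import re
from itertools import product

# Each RE from the list in Python syntax, paired with the property it describes.
EXAMPLES = [
    (r"0*10*",                           lambda w: w.count("1") == 1),
    (r"(0|1)*1(0|1)*",                   lambda w: "1" in w),
    (r"(0|1)*001(0|1)*",                 lambda w: "001" in w),
    (r"((0|1)(0|1))*",                   lambda w: len(w) % 2 == 0),
    (r"1*(01*01*)*",                     lambda w: w.count("0") % 2 == 0),
    (r"0*10*10*10*",                     lambda w: w.count("1") == 3),
    (r"(0|1)*001(0|1)*|(0|1)*100(0|1)*", lambda w: "001" in w or "100" in w),
    (r"1*(0|)1*(0|)1*",                  lambda w: w.count("0") <= 2),
    (r"(0|1)*(11)+",                     lambda w: w.endswith("11")),
]

def mismatches(max_len=8):
    """List (pattern, string) pairs where an RE disagrees with its property."""
    bad = []
    for pattern, prop in EXAMPLES:
        for n in range(max_len + 1):
            for chars in product("01", repeat=n):
                w = "".join(chars)
                if bool(re.fullmatch(pattern, w)) != prop(w):
                    bad.append((pattern, w))
    return bad

# C identifiers: (letter + _)(letter + digit + _)*, assuming ASCII letters
C_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

print(mismatches())                                                    # []
print([bool(re.fullmatch(C_IDENT, s)) for s in ["_tmp1", "count", "9lives"]])  # [True, True, False]
```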
Application of regular languages:
Validation: Determining that a string complies with a set of formatting constraints, like email address validation, password validation etc.
Search and Selection: Identifying a subset of items from a larger set on the basis of a pattern match.
Tokenization: Converting a sequence of characters into words or tokens (like keywords and identifiers) for later interpretation.
Theorem 1:
If L, M and N are any languages, then prove: L(M ∪ N) = LM ∪ LN
Proof:
We show that a string w is in L(M ∪ N) if and only if w is in LM ∪ LN.
If part: Let w ∈ LM ∪ LN. Then w ∈ LM or w ∈ LN (by the union rule).
If w ∈ LM, then w = xy with x ∈ L and y ∈ M (by the concatenation rule).
If w ∈ LN, then w = xy with x ∈ L and y ∈ N (by the concatenation rule).
In either case x ∈ L and y ∈ M ∪ N. Hence this implies:
xy ∈ L(M ∪ N)
i.e. w ∈ L(M ∪ N)
Only-if part:
Let w ∈ L(M ∪ N). Then w = xy with x ∈ L and y ∈ M ∪ N (by the concatenation rule).
If y ∈ M then xy ∈ LM.
If y ∈ N then xy ∈ LN.
So, xy ∈ LM ∪ LN.
Hence, w ∈ LM ∪ LN, and therefore L(M ∪ N) = LM ∪ LN.
Proved
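The distributive law can also be tested on small finite languages. The Python sketch below (our own illustration; the universe of strings is an arbitrary assumption) checks L(M ∪ N) = LM ∪ LN for every choice of L, M and N among the subsets of a four-string universe. Such a finite check only supports the theorem; the proof above is what establishes it for all languages.

```python
from itertools import chain, combinations, product

def concat(a, b):
    """Concatenation of two finite languages (sets of strings)."""
    return {x + y for x, y in product(a, b)}

def all_subsets(universe):
    """Every subset of a small universe of strings, including the empty set."""
    return [set(c) for c in chain.from_iterable(
        combinations(universe, r) for r in range(len(universe) + 1))]

UNIVERSE = ["", "a", "b", "ab"]     # an arbitrary small choice of strings

def distributes():
    """Check L(M ∪ N) == LM ∪ LN for every choice of L, M, N ⊆ UNIVERSE."""
    subsets = all_subsets(UNIVERSE)
    return all(concat(L, M | N) == concat(L, M) | concat(L, N)
               for L in subsets for M in subsets for N in subsets)

print(distributes())   # True
```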
Theorem 2
For any R.E r, there is an ε-NFA that accepts the same language represented by r.
Proof:-
FSA and RE
The regular expression approach for describing languages is fundamentally different from the finite automaton approach.
However, these two notations turn out to represent exactly the same set of languages, which we call regular languages.
In order to show that REs define the same class of languages as finite automata, we must show that:
1) Any language defined by one of these finite automata is also defined by an RE.
2) Every language defined by an RE is also defined by one of these finite automata.
We can proceed as:
Arden's Theorem
In order to find out a regular expression of a Finite Automaton, we use Arden’s Theorem along
with the properties of regular expressions.
Statement −
Let P and Q be two regular expressions.
If P does not contain null string, then R = Q + RP has a unique solution that is R = QP*
Proof −
R = Q + (Q + RP)P [After putting the value R = Q + RP]
  = Q + QP + RPP
When we put the value of R recursively again and again, we get the following equation −
R = Q + QP + QP^2 + QP^3 + …
R = Q(ε + P + P^2 + P^3 + …)
R = QP* [As P* represents (ε + P + P^2 + P^3 + …)]
Hence, proved.
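Arden's theorem can be checked on concrete languages, provided the infinite languages are cut off at some length. In the Python sketch below (our own illustration), Q = b and P = a + ab are arbitrary example choices, and `MAX_LEN` is an assumed bound. It confirms that R = QP* satisfies R = Q + RP, and shows why the condition on P matters: when P contains ε, the set of all strings also satisfies the equation, so the solution is no longer unique.

```python
from itertools import product

MAX_LEN = 8   # languages are compared only up to this length

def cut(lang):
    return {w for w in lang if len(w) <= MAX_LEN}

def concat(a, b):
    return cut({x + y for x, y in product(a, b)})

def star(p):
    """P* up to MAX_LEN: ε plus repeated concatenation with P."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, p) - result
        result |= frontier
    return result

Q = {"b"}            # example choice: Q = b
P = {"a", "ab"}      # example choice: P = a + ab (does not contain ε)
R = concat(Q, star(P))                  # Arden's solution R = QP*
print(R == cut(Q | concat(R, P)))       # True: R satisfies R = Q + RP

# If P contains ε, the solution need not be unique: with P2 = ε + a,
# the set of all strings over {a, b} also satisfies R = Q + RP2.
P2 = {"", "a"}
ALL = {"".join(c) for n in range(MAX_LEN + 1) for c in product("ab", repeat=n)}
print(ALL == cut(Q | concat(ALL, P2)), ALL == concat(Q, star(P2)))   # True False
```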
Conversion from DFA to R.E
Arden’s Theorem
Let p and q be two regular expression over alphabet, if p doesn’t contain
empty string then r= q + rp has a unique solution. i.e. r=qp*
Proof:
r = q + rp
  = q + (q + rp)p = q + qp + rp^2
Substituting r = q + rp again and again:
r = q + qp + qp^2 + qp^3 + …
  = q(ε + p + p^2 + p^3 + …)
  = qp*
Proved.
Use of Arden's rule:
To convert a DFA into an R.E. using Arden's theorem, we make the following assumptions about the transition system:
i) The transition graph does not have ε-moves (i.e. no ε-transitions).
ii) It has only one starting state, q1.
iii) Its vertices are q1, q2, q3, …, qn.
iv) qi also denotes the regular expression for the set of strings that lead from q1 to qi (i.e. the strings that would be accepted if qi were the final state).
v) wij denotes the regular expression representing the set of labels of edges from qi to qj.
We then get the following equations (only the equation of the starting state q1 contains the extra term ε):
q1 = q1w11 + q2w21 + q3w31 + … + qnwn1 + ε
q2 = q1w12 + q2w22 + q3w32 + … + qnwn2
⋮
qn = q1w1n + q2w2n + q3w3n + … + qnwnn
Hence, solving these equations for the final state(s) in terms of the wij's gives the R.E. (If there are several final states, the R.E. is the union of their solutions.)

Example: Construct a regular expression corresponding to the automaton described by the equations below −
Solution −
Here the initial state is q1 and the final state is q2.
Now we write down the equations −
q1 = q1 0 + ε (the ε term appears because q1 is the initial state)
q2 = q1 1 + q2 0
q3 = q2 1 + q3 0 + q3 1
Now, we solve these equations. By Arden's theorem, r = q + rp has the unique solution r = qp* (when p does not contain ε).
q1 = ε + q1 0 [here r = q1, p = 0, q = ε]
q1 = ε0* [By Arden's theorem]
q1 = 0* [As εR = R]
q2 = 0*1 + q2 0
q2 = 0*1(0)* [By Arden's theorem]
Hence, the regular expression is 0*10*.
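The substitution-plus-Arden procedure can be automated. The Python sketch below is our own illustration (names such as `dfa_to_regex` are hypothetical): it builds the equations qj = Σ qi wij (+ ε for the start state), eliminates every state except one final state using Arden's theorem, and takes the union over the final states. It produces Python `re` syntax without any simplification, so its output is much longer than 0*10*; the check therefore compares languages on short strings rather than comparing text.

```python
import re
from itertools import product

# Regexes are built as Python `re` strings; None stands for the empty set ∅
# and "" for ε. No simplification is attempted, so results are verbose.
def union(x, y):
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def concat(x, y):
    if x is None or y is None:
        return None                      # ∅ is the annihilator
    return f"(?:{x})(?:{y})"

def star(x):
    return "" if x is None else f"(?:{x})*"   # ∅* = ε

def dfa_to_regex(states, start, finals, delta):
    """Solve q_j = sum_i q_i w_ij (+ ε for the start state) with Arden's rule.
    delta maps (state, symbol) -> state."""
    result = None
    for target in finals:                # one final state at a time, then union
        coef = {j: {} for j in states}   # coef[j][i] = w_ij
        const = {j: ("" if j == start else None) for j in states}
        for (i, a), j in delta.items():
            coef[j][i] = union(coef[j].get(i), re.escape(a))
        remaining = list(states)
        for k in [s for s in states if s != target]:
            remaining.remove(k)
            # q_k = q_k A + (other terms)  ==>  q_k = (other terms) A*
            a_star = star(coef[k].pop(k, None))
            terms = {i: concat(w, a_star) for i, w in coef[k].items()}
            term_const = concat(const[k], a_star)
            for j in remaining:          # substitute q_k where it appears
                w = coef[j].pop(k, None)
                if w is None:
                    continue
                for i, t in terms.items():
                    coef[j][i] = union(coef[j].get(i), concat(t, w))
                const[j] = union(const[j], concat(term_const, w))
        # only q_target = q_target A + C is left, so q_target = C A*
        result = union(result, concat(const[target], star(coef[target].get(target))))
    return result

def dfa_accepts(start, finals, delta, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# The automaton of this example: q1 --0--> q1, q1 --1--> q2, q2 --0--> q2,
# q2 --1--> q3, q3 --0,1--> q3; start q1, final q2.
DELTA = {("q1", "0"): "q1", ("q1", "1"): "q2", ("q2", "0"): "q2",
         ("q2", "1"): "q3", ("q3", "0"): "q3", ("q3", "1"): "q3"}
regex = dfa_to_regex(["q1", "q2", "q3"], "q1", ["q2"], DELTA)
same = all(bool(re.fullmatch(regex, w)) == bool(re.fullmatch(r"0*10*", w))
           for n in range(9) for w in map("".join, product("01", repeat=n)))
print(same)   # True
```

The same function can be applied to the other two automata in this section, whose transitions can be read off their equations.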

Example: Construct a regular expression corresponding to the automaton described by the equations below −
Solution −
Here the initial state and the final state is q1.
The equations for the three states q1, q2, and q3 are as follows −
q1 = q1a + q3a + ε (the ε term appears because q1 is the initial state)
q2 = q1b + q2b + q3b
q3 = q2a
Now, we will solve these three equations −
q2 = q1b + q2b + q3b
   = q1b + q2b + (q2a)b (Substituting value of q3)
   = q1b + q2(b + ab)
   = q1b(b + ab)* (Applying Arden's Theorem)
q1 = q1a + q3a + ε
   = q1a + q2aa + ε (Substituting value of q3)
   = q1a + q1b(b + ab)*aa + ε (Substituting value of q2)
   = q1(a + b(b + ab)*aa) + ε
   = ε(a + b(b + ab)*aa)* (Applying Arden's Theorem)
   = (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
• Example: Convert the following DFA to R.E.
SOLUTION:
Let the equations be:
q1 = q2 1 + q3 0 + ε ……(i)
q2 = q1 0 ……(ii)
q3 = q1 1 ……(iii)
q4 = q2 0 + q3 1 + q4 0 + q4 1 ……(iv)
Now put q2 and q3 in eqn (i):
q1 = q1 01 + q1 10 + ε
   = ε + q1(01 + 10)
where, in Arden's theorem, q = ε, r = q1, p = 01 + 10.
Therefore, q1 = ε(01 + 10)*.
Since q1 is the final state,
R.E. = ε(01 + 10)*
     = (01 + 10)* is the required R.E. for the given diagram.
Construction of an FA from an RE
We can use Thompson's Construction to find a Finite Automaton from a Regular Expression. We reduce the regular expression into its smallest sub-expressions, convert these to NFAs (with ε-moves), combine them, and finally convert the result to a DFA.
Some basic RE cases are the following −
• Case 1 − For a regular expression ‘a’, we can construct the following FA −
• Case 2 − For a regular expression ‘ab’, we can construct the following FA −
• Case 3 − For a regular expression (a+b), we can construct the following FA −

• Case 4 − For a regular expression (a+b)*, we can construct the following FA −
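The following Python sketch is one way to implement Thompson's construction (our own illustration, with hypothetical names `symbol`, `cat`, `alt` and `star`): each function returns an ε-NFA fragment with a single start state and a single accepting state, and `accepts` simulates the result using ε-closures. As a simplification, concatenation links the two fragments with an ε-move instead of merging states, which accepts the same language. The NFA-to-DFA step is not shown.

```python
import re
from itertools import count, product

_new_state = count()   # fresh state numbers for every fragment

class NFA:
    """An ε-NFA fragment with one start and one accepting state.
    edges holds (from, label, to) triples; label None means an ε-move."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(a):                      # basic case: RE 'a'
    s, t = next(_new_state), next(_new_state)
    return NFA(s, t, [(s, a, t)])

def cat(n1, n2):                    # RE n1.n2: link n1's accept to n2's start
    return NFA(n1.start, n2.accept, n1.edges + n2.edges + [(n1.accept, None, n2.start)])

def alt(n1, n2):                    # RE n1 + n2: new start and accept states
    s, t = next(_new_state), next(_new_state)
    return NFA(s, t, n1.edges + n2.edges + [(s, None, n1.start), (s, None, n2.start),
                                            (n1.accept, None, t), (n2.accept, None, t)])

def star(n):                        # RE n*: skip, or loop back for more copies
    s, t = next(_new_state), next(_new_state)
    return NFA(s, t, n.edges + [(s, None, n.start), (s, None, t),
                                (n.accept, None, n.start), (n.accept, None, t)])

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, r in nfa.edges:
            if p == q and label is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    current = eps_closure(nfa, {nfa.start})
    for c in w:
        current = eps_closure(nfa, {r for p, label, r in nfa.edges
                                    if p in current and label == c})
    return nfa.accept in current

# (a + b)*abb built bottom-up from the basic cases
m = cat(cat(cat(star(alt(symbol("a"), symbol("b"))), symbol("a")), symbol("b")), symbol("b"))
print([w for w in ["abb", "aabb", "babb", "ab", "abba", ""] if accepts(m, w)])
# ['abb', 'aabb', 'babb']
agree = all(accepts(m, w) == bool(re.fullmatch(r"(a|b)*abb", w))
            for n in range(8) for w in map("".join, product("ab", repeat=n)))
print(agree)   # True
```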

DFA Minimization
• Every DFA defines a regular language
• In general, there can be many DFAs for a given regular language.
• These DFAs accept the same regular language.
– Language: The set of strings of 0's and 1's containing an even number of 1's
[Figure: several different DFAs, with transitions on 0 and 1, that accept this language]
A minimal DFA
• In practice, we are interested in the DFA with the minimal number of states.
– Use less memory
– Use less hardware (flip-flops)
• We can find a minimal DFA for any given DFA and their languages are equal.
Minimization of DFA
Given a DFA M that accepts a language L(M), we construct an equivalent DFA M′ with as few states as possible. During the course of minimization, we identify the equivalent (indistinguishable) states and the distinguishable states.
For minimization, the table filling algorithm is used.
• Distinguishable states:
Two states p and q are said to be distinguishable if there exists a string x such that exactly one of δ(p, x) and δ(q, x) is a final state.
• Indistinguishable states:
Two indistinguishable states behave the same for all possible strings, i.e. for every string x, δ(p, x) is a final state if and only if δ(q, x) is a final state.
– So, we do not need all of the states from a set of indistinguishable states.
– We can eliminate all but one of them, keeping that one to represent the set of indistinguishable states.
• Indistinguishability is an equivalence relation:

– Reflexive: Each state is indistinguishable from itself
– Symmetric: If p is indistinguishable from q, then q is indistinguishable from p
– Transitive: If p is indistinguishable from q, and q is indistinguishable from r, then p is
indistinguishable from r.
Equivalent States: Two states p & q are called equivalent states,
denoted by p ≡ q if and only if for each input string x, δ(p, x) is a final
state if and only if δ(q, x) is a final state.
Finding Distinguishable States – Table Filling Algorithm
• We can compute distinguishable states with an inductive table filling algorithm.
Basis:
• Any non-accepting state is distinguishable from any accepting state.
Induction:
• States p and q are distinguishable if there is some input symbol a such that
δ(p,a) is distinguishable from δ(q,a).
• All other pairs of states are indistinguishable, and can be merged
appropriately.
We can also use table filling algorithm to minimize a DFA by merging all
equivalent states.
That is, we replace a state p with its equivalence class found by the table filling algorithm.
Table filling algorithm steps (Minimize DFA):
To identify the pairs (p, q) with p ≢ q:
• List all the pairs of states (p, q) with p ≠ q.
• Make a sequence of passes through the pairs.
• On the first pass, mark each pair in which exactly one element is a final state (F).
• On each subsequent pass, mark the pair (r, s) if, for some a ∈ Σ, δ(r, a) = p, δ(s, a) = q and (p, q) is already marked.
• After a pass in which no new pair is marked, stop.
• The marked pairs (p, q) are those for which p ≢ q, and the unmarked pairs are those for which p ≡ q.
Example1:
Minimize DFA
[Figure: a 6-state DFA and its table of state pairs after each iteration]
First Iteration: Fill the table based on the given final states. A box is marked if exactly one of its two states is final (not both).
Second Iteration: Check the empty boxes, and fill each one based on whether the pair of states reached by applying each input symbol is already marked.
Third Iteration: Continue as in the second iteration for the remaining unmarked boxes (q2q0 and q5q3). There is no change, so stop.
The unmarked boxes show that q2 and q0 can be combined, and q5 and q3 can be combined.
Drawing the minimized DFA with the combined states q2q0 and q5q3 reduces the 6 states to 4 states.
Table Filling Algorithm: Minimization of DFA
Example 2:
Now, to solve this problem, we should first determine whether each pair is distinguishable or not.
PASS 0: Distinguish accepting states from non-accepting states
C is the only accepting state, so it is distinguishable from all other (non-accepting) states.
NOTE: A ≢B means they are distinguishable
PASS 1:
Consider column A
A ≢B since δ(A,1)=F, δ(B,1)=C and F ≢C
so mark in AB box as distinguishable
A ≢D since δ(A,0)=B, δ(D,0)=C and B ≢C
so mark in AD box as distinguishable
A ≡E since
• δ(A,0)=B, δ(E,0)=H and B ≡H
• δ(A,1)=F, δ(E,1)=F and F ≡ F
so no mark in AE box
A ≢F since δ(A,0)=B, δ(F,0)=C and B ≢C
so mark in AF box (i.e. distinguishable)
A ≡G since
• δ(A,0)=B, δ(G,0)=G and B ≡G
• δ(A,1)=F, δ(G,1)=E and F ≡E
so no mark in AG box
A ≢H since δ(A,1)=F, δ(H,1)=C and F ≢C so mark AH
PASS 1:
Consider column B
B ≢D since δ(B,1)=C, δ(D,1)=G and C ≢G
B ≢E since δ(B,1)=C, δ(E,1)=F and C ≢F
B ≢F since δ(B,1)=C, δ(F,1)=G and C ≢G
B ≢G since δ(B,1)=C, δ(G,1)=E and C ≢E
B ≡H since
• δ(B,0)=G, δ(H,0)=G and G ≡G
• δ(B,1)=C, δ(H,1)=C and C ≡C
PASS 1:
Consider column D
D ≢E since δ(D,0)=C, δ(E,0)=H and C ≢H
D ≡F since
• δ(D,0)=C, δ(F,0)=C and C ≡C
• δ(D,1)=G, δ(F,1)=G and G ≡G
D ≢G since δ(D,0)=C, δ(G,0)=G and C ≢G

D ≢H since δ(D,0)=C, δ(H,0)=G and C ≢G
PASS 1:
Consider column E,F,G
E ≢F since δ(E,0)=H, δ(F,0)=C and H ≢C

E ≢G since δ(E,1)=F, δ(G,1)=E and F ≢E
E ≢H since δ(E,1)=F, δ(H,1)=C and F ≢C
F ≢G since δ(F,0)=C, δ(G,0)=G and C ≢G

F ≢H since δ(F,0)=C, δ(H,0)=G and C ≢G
G ≢H since δ(G,1)=E, δ(H,1)=C and E ≢C

PASS 2:
Consider column A,B,D
A ≡E since
• δ(A,0)=B, δ(E,0)=H and B ≡H
• δ(A,1)=F, δ(E,1)=F and F ≡F
A ≢ G since δ(A,1)=F, δ(G,1)=E and F ≢E
B ≡H since
• δ(B,0)=G, δ(H,0)=G and G ≡G
• δ(B,1)=C, δ(H,1)=C and C ≡C
D ≡F since
• δ(D,0)=C, δ(F,0)=C and C ≡C
• δ(D,1)=G, δ(F,1)=G and G ≡G
PASS 3:
Consider column A,B,D
A ≡E since
• δ(A,0)=B, δ(E,0)=H and B ≡H
• δ(A,1)=F, δ(E,1)=F and F ≡F
B ≡H since
• δ(B,0)=G, δ(H,0)=G and G ≡G
• δ(B,1)=C, δ(H,1)=C and C ≡C
D ≡F since
• δ(D,0)=C, δ(F,0)=C and C ≡C
• δ(D,1)=G, δ(F,1)=G and G ≡G
No new marked states in PASS 3. We are done, and we found all distinguishable states (marked ones).
Equivalence Classes:
{ {A,E}, {B,H}, {C}, {D,F}, {G} }
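The passes above can be reproduced with a short program. The Python sketch below is our own illustration of the table filling algorithm (function names are hypothetical). The transitions of A, B and D to H are taken from the passes above. The transitions of C are not listed there, so C -0-> A and C -1-> C are an assumption; they cannot change the result, because every pair containing C is already marked in PASS 0.

```python
from itertools import combinations

def table_filling(states, alphabet, delta, finals):
    """Return the set of distinguishable (marked) pairs, as frozensets."""
    # first pass: mark pairs in which exactly one state is final
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    changed = True
    while changed:                      # stop after a pass that marks nothing
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in alphabet:
                r, s = delta[(p, a)], delta[(q, a)]
                if r != s and frozenset((r, s)) in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    return marked

def equivalence_classes(states, marked):
    """Group together states whose pairs were never marked."""
    classes = []
    for p in states:
        for cls in classes:
            if frozenset((p, cls[0])) not in marked:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes

# Example 2's DFA; C's transitions (C -0-> A, C -1-> C) are an assumption.
DELTA = {("A", "0"): "B", ("A", "1"): "F", ("B", "0"): "G", ("B", "1"): "C",
         ("C", "0"): "A", ("C", "1"): "C", ("D", "0"): "C", ("D", "1"): "G",
         ("E", "0"): "H", ("E", "1"): "F", ("F", "0"): "C", ("F", "1"): "G",
         ("G", "0"): "G", ("G", "1"): "E", ("H", "0"): "G", ("H", "1"): "C"}
STATES = "ABCDEFGH"
marked = table_filling(STATES, "01", DELTA, {"C"})
print(equivalence_classes(STATES, marked))
# [['A', 'E'], ['B', 'H'], ['C'], ['D', 'F'], ['G']]
```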
Example 3 :
[Table: table-filling result for a DFA with states a to f; each ✔ marks a distinguishable pair]
Proving a Language not to be Regular
• We have shown that the class of languages known as regular languages has at least four different descriptions: the languages accepted by DFAs, by NFAs and by ε-NFAs, and the languages defined by REs.
• Not every language is regular. To show that a language is not regular, the powerful technique used is known as the Pumping Lemma.
• The Pumping Lemma cannot be used to prove that a language is regular.
• The Pumping Lemma can be used to prove that a language is not regular (see the following slides).
The Pumping Lemma for Regular Languages
• There are languages which are NOT regular.
• Every regular language satisfies the pumping lemma.
• The pumping lemma can be used to show that a non-regular language is NOT regular.
• L01 = {0^n 1^n | n ≥ 1} is not regular.
– We can use the pumping lemma to show that this language is not regular.
• Let L be a regular language.
• Then there exists a constant n (for example, the number of states of a DFA for L) such that for every string w in L with |w| ≥ n, we can break w into three strings, w = xyz, such that:
1. y ≠ ε, i.e. |y| > 0
2. |xy| ≤ n
3. For all k ≥ 0, the string xy^kz is also in L.
• That is, we can always find a nonempty string y not too far from the beginning of w that can be "pumped";
• i.e. repeating y any number of times, or deleting it (the case k = 0), keeps the resulting string in the language L.
The Pumping Lemma - Proof
• Suppose L is regular.
• Then L is recognized by some DFA A with n states, and L = L(A).
• Let a string w = a1a2...am ∈ L, where m ≥ n.
• Let pi = δ(q0, a1a2...ai) for i = 0, 1, …, n (so p0 = q0).
• These are n + 1 states, but A has only n states, so by the pigeonhole principle there exist i < j ≤ n such that pi = pj.
• Now we have w = xyz where
1. x = a1a2...ai
2. y = a_{i+1}a_{i+2}...aj
3. z = a_{j+1}a_{j+2}...am
The Pumping Lemma – Proof (cont.)
• That is, x takes us to pi once; y takes us from pi back to pi (since pi is also pj); and z is the rest of w.
• Since i < j, y ≠ ε; and since j ≤ n, |xy| = j ≤ n.
• So we have the following figure: every string at least as long as the number of states must cause a state to repeat.
• Since the loop y can be repeated 0 or more times, xy^kz ∈ L for any k ≥ 0.
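The proof is constructive: running the DFA on w and stopping at the first repeated state gives the split w = xyz. The Python sketch below (our own illustration; the names are hypothetical) does exactly that, using as an assumed example the 2-state DFA for strings with an even number of 1's from the DFA minimization section. Because y leads from a state back to the same state, pumping it never changes the state in which the run ends.

```python
from itertools import product

def pump_split(delta, start, w):
    """Split w = xyz as in the proof: y is the input read between the first
    two visits to the same state. Returns None if no state repeats."""
    first_seen = {start: 0}          # state -> number of symbols read when first reached
    q = start
    for j, c in enumerate(w, start=1):
        q = delta[(q, c)]
        if q in first_seen:
            i = first_seen[q]        # p_i = p_j with i < j
            return w[:i], w[i:j], w[j:]
        first_seen[q] = j
    return None

def accepts(delta, start, finals, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# DFA for strings of 0's and 1's with an even number of 1's (states E, O)
DELTA = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
x1, y1, z1 = pump_split(DELTA, "E", "1001")
print(repr(x1), repr(y1), repr(z1))      # '1' '0' '01'

# every accepted string of length >= 2 (the number of states) can be pumped
checked = []
for n in range(2, 9):
    for w in map("".join, product("01", repeat=n)):
        if accepts(DELTA, "E", {"E"}, w):
            x, y, z = pump_split(DELTA, "E", w)
            checked.append(y != "" and len(x + y) <= 2 and
                           all(accepts(DELTA, "E", {"E"}, x + y * k + z) for k in range(4)))
print(all(checked))   # True
```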
Applications of The Pumping Lemma – Example 1
Example 1: Let us show that the language L01 = {0^n 1^n | n ≥ 1} is NOT regular.
Proof: (proof by contradiction)
• Suppose L01 were a regular language (our assumption), and let n be the constant of the pumping lemma.
• Then w = 0^n 1^n ∈ L01 and |w| ≥ n.
• By the pumping lemma, w = xyz, |xy| ≤ n, y ≠ ε and xy^kz ∈ L01 for all k ≥ 0.
• Since |xy| ≤ n, x and y consist only of 0's.
• If y is repeated 0 times, xz must be in L01 by the pumping lemma.
• Since y ≠ ε, xz has fewer 0's than 1's, so xz ∉ L01.
• So, there is a contradiction with our assumption (L01 is regular).
• By proof by contradiction, we prove that L01 is NOT regular.
Example 2: Let us show that the language Leq, the set of all strings with an equal number of 0's and 1's, is NOT a regular language.
• The proof is exactly the same as the proof for L01, using the same string w = 0^n 1^n.
Example 3: Let us show that the language Lpr, the set of all strings of 1's whose length is a prime, is NOT a regular language.
• Suppose Lpr were a regular language (our assumption), and let n be the constant of the pumping lemma.
• Choose a prime p ≥ n + 2 (this is possible since there are infinitely many primes), and let w = 1^p ∈ Lpr.
• By the pumping lemma, w = xyz with |xy| ≤ n and y ≠ ε; let m = |y|.
• Now, xy^(p−m)z ∈ Lpr by the pumping lemma.
• |xy^(p−m)z| = |xz| + (p−m)|y| = (p−m) + (p−m)m = (1+m)(p−m)
• But (1+m)(p−m) is not prime unless one of the factors is 1.
– y ≠ ε ⇒ (1+m) > 1
– m = |y| ≤ |xy| ≤ n and p ≥ n+2 ⇒ (p−m) ≥ (n+2) − n = 2
• So, there is a contradiction with our assumption.
⇒ By proof by contradiction, Lpr is NOT regular.
Example 4: Let us show that the language L, the set of all strings in which the number of 0's is more than the number of 1's, is NOT a regular language.
• Suppose L were a regular language (our assumption), and let n be the constant of the pumping lemma.
• Then w = 0^(n+1) 1^n ∈ L, since n + 1 > n.
• By the pumping lemma, w = xyz, |xy| ≤ n, y ≠ ε and xy^kz ∈ L.
w = 000…00111…11, where x and y lie within the leading 0's.
• Since 0 < |y| ≤ n, y must contain only 0's.
• By the pumping lemma, xy^0z = xz ∈ L. But xz has at most n 0's and exactly n 1's, so the number of 0's is not more than the number of 1's.
• So, there is a contradiction with our assumption (L is regular).
• By proof by contradiction, we prove that L is NOT regular.
Pumping Lemma For Regular Languages
Theorem
Let L be a regular language. Then there exists a constant 'c' such that every string w in L with |w| ≥ c can be broken into three strings, w = xyz, such that −
|y| > 0
|xy| ≤ c
For all k ≥ 0, the string xy^kz is also in L.
Applications of Pumping Lemma

Pumping Lemma is to be applied to show that certain languages are not regular. It
should never be used to show a language is regular.
If L is regular, it satisfies Pumping Lemma.
If L does not satisfy Pumping Lemma, it is non-regular.
Method to prove that a language L is not regular:
At first, assume that L is regular.
Then the pumping lemma holds for L with some constant c.
Use the pumping lemma to obtain a contradiction:
Select w ∈ L such that |w| ≥ c.
Consider every way of splitting w = xyz with |y| ≥ 1 and |xy| ≤ c (the lemma only guarantees that some such split exists, so we cannot choose x and y ourselves).
For each such split, select k such that $xy^kz \notin L$.
Hence L is not regular.
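For a fixed constant c, the last steps can be checked by brute force: enumerate every split the lemma allows and test whether some chosen k pushes the pumped string out of L. The sketch below assumes a membership predicate `in_lang` is available; `every_split_fails` and `in_anbn` are hypothetical names. A check for one value of c is only evidence, since the real proof must work for every c.

```python
from typing import Callable, Iterable


def every_split_fails(w: str, c: int, in_lang: Callable[[str], bool],
                      ks: Iterable[int]) -> bool:
    """True if, for EVERY split w = xyz with |xy| <= c and |y| >= 1,
    some k in ks gives x y^k z outside the language."""
    ks_list = list(ks)
    for xy_len in range(1, min(c, len(w)) + 1):   # |xy| <= c
        for x_len in range(xy_len):               # |y| = xy_len - x_len >= 1
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_lang(x + y * k + z) for k in ks_list):
                return False                      # this split survives pumping
    return True


def in_anbn(s: str) -> bool:
    """Membership test for {a^i b^i | i >= 0}."""
    i = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * i + "b" * i


c = 6
print(every_split_fails("a" * c + "b" * c, c, in_anbn, ks=[2]))  # True
```

If the function returns False, some split survives the chosen values of k, so the argument needs a different w or different k.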
Example: Prove that $L = \{a^ib^i \mid i \ge 0\}$ is not regular.
Solution −
• At first, we assume that L is regular and n is the number of states of a DFA accepting L.
• Let $w = a^nb^n$. Thus $|w| = 2n \ge n$.
• By the pumping lemma, $w = xyz$, where $|xy| \le n$ and $|y| \ne 0$.
• Since $|xy| \le n$, x and y consist only of a's: $x = a^p$, $y = a^q$, and $z = a^rb^n$, where $p + q + r = n$, $p, r \ge 0$ and $q \ge 1$.
• Let k = 2. Then $xy^2z = a^pa^{2q}a^rb^n$.
• Number of a's $= p + 2q + r = (p + q + r) + q = n + q$.
• Hence, $xy^2z = a^{n+q}b^n$. Since $q \ne 0$, $xy^2z$ is not of the form $a^ib^i$.
• Thus, $xy^2z$ is not in L. Hence L is not regular.
Properties of Regular Languages – Summary
Minimizing Deterministic Finite Automata:
• We can partition states of any DFA into groups of mutually indistinguishable states.
• Members of two different groups are always distinguishable.
• If we replace each group by a single state we get an equivalent DFA that has as few states as
any DFA for the same language.
Testing Distinguishability of States:

• Two states of a DFA are distinguishable if there is an input string that takes exactly one of the
two states to an accepting state.
• Start with the fact that every pair consisting of one accepting and one non-accepting state is distinguishable. Then repeatedly mark a pair as distinguishable if, on some input symbol, its successors form a pair already known to be distinguishable. When a pass marks no new pair, all pairs of distinguishable states have been found.
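The marking procedure described above is the table-filling algorithm. Below is a minimal Python sketch, assuming the DFA is given as a total transition dictionary `delta[(state, symbol)]` and that unreachable states have already been removed; all names and the example DFA are illustrative. Because indistinguishability is an equivalence relation, each group of unmarked states can be merged into one state of the minimal DFA.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, accepting):
    """Table filling: return the set of distinguishable pairs (as frozensets)."""
    # Base case: one accepting and one non-accepting state.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:  # keep making passes until a pass marks nothing new
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                # If the successors on symbol a are already distinguishable,
                # so are p and q. (Equal successors give a 1-element set,
                # which is never in `marked`.)
                if frozenset((delta[p, a], delta[q, a])) in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def merge_groups(states, marked):
    """Group mutually indistinguishable states; each group becomes one state."""
    groups = []
    for s in states:
        for g in groups:
            if frozenset((s, g[0])) not in marked:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


# A 4-state DFA over {0,1} for "strings ending in 1" with redundant states.
states = ["A", "B", "C", "D"]
delta = {("A", "0"): "C", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "D",
         ("C", "0"): "A", ("C", "1"): "D", ("D", "0"): "C", ("D", "1"): "B"}
marked = distinguishable_pairs(states, "01", delta, {"B", "D"})
print(merge_groups(states, marked))  # [['A', 'C'], ['B', 'D']]
```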
Closure Properties of Regular Languages:
• There are many operations that preserve the property of being a regular language.
• Among these are union, concatenation, closure (Kleene star), intersection, complement, difference, reversal, and homomorphism.
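Closure under union, intersection and complement can be shown constructively. The sketch below (Python; the tuple encoding `(start, delta, accepting)`, the example DFAs, and all function names are assumptions made for illustration) uses the product construction, which runs two DFAs in parallel on pairs of states, and builds the complement of a total DFA by swapping accepting and non-accepting states.

```python
from itertools import product


def accepts(dfa, w):
    """Run a DFA given as (start, delta, accepting) on the string w."""
    state, delta, accepting = dfa
    for ch in w:
        state = delta[state, ch]
    return state in accepting


def product_dfa(d1, states1, d2, states2, alphabet, mode):
    """Product construction: states are pairs (p, q).
    mode == "and" accepts L1 ∩ L2; mode == "or" accepts L1 ∪ L2."""
    s1, t1, f1 = d1
    s2, t2, f2 = d2
    pairs = list(product(states1, states2))
    delta = {((p, q), a): (t1[p, a], t2[q, a]) for p, q in pairs for a in alphabet}
    if mode == "and":
        acc = {(p, q) for p, q in pairs if p in f1 and q in f2}
    else:
        acc = {(p, q) for p, q in pairs if p in f1 or q in f2}
    return ((s1, s2), delta, acc)


def complement(dfa, states):
    """Complement of a total DFA: swap accepting and non-accepting states."""
    start, delta, accepting = dfa
    return (start, delta, set(states) - set(accepting))


# L1: even number of 0's;  L2: strings ending in 1  (alphabet {0, 1})
even0 = ("e", {("e", "0"): "o", ("e", "1"): "e",
               ("o", "0"): "e", ("o", "1"): "o"}, {"e"})
ends1 = ("n", {("n", "0"): "n", ("n", "1"): "y",
               ("y", "0"): "n", ("y", "1"): "y"}, {"y"})
both = product_dfa(even0, ["e", "o"], ends1, ["n", "y"], "01", "and")
either = product_dfa(even0, ["e", "o"], ends1, ["n", "y"], "01", "or")
odd0 = complement(even0, ["e", "o"])
print(accepts(both, "0101"), accepts(both, "011"), accepts(either, "011"), accepts(odd0, "0"))
# True False True True
```

Difference then follows as $L_1 - L_2 = L_1 \cap \overline{L_2}$, combining the two constructions.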
Decision Properties of Regular Languages:

• Testing emptiness of regular languages
• Testing whether a regular language is finite or not.
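Both decision problems reduce to graph search on the DFA's transition diagram. A minimal Python sketch, assuming a total DFA given by a start state, a transition dictionary `delta[(state, symbol)]`, and a set of accepting states (function names and example DFAs are illustrative): the language is empty exactly when no accepting state is reachable from the start state, and it is infinite exactly when some cycle lies on a path from the start state to an accepting state.

```python
def reachable(start, delta, alphabet):
    """All states reachable from `start` (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = delta[s, a]
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_empty(start, delta, accepting, alphabet):
    """L is empty iff no accepting state is reachable."""
    return not (reachable(start, delta, alphabet) & set(accepting))


def is_finite(start, delta, accepting, alphabet):
    """L is finite iff there is no cycle among the 'useful' states,
    i.e. states reachable from start that can also reach an accepting state."""
    useful = {s for s in reachable(start, delta, alphabet)
              if reachable(s, delta, alphabet) & set(accepting)}
    colour = {s: 0 for s in useful}  # 0 = unvisited, 1 = on DFS stack, 2 = done

    def has_cycle(s):
        colour[s] = 1
        for a in alphabet:
            t = delta[s, a]
            if t in useful and (colour[t] == 1 or (colour[t] == 0 and has_cycle(t))):
                return True  # edge back to a state on the stack: a cycle
        colour[s] = 2
        return False

    return not any(colour[s] == 0 and has_cycle(s) for s in useful)


# Over {a}: d1 accepts exactly {a} (state 2 is dead); d2 accepts a*.
d1 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}
d2 = {(0, "a"): 0}
print(is_empty(0, d1, {1}, "a"), is_finite(0, d1, {1}, "a"), is_finite(0, d2, {0}, "a"))
# False True False
```

Restricting the cycle search to useful states matters: the dead state's self-loop in `d1` must not make the language look infinite.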
The Pumping Lemma for Regular Languages:

• If a language is regular, then every sufficiently long string in the language has a nonempty substring that can be pumped, that is, repeated any number of times (including zero), with every resulting string also in the language.
• This fact can be used to prove that many different languages are not regular.
Pumping Lemma for Regular Languages
For any regular language L, there exists an integer n such that for all $z \in L$ with $|z| \ge n$, there exist $u, v, w \in \Sigma^*$ such that $z = uvw$, and
i. $|v| > 0$
ii. $|uv| \le n$
iii. for all $i \ge 0$: $uv^iw \in L$
In simple terms, if the substring v is 'pumped', i.e., repeated any number of times (including removed entirely, i = 0), the resulting string still remains in L.
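As a small illustration of the lemma holding for a regular language, take L = (ab)*. A 3-state DFA (including a dead state) accepts it, so n = 3 works. The Python sketch below (using the standard `re` module; the string z and its split are our own example) exhibits one valid split of z = ababab and checks that the pumped strings stay in L for the first few values of i. Checking finitely many i is a sanity check, not a proof.

```python
import re

PATTERN = re.compile(r"(ab)*")


def in_L(s: str) -> bool:
    """Membership in L = (ab)*."""
    return PATTERN.fullmatch(s) is not None


n = 3
z = "ababab"                 # z in L and |z| = 6 >= n
u, v, w = "", "ab", "abab"   # |uv| = 2 <= n and |v| = 2 > 0
assert u + v + w == z and in_L(z)
print(all(in_L(u + v * i + w) for i in range(10)))  # True
```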
The Pumping Lemma is used to prove that a language is not regular. If a language is regular, it always satisfies the pumping lemma. So if some sufficiently long string in L cannot be pumped, that is, every admissible split produces at least one pumped string outside L, then L is surely not regular.
The converse is not always true: if the Pumping Lemma holds for a language, it does not follow that the language is regular.
Q. Let $L = \{a^nb^n \mid n \ge 0\}$. Using the pumping lemma, show that L is not a regular language.
Solution:
Step 1: Assume L is a regular language in order to obtain a contradiction. Let n be the number of states in a finite automaton accepting L.
Step 2: Let $w = a^nb^n$; then $|w| = 2n > n$. By the pumping lemma, w can be split into three parts, $w = xyz$, with $|xy| \le n$ and $|y| > 0$.
Step 3: Now we want to find i such that $xy^iz \notin L$.
There are three possibilities for y. We consider each case in turn and show that pumping produces a string that is not in $\{a^nb^n \mid n \ge 0\}$. (Because $|xy| \le n$, only Case 1 can actually occur; Cases 2 and 3 show that the argument works even without that condition.)
Case 1: The string y consists of only a's, i.e. $y = a^k$ ($k \ge 1$).
We have $w = xyz = a^nb^n$.
Every string in L has equal numbers of a's and b's, so every pumped string must satisfy this condition. Let us take i = 0.
As $xyz = a^nb^n$,
$xz = a^{n-k}b^n$, and $n - k \ne n$.
So $xz \notin L$. This case is a contradiction.
Case 2: The string y consists of only b's, i.e. $y = b^m$ ($m \ge 1$).
We have $w = xyz = a^nb^n$.
Every string in L has equal numbers of a's and b's, so every pumped string must satisfy this condition. Let us take i = 0.
Then $xz = a^nb^{n-m}$, and $n - m \ne n$.
So $xz \notin L$. This case also gives a contradiction.
Case 3: The string y consists of both a's and b's, i.e. $y = a^kb^m$ ($k, m \ge 1$).
We have $w = xyz = a^nb^n = a^{n-k}\,a^kb^m\,b^{n-m}$, so $x = a^{n-k}$ and $z = b^{n-m}$.
Every string in L has its a's before its b's. Let us take i = 2.
$xy^2z = xyyz = a^{n-k}\,a^kb^m\,a^kb^m\,b^{n-m}$
In this case, the string xyyz has equal numbers of a's and b's, but they are out of order, with some b's before a's. Hence it is not a member of L, which contradicts our assumption.
Thus, in all cases we get a contradiction. Therefore, L is not regular.
Pumping lemma for regular languages
The pumping lemma for regular languages describes an essential property of all
regular languages.
Informally, it says that all sufficiently long words in a regular language may be pumped, that is, have a middle section of the word repeated an arbitrary number of times to produce a new word that also lies within the same language.
Here's a more formal definition of the pumping lemma:

If L is an infinite regular language, then there exists some positive integer m such
that any string w∈L whose length is m or greater can be decomposed into three
parts, xyz, where
• |xy| is less than or equal to m,
• |y| > 0,
• $w_i = xy^iz$ is also in L for all i = 0, 1, 2, 3, ....
Here's what it all means:
• m is a (finite) number chosen so that strings of length m or greater must contain a
cycle. Hence, m must be equal to or greater than the number of states in the dfa.
Remember that we don't know the dfa, so we can't actually choose m; we just know
that such an m must exist.
• Since string w has length greater than or equal to m, we can break it into two parts, xy and z, such that xy must contain a cycle. We don't know the dfa, so we don't know exactly where to make this break, but we know that |xy| ≤ m.
• We let x be the part before the cycle, y be the cycle, and z the part after the cycle. (It
is possible that x and z contain cycles, but we don't care about that.) Again, we don't
know exactly where to make this break.
• Since y is the cycle we are interested in, we must have |y| > 0, otherwise it isn't a
cycle.
• By repeating y an arbitrary number of times (the strings $xy^iz$ for i ≥ 0), we must get other strings in L.
• If, despite all the above uncertainties, we can show that the dfa has to accept some
string that we know is not in the language, then we can conclude that the language is
not regular.
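The cycle argument above can be made concrete when a DFA is actually known. The Python sketch below (function names and the example DFA, "even number of a's", are illustrative assumptions) follows w through the DFA and stops at the first repeated state; the symbols read between the two visits form y. A DFA with m states must repeat a state within its first m moves, so this always gives |xy| ≤ m and |y| > 0, and since y leads from a state back to the same state, repeating y does not change whether the string is accepted.

```python
def pump_split(start, delta, w):
    """Follow w through the DFA and stop at the first repeated state.
    If the state after w[:i] equals the state after w[:j] (i < j),
    then x = w[:i], y = w[i:j] is a cycle, and z = w[j:]."""
    seen = {start: 0}   # state -> number of symbols read when first visited
    state = start
    for j, ch in enumerate(w, start=1):
        state = delta[state, ch]
        if state in seen:
            i = seen[state]
            return w[:i], w[i:j], w[j:]
        seen[state] = j
    return None         # no state repeated: w is too short to force a cycle


def accepts(start, delta, accepting, w):
    state = start
    for ch in w:
        state = delta[state, ch]
    return state in accepting


delta = {("E", "a"): "O", ("E", "b"): "E", ("O", "a"): "E", ("O", "b"): "O"}
x, y, z = pump_split("E", delta, "abab")
print(x, y, z)  # a b ab
print(all(accepts("E", delta, {"E"}, x + y * k + z) for k in range(6)))  # True
```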
Formal statement of the Pumping Lemma:
$$L \text{ is regular} \implies (\exists n)(\forall z)\big[\,z \in L \wedge |z| \ge n \implies (\exists u,v,w)\big(z = uvw \wedge |uv| \le n \wedge |v| \ge 1 \wedge (\forall i \ge 0)\; uv^iw \in L\big)\big]$$
How to use it?
The two most common ways to use the Pumping Lemma to prove that a language is NOT regular are:
a) show that no n can satisfy the (there exists n) part; this is usually accomplished by deriving a contradiction such as (n+1)(n+1) < n*n+n, which is false for every n ≥ 0.
b) show that for every partition of z into u, v and w with |uv| ≤ n and |v| ≥ 1, some $uv^iw$ is not in L, typically for i = 0 or i = 2.
Be sure to cover all cases by argument or by enumerating cases.

Examples and exercises related to the pumping lemma for regular languages.
1. Prove that $L = \{0^i \mid i \text{ is a perfect square}\}$ is not a regular language.
Proof:
Assume that L is regular and let m be the integer guaranteed by the pumping lemma. Now, consider the string $w = 0^{m^2}$. Clearly $w \in L$, so w can be written as w = xyz with |xy| ≤ m and y ≠ λ (or |y| > 0). Consider what happens when i = 2, that is, look at $xy^2z$.
Since $1 \le |y| \le |xy| \le m$, we have
$$m^2 = |w| < |xy^2z| = m^2 + |y| \le m^2 + m < (m + 1)^2.$$
That is, the length of the string $xy^2z$ lies strictly between two consecutive perfect squares, so it is not a perfect square. This means $xy^2z \notin L$, contradicting the assumption that L is regular.
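The inequality chain can be spot-checked numerically. The sketch below (Python 3.8+ for `math.isqrt`; the helper names are illustrative) confirms that adding any |y| between 1 and m to m² never lands on a perfect square, for the first few hundred values of m.

```python
from math import isqrt


def is_square(k: int) -> bool:
    """True if k is a perfect square."""
    r = isqrt(k)
    return r * r == k


def pumping_up_leaves_squares(m: int) -> bool:
    """|x y^2 z| = m*m + |y| with 1 <= |y| <= m; none of these is a square."""
    return all(not is_square(m * m + j) for j in range(1, m + 1))


print(all(pumping_up_leaves_squares(m) for m in range(1, 500)))  # True
```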
2. Prove that $L = \{ww \mid w \in \{a, b\}^*\}$ is not regular.
Assume L is regular and let m be the integer from the pumping lemma. Choose $w = a^mba^mb$.
Clearly, w ∈ L, so by the pumping lemma, w = xyz such that |xy| ≤ m, |y| > 0 and $xy^iz \in L$ for all i ≥ 0.
Since |xy| ≤ m, y consists only of a's from the first run; let p = |y|.
Consider what happens when i = 0.
The resulting string is $xz = a^{m-p}ba^mb$.
A string of the form ww that contains exactly two b's and ends in b must be $a^jba^jb$ for some j. Since p ≥ 1, the two runs of a's in xz have different lengths, and thus this string is not in L.
Therefore L is not regular.
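The case analysis for i = 0 can be checked for one concrete m. In the sketch below (Python; `in_ww` and the choice m = 7 are illustrative), membership in {ww} is tested directly by comparing the two halves, and every possible p = |y| from 1 to m is tried.

```python
def in_ww(s: str) -> bool:
    """Membership in {ww | w in {a,b}*}: even length and equal halves."""
    h = len(s) // 2
    return len(s) % 2 == 0 and s[:h] == s[h:]


m = 7
w = "a" * m + "b" + "a" * m + "b"
assert in_ww(w)
# |xy| <= m forces y = a^p inside the first run; pumping down leaves a^(m-p) b a^m b.
print(any(in_ww("a" * (m - p) + "b" + "a" * m + "b") for p in range(1, m + 1)))  # False
```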
3. Prove that $L = \{a^nb^n : n > 0\}$ is not regular.
Proof:
1. We don't know m, but assume there is one.
2. Choose a string $w = a^nb^n$ where n > m, so that any prefix of length m consists entirely of a's.
3. We don't know the decomposition of w into xyz, but since |xy| ≤m, xy must
consist entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0.
This has the effect of dropping |y| a's out of the string, without affecting the
number of b’s.
The resultant string has fewer a's than b's, hence does not belong to L.
Therefore L is not regular.
4. Prove that $L = \{a^nb^k : n > k \text{ and } n \ne 0\}$ is not regular.
Proof:
1. We don't know m, but assume there is one.
2. Choose a string $w = a^nb^k$ where n > m, so that any prefix of length m consists entirely of a's, and k = n-1, so that there is just one more a than b.
3. We don't know the decomposition of w into xyz, but since |xy| ≤m, xy must
consist entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0.
This has the effect of dropping |y| a's out of the string, without affecting the
number of b’s.
The resultant string has fewer a's than before, so it has either fewer a's than b's,
or the same number of each.
Either way, the string does not belong to L, so L is not regular.
5. Prove that $L = \{a^n : n \text{ is a prime number}\}$ is not regular.
Proof:
1. We don't know m, but assume there is one.
2. Choose a string $w = a^n$ where n is a prime number and |w| = n > m+1. (This can always be done because there is no largest prime number.) Every symbol of w is an a.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, it follows that |z| = n − |xy| > 1. As usual, |y| > 0.
4. Since |z| > 1, |xz| > 1. Choose i = |xz|. Then,
$|xy^iz| = |xz| + |y|\,|xz| = (1 + |y|)\,|xz|$.
Since (1 + |y|) and |xz| are each greater than 1, the product must be a composite number.
Thus $|xy^iz|$ is a composite number, so $xy^iz \notin L$, contradicting the pumping lemma. Therefore L is not regular.
