TOC Notes
Theory of Automata
Theory of automata is a theoretical branch of computer science and mathematics. It is
the study of abstract machines and the computational problems that can be solved using
these machines. Such an abstract machine is called an automaton (plural: automata). The main motivation
behind developing automata theory was to develop methods to describe and
analyse the dynamic behaviour of discrete systems.
An automaton consists of states and transitions. A state is represented by a circle,
and a transition is represented by an arrow.
An automaton is a machine that takes a string as input; the input drives it
through a finite number of states, and it may end in a final state.
The following basic terms are important and frequently used in automata theory:
Symbols:
A symbol is an individual entity or object, which can be any letter, digit or
picture.
Example:
1, a, b, #
Alphabets:
An alphabet is a finite set of symbols. It is denoted by ∑.
Examples:
1. ∑ = {a, b}
2. ∑ = {A, B, C, D}
3. ∑ = {0, 1, 2}
4. ∑ = {0, 1, ....., 5}
5. ∑ = {#, β, Δ}
String:
Example 1:
If ∑ = {a, b}, various strings that can be generated from ∑ are {ab, aa, aaa, bb, bbb, ba,
aba, .....}.
Example 2:
1. w = 010
Language:
Example 1:
L1 = {Set of strings of length 2}
Example 2:
L2 = {Set of all strings that start with 'a'}
o It takes a string of symbols as input and changes its state accordingly. When the desired
symbol is found, the transition occurs.
o At the time of a transition, the automaton can either move to the next state or stay in the
same state.
o A finite automaton ends in one of two kinds of state, an accept state or a reject state. When the input string is
processed successfully and the automaton has reached a final state, the string is accepted.
Formal Definition of FA
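A finite automaton is a 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of input symbols (the alphabet)
δ: transition function
q0: initial state, q0 ∈ Q
F: set of final states, F ⊆ Q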
Input tape: It is a linear tape having some number of cells. Each input symbol is placed
in one cell.
Finite control: The finite control decides the next state on receiving a particular
input from the input tape. The tape reader reads the cells one by one from left to right, and
only one input symbol is read at a time.
Types of Automata:
There are two types of finite automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the
computation. In a DFA, the machine goes to exactly one state for a particular input
character. A DFA does not accept the null move.
2. NFA
NFA stands for non-deterministic finite automata. It can move to any number of
states for a particular input. It can accept the null move.
1. In a DFA, the input to the automaton can be any string. Put a pointer on the start
state q0, read the input string w from left to right, and move the pointer according to
the transition function δ. One symbol is read at a time. If the next symbol of string
w is a and the pointer is on state p, move the pointer to δ(p, a). When the end of the input
string w is reached, the pointer is on some state r.
2. The string w is said to be accepted by the DFA if r ∈ F, which means the input string w is
processed successfully and the automaton has reached a final state. The string is said to be
rejected by the DFA if r ∉ F.
Example 1:
DFA with ∑ = {0, 1} accepts all strings starting with 1.
Solution:
The finite automaton can be represented using a transition graph. The machine
is initially in the start state q0; on receiving input 1 it changes
its state to q1. From q0, on receiving 0, the machine changes its state to q2, which is the
dead state. From q1, on receiving input 0 or 1, the machine stays in q1, which is
the final state. The strings that are accepted include 10, 11, 110, 101,
111, ..., that is, all strings that start with 1.
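The pointer-moving procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `run_dfa` and the dictionary encoding of δ are assumptions, and the self-loops on the dead state q2 are assumed because the notes only say that q2 is a dead state.

```python
def run_dfa(delta, start, finals, w):
    """Return True if the DFA accepts the string w."""
    state = start                       # the pointer starts on the start state
    for symbol in w:                    # read one symbol at a time, left to right
        state = delta[(state, symbol)]  # move the pointer to δ(p, a)
    return state in finals              # accept iff the last state r is in F


# DFA for "all strings starting with 1" (Example 1); q2 is the dead state.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q2",  # assumed self-loops on the dead state
}
START, FINALS = "q0", {"q1"}

for w in ["10", "111", "011", ""]:
    print(repr(w), run_dfa(DELTA, START, FINALS, w))
```

Strings beginning with 1 end in q1 and are accepted; anything beginning with 0 is trapped in q2, and the empty string stays in q0, so both are rejected.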
Example 2:
NFA with ∑ = {0, 1} accepts all strings starting with 1.
Solution:
The NFA can be represented using a transition graph. The machine
is initially in the start state q0; on receiving input 1 it changes its state to q1.
From q1, on receiving input 0 or 1, the machine stays in q1. The strings
that are accepted include 10, 11, 110, 101, 111, ..., that is, all strings that start with
1.
Transition Table
The transition table is basically a tabular representation of the transition function. It takes
two arguments (a state and a symbol) and returns a state (the "next state").
Example 1:
Solution:
Present State   0    1
→q0             q1   q2
q1              q0   q2
*q2             q2   q2
Explanation:
o In the above table, the first column indicates all the current states. Under column
0 and 1, the next states are shown.
o The first row of the transition table can be read as, when the current state is q0, on
input 0 the next state will be q1 and on input 1 the next state will be q2.
o In the second row, when the current state is q1, on input 0, the next state will be
q0, and on 1 input the next state will be q2.
o In the third row, when the current state is q2 on input 0, the next state will be q2,
and on 1 input the next state will be q2.
o The arrow marked to q0 indicates that it is a start state and circle marked to q2
indicates that it is a final state.
Example 2:
Solution:
Present State   0        1
→q0             q0       q1
q1              q1, q2   q2
q2              q1       q3
*q3             q2       q2
Explanation:
o The first row of the transition table can be read as, when the current state is q0, on
input 0 the next state will be q0 and on input 1 the next state will be q1.
o In the second row, when the current state is q1, on input 0 the next state will be
either q1 or q2, and on 1 input the next state will be q2.
o In the third row, when the current state is q2 on input 0, the next state will be q1,
and on 1 input the next state will be q3.
o In the fourth row, when the current state is q3 on input 0, the next state will be q2,
and on 1 input the next state will be q2.
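Because the second table allows two next states for q1 on input 0, it describes an NFA. A minimal sketch of how such a table can be simulated is to track the set of states the machine could be in. The encoding below (a dictionary from (state, symbol) to a set of states) and the name `run_nfa` are assumptions for illustration.

```python
def run_nfa(delta, start, finals, w):
    """Return True if some computation path of the NFA accepts w."""
    current = {start}                   # all states the NFA could be in
    for symbol in w:
        # follow every possible transition from every current state
        current = {t for s in current for t in delta.get((s, symbol), set())}
    return bool(current & finals)       # accept if any reachable state is final


# Transition table of Example 2 above (start state q0, final state q3).
NFA_DELTA = {
    ("q0", "0"): {"q0"},       ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q1"},       ("q2", "1"): {"q3"},
    ("q3", "0"): {"q2"},       ("q3", "1"): {"q2"},
}

for w in ["111", "1001", "11", "0"]:
    print(w, run_nfa(NFA_DELTA, "q0", {"q3"}, w))
```

For the input 1001 the set of possible states evolves as {q0}, {q1}, {q1, q2}, {q1, q2}, {q2, q3}; since q3 is in the last set, the string is accepted.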
In the following diagram, we can see that from state q0 for input a, there is only one path
which is going to q1. Similarly, from q0, there is only one path for input b going to q2.
δ: Q x ∑ → Q
Graphical Representation of DFA
A DFA can be represented by digraphs called state diagram. In which:
Example 1:
Solution:
Transition Diagram:
Transition Table:
Present State   0    1
→q0             q0   q1
q1              q2   q1
*q2             q2   q2
Example 2:
DFA with ∑ = {0, 1} that accepts all strings starting with 0.
Solution:
Explanation:
o On input 0 in state q0, the DFA
changes state to q1, so it always reaches the final state q1 when the input starts with 0. It can accept 00,
01, 000, 001, etc. It can't accept any string which starts with 1, because it will never reach a
final state on a string starting with 1.
Example 3:
DFA with ∑ = {0, 1} that accepts all strings ending with 0.
Solution:
Explanation:
On input 0 in state q0, the DFA
changes state to q1. It can accept any string which ends with 0, like 00, 10, 110, 100, etc.
It can't accept any string which ends with 1, because it never goes to the final state q1
on input 1, so a string ending with 1 will be rejected.
Examples of DFA
Example 1:
Design a DFA with ∑ = {0, 1} that accepts those strings which start with 1 and end with 0.
Solution:
The DFA will have a start state q0 from which only the edge with input 1 will go to the
next state, q1.
In state q1, if we read 1, we stay in state q1, but if we read 0 in state q1, we reach
state q2, which is the final state. In state q2, if we read 0 we stay in q2, and if we read 1 we go back to
q1. Note that if the input ends with 0, the machine will be in the final state.
Example 2:
Design a DFA with ∑ = {0, 1} that accepts only the input 101.
Solution:
In the given solution, we can see that only the input 101 will be accepted. Hence, apart from the path for
101, no other path is shown for any other input.
Example 4:
Design an FA with ∑ = {0, 1} that accepts the set of all strings with three consecutive 0's.
Solution:
The strings in this language include 000, 0001, 1000, 10001,
.... in which 0 appears in a clump of 3. The transition graph is as follows:
Note that the sequence of triple zeros must be maintained to reach the final state.
Example 5:
Design a DFA with L(M) = {w | w ∈ {0, 1}*} where w is a string that does not contain consecutive
1's.
Solution:
The states q0, q1, q2 are the final states. The DFA accepts the strings that do not
contain consecutive 1's, like 10, 010, 101, etc.
Example 6:
Design an FA with ∑ = {0, 1} that accepts the strings with an even number of 0's followed by
a single 1.
Solution:
In the following image, we can see that from state q0 for input a, there are two next states
q1 and q2; similarly, from q0 for input b, the next states are q0 and q1. Thus it is not fixed
or determined where to go next on a particular input. Hence this FA is called a
non-deterministic finite automaton.
Formal definition of NFA:
An NFA is also a 5-tuple, the same as a DFA, but with a different transition function, as
follows:
δ: Q x ∑ → 2^Q
where,
Example 1:
→q0 q0, q1 q1
q1 q2 q0
*q2 q2 q1, q2
In the above diagram, we can see that when the current state is q0, on input 0, the next
state will be q0 or q1, and on 1 input the next state will be q1. When the current state is
q1, on input 0 the next state will be q2 and on 1 input, the next state will be q0. When the
current state is q2, on 0 input the next state is q2, and on 1 input the next state will be q1
or q2.
Example 2:
NFA with ∑ = {0, 1} accepts all strings with 01.
Solution:
Transition Table:
→q0 q1 Ε
q1 Ε q2
*q2 q2 q2
Example 3:
NFA with ∑ = {0, 1} that accepts all strings of length at least 2.
Solution:
Transition Table:
→q0 q1 q1
q1 q2 q2
*q2 Ε Ε
Examples of NFA
Example 1:
Design a NFA for the transition table as given below:
Present State 0 1
q2 q2, q3 q3
→q3 q3 q3
Solution:
The transition diagram can be drawn by using the mapping function as given in the table.
Here,
Example 2:
Design an NFA with ∑ = {0, 1} that accepts all strings ending with 01.
Solution:
Hence, NFA would be:
Example 3:
Design an NFA with ∑ = {0, 1} in which double '1' is followed by double '0'.
Solution:
Then,
Now before double 1, there can be any string of 0 and 1. Similarly, after double 0, there
can be any string of 0 and 1.
1. q0 → q1 → q2 → q3 → q4 → q4 → q4 → q4
Example 4:
Design an NFA in which all the string contain a substring 1110.
Solution:
The language consists of all the string containing substring 1010. The partial transition
diagram can be:
Now as 1010 could be the substring. Hence we will add the inputs 0's and 1's so that the
substring 1010 of the language can be maintained. Hence the NFA becomes:
Transition table for the above transition diagram can be given below:
Present State 0 1
→q1 q1 q1, q2
q2 q3
q3 q4
q4 q5
*q5 q5 q5
State q5 is the accept state. Once the complete input is scanned and we reach q5, the string is
accepted.
Example 5:
Design an NFA with ∑ = {0, 1} that accepts all strings in which the third symbol from the right
end is always 0.
Solution:
Thus we get the third symbol from the right end as '0' always. The NFA can be:
The above image is an NFA because in state q0 with input 0, we can either go to state q0
or q1.
Eliminating ε Transitions
An NFA with ε can be converted to an NFA without ε, and this NFA without ε can be converted
to a DFA. To do this, we use a method which removes all the ε transitions from the given
NFA. The method is:
1. Find all the ε transitions from each state in Q. The set of states reached is called
ε-closure(qi), where qi ∈ Q.
2. Then the δ' transitions can be obtained. A δ' transition means an ε-closure applied to the δ
moves.
3. Repeat Step 2 for each input symbol and each state of the given NFA.
4. Using the resultant states, the transition table for the equivalent NFA without ε can be
built.
Example:
Convert the following NFA with ε to NFA without ε.
1. ε-closure(q0) = {q0}
2. ε-closure(q1) = {q1, q2}
3. ε-closure(q2) = {q2}
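The three closures listed above are consistent with an NFA whose only ε-move goes from q1 to q2. Under that assumption (the original diagram is not reproduced here), the ε-closure computation can be sketched as a simple graph search; the name `epsilon_closure` and the dictionary `EPS` are illustrative.

```python
def epsilon_closure(state, eps):
    """Set of states reachable from `state` using only ε-moves (including itself)."""
    closure = {state}
    stack = [state]
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):        # every ε-successor of s
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


# Assumed ε-moves: a single ε-transition q1 -> q2.
EPS = {"q1": {"q2"}}

for q in ["q0", "q1", "q2"]:
    print(q, sorted(epsilon_closure(q, EPS)))
```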
*q1 Ф {q2}
*q2 Ф {q2}
State q1 and q2 become the final state as ε-closure of q1 and q2 contain the final state
q2. The NFA can be shown by the following transition diagram:
Let M = (Q, ∑, δ, q0, F) be an NFA which accepts the language L(M). There should be an
equivalent DFA denoted by M' = (Q', ∑', q0', δ', F') such that L(M) = L(M').
Step 2: Add q0 of the NFA to Q'. Then find the transitions from this start state.
Step 3: In Q', find the possible set of states for each input symbol. If this set of states is
not in Q', then add it to Q'.
Step 4: In the DFA, the final states will be all the states which contain a state of F (the final states of the NFA).
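The steps above amount to the subset construction. The sketch below is one possible way to code it, not a definitive implementation: the function name `nfa_to_dfa` is hypothetical, DFA states are represented as frozensets of NFA states, and the sample NFA (final state q2) is an assumed example.

```python
from collections import deque


def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction: returns (dfa_states, dfa_delta, dfa_start, dfa_finals)."""
    dfa_start = frozenset([start])            # Step 2: start from {q0}
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            # Step 3: union of the NFA moves of every member state
            target = frozenset(t for s in current for t in delta.get((s, a), ()))
            dfa_delta[(current, a)] = target
            if target not in seen:            # a new subset becomes a new DFA state
                seen.add(target)
                queue.append(target)
    # Step 4: a DFA state is final if it contains an NFA final state
    dfa_finals = {S for S in seen if S & finals}
    return seen, dfa_delta, dfa_start, dfa_finals


# Assumed example NFA with final state q2.
NFA = {
    ("q0", "0"): {"q0"},       ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q1"},
    ("q2", "0"): {"q2"},       ("q2", "1"): {"q1", "q2"},
}
states, table, start_state, final_states = nfa_to_dfa(NFA, "q0", {"q2"}, "01")
for (S, a), T in sorted(table.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(S), a, "->", sorted(T))
```

Only subsets reachable from the start state are generated, so unreachable subsets never enter the table.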
Example 1:
Convert the given NFA to DFA.
Solution: For the given transition diagram we will first construct the transition table.
State 0 1
→q0 q0 q1
q1 {q1, q2} q1
1. δ'([q0], 0) = [q0]
2. δ'([q0], 1) = [q1]
1. δ'([q2], 0) = [q2]
2. δ'([q2], 1) = [q1, q2]
The state [q1, q2] is the final state as well because it contains a final state q2. The transition
table for the constructed DFA will be:
State 0 1
Example 2:
Convert the given NFA to DFA.
Solution: For the given transition diagram we will first construct the transition table.
State 0 1
1. δ'([q1], 0) = ϕ
2. δ'([q1], 1) = [q0, q1]
Similarly,
As in the given NFA, q1 is a final state, then in DFA wherever, q1 exists that state becomes
a final state. Hence in the DFA, final states are [q1] and [q0, q1]. Therefore set of final
states F = {[q1], [q0, q1]}.
State 0 1
Suppose
1. A = [q0]
2. B = [q1]
3. C = [q0, q1]
Where
NFA with ε move: If any FA contains an ε transition or move, the finite automaton is called
an NFA with ε move.
ε-closure: The ε-closure of a given state A is the set of states which can be reached from
the state A with only ε (null) moves, including the state A itself.
Step 3: If we found a new state, take it as current state and repeat step 2.
Step 4: Repeat Step 2 and Step 3 until there is no new state present in the transition table
of DFA.
Step 5: Mark the states of DFA as a final state which contains the final state of NFA.
Example 1:
Convert the NFA with ε into its equivalent DFA.
Solution:
Hence
Now,
For state C:
Example 2:
Convert the given NFA into its equivalent DFA.
Solution: Let us obtain the ε-closure of each state.
Now we will obtain δ' transition. Let ε-closure(q0) = {q0, q1, q2} call it as state A.
1. δ'(A, 0) = A
2. δ'(A, 1) = B
3. δ'(A, 2) = C
Hence
1. δ'(B, 0) = ϕ
2. δ'(B, 1) = B
3. δ'(B, 2) = C
As A = {q0, q1, q2} in which final state q2 lies hence A is final state. B = {q1, q2} in which
the state q2 lies hence B is also final state. C = {q2}, the state q2 lies hence C is also a final
state.
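The δ' values in Example 2 can be reproduced with a short sketch. The ε-NFA below is an assumption chosen to match the stated closures (q0 loops on 0, q1 loops on 1, q2 loops on 2, with ε-moves q0 to q1 and q1 to q2, and q2 final); the original diagram is not shown in these notes, and the function names are illustrative.

```python
def eclose(states, eps):
    """ε-closure of a set of states, returned as a frozenset."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def enfa_to_dfa(delta, eps, start, finals, alphabet):
    """Convert an NFA with ε-moves to a DFA over ε-closed sets of states."""
    dfa_start = eclose({start}, eps)
    table, seen, todo = {}, {dfa_start}, [dfa_start]
    while todo:
        current = todo.pop()
        for a in alphabet:
            moved = {t for s in current for t in delta.get((s, a), ())}
            target = eclose(moved, eps)       # δ'(S, a) = ε-closure(move(S, a))
            table[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return dfa_start, table, {S for S in seen if S & finals}


# Assumed ε-NFA consistent with ε-closure(q0) = {q0, q1, q2}.
DELTA = {("q0", "0"): {"q0"}, ("q1", "1"): {"q1"}, ("q2", "2"): {"q2"}}
EPS = {"q0": {"q1"}, "q1": {"q2"}}

A, TABLE, FINALS = enfa_to_dfa(DELTA, EPS, "q0", {"q2"}, "012")
B, C = frozenset({"q1", "q2"}), frozenset({"q2"})
print(TABLE[(A, "0")] == A, TABLE[(A, "1")] == B, TABLE[(A, "2")] == C)
```

The empty set returned for δ'(B, 0) plays the role of ϕ; it is never final because it contains no NFA state.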
Minimization of DFA
Minimization of DFA means reducing the number of states of a given FA. Thus, we get
the FSM (finite state machine) without redundant states after minimizing it.
We have to follow the various steps to minimize the DFA. These are as follows:
Step 1: Remove all the states that are unreachable from the initial state via any set of the
transition of DFA.
Step 3: Now split the transition table into two tables T1 and T2. T1 contains all final states,
and T2 contains non-final states.
δ (q, a) = p
δ (r, a) = p
That means, find two states which have the same next state for every input symbol and remove one of
them.
Step 5: Repeat step 3 until we find no similar rows available in the transition table T1.
Step 7: Now combine the reduced T1 and T2 tables. The combined transition table is the
transition table of minimized DFA.
Example:
Solution:
Step 1: In the given DFA, q2 and q4 are the unreachable states so remove them.
Step 2: Draw the transition table for the rest of the states.
State 0 1
→q0 q1 q3
q1 q0 q3
*q3 q5 q5
*q5 q5 q5
Step 3: Now divide rows of transition table into two sets as:
1. One set contains those rows, which start from non-final states:
State 0 1
q0 q1 q3
q1 q0 q3
2. Another set contains those rows, which starts from final states.
State 0 1
q3 q5 q5
q5 q5 q5
Step 5: In set 2, row 1 and row 2 are similar since q3 and q5 transit to the same state on
0 and 1. So skip q5 and then replace q5 by q3 in the rest.
State 0 1
q3 q3 q3
State 0 1
→q0 q1 q3
q1 q0 q3
*q3 q3 q3
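The row-merging procedure used in this example can be sketched as follows. The function name `merge_identical_rows` is hypothetical, and the sketch deliberately follows the notes' rule (merge two states of the same kind, final or non-final, whose rows are identical). A full partition-refinement algorithm is a different technique and may merge further states that this rule leaves apart.

```python
def merge_identical_rows(table, finals):
    """Repeatedly merge states of the same kind whose transition rows are identical."""
    table = {s: dict(row) for s, row in table.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        states = sorted(table)
        for i, p in enumerate(states):
            for q in states[i + 1:]:
                same_kind = (p in finals) == (q in finals)
                if same_kind and table[p] == table[q]:
                    del table[q]                         # drop the duplicate row
                    for row in table.values():           # redirect q to p everywhere
                        for sym, target in row.items():
                            if target == q:
                                row[sym] = p
                    changed = True
                    break
            if changed:
                break
    return table


# Transition table from Step 2 (after removing unreachable states q2 and q4).
TABLE = {
    "q0": {"0": "q1", "1": "q3"},
    "q1": {"0": "q0", "1": "q3"},
    "q3": {"0": "q5", "1": "q5"},
    "q5": {"0": "q5", "1": "q5"},
}
MINIMIZED = merge_identical_rows(TABLE, finals={"q3", "q5"})
for state, row in MINIMIZED.items():
    print(state, row)
```

The rows for q3 and q5 are identical, so q5 is removed and replaced by q3, which yields the final table shown above.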
Regular Expression
o The language accepted by finite automata can be easily described by simple expressions
called regular expressions. They are an effective way to represent regular languages.
o The languages described by regular expressions are referred to as regular languages.
o A regular expression can also be described as a pattern that defines a set of strings.
o Regular expressions are used to match character combinations in strings. String searching
algorithms use such patterns to find matches within a string.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx,
xxx, xxxx, .....}
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx,
xxxx, .....}
Union: If L and M are two regular languages then their union L U M is also regular.
L U M = {s | s is in L or s is in M}
Intersection: If L and M are two regular languages then their intersection L ⋂ M is also
regular.
L ⋂ M = {s | s is in L and s is in M}
Kleene closure: If L is a regular language then its Kleene closure L* will also be a regular
language.
Example 1:
Write the regular expression for the language accepting all combinations of a's, over the
set ∑ = {a}
Solution:
All combinations of a's means a may be zero, single, double and so on. If a is appearing
zero times, that means a null string. That is we expect the set of {ε, a, aa, aaa, ....}. So we
give a regular expression for this as:
1. R = a*
Example 2:
Write the regular expression for the language accepting all combinations of a's except the
null string, over the set ∑ = {a}
Solution:
This set indicates that there is no null string. So we can denote regular expression as:
R = a+
Example 3:
Write the regular expression for the language accepting all the string containing any
number of a's and b's.
Solution:
1. r.e. = (a + b)*
This will give the set as L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination of a and
b.
The (a + b)* shows any combination with a and b even a null string.
Solution:
In a regular expression, the first symbol should be 1, and the last symbol should be 0. The
r.e. is as follows:
1. R = 1 (0+1)* 0
Example 2:
Write the regular expression for the language starting and ending with a and having
any combination of b's in between.
Solution:
1. R = a b* a
Example 3:
Write the regular expression for the language starting with a but not having consecutive
b's.
1. R = (a + ab)(a + ab)*
Example 4:
Write the regular expression for the language accepting all the string in which any number
of a's is followed by any number of b's is followed by any number of c's.
Solution: As we know, any number of a's means a* any number of b's means b*, any
number of c's means c*. Since as given in problem statement, b's appear after a's and c's
appear after b's. So the regular expression could be:
1. R = a* b* c*
Example 5:
Write the regular expression for the language over ∑ = {0} having even length of the
string.
Solution:
1. R = (00)*
Example 6:
Write the regular expression for the language in which every string has at least
one 0 and at least one 1.
Solution:
Example 7:
Describe the language denoted by following regular expression
Solution:
The language can be predicted from the regular expression by finding the meaning of it.
We will first split the regular expression as:
L = {The language consists of the strings in which a's appear in triples; there is no restriction
on the number of b's}
Example 8:
Write the regular expression for the language L over ∑ = {0, 1} such that all the string do
not contain the substring 01.
Solution:
1. R = (1* 0*)
Example 9:
Write the regular expression for the language containing the strings over {0, 1} in which
there are at least two occurrences of 1's between any
two occurrences of 0's.
Solution: At least two 1's between two occurrences of 0's can be denoted by (0111*0)*.
Similarly, if there is no occurrence of 0's, then any number of 1's are also allowed. Hence
the r.e. for required language is:
1. R = (1 + (0111*0))*
Example 10:
Write the regular expression for the language containing the string in which every 0 is
immediately followed by 11.
Solution:
1. R = (011 + 1)*
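Several of the expressions above can be tried out with Python's standard `re` module. This is only an illustrative sketch: the textbook union operator + is written as | (or a character class) in Python syntax, and `re.fullmatch` is used so that the whole string must belong to the language. The helper name `in_language` is an assumption.

```python
import re


def in_language(pattern, s):
    """True if the whole string s matches the regular expression."""
    return re.fullmatch(pattern, s) is not None


# Textbook expression           -> Python syntax
STARTS_1_ENDS_0 = r"1[01]*0"    # 1 (0+1)* 0
A_BSTAR_A = r"ab*a"             # a b* a
EVEN_ZEROS = r"(00)*"           # (00)*
ZERO_THEN_11 = r"(011|1)*"      # (011 + 1)*

print(in_language(STARTS_1_ENDS_0, "1010"))    # starts with 1, ends with 0
print(in_language(A_BSTAR_A, "abbba"))         # a, then b's, then a
print(in_language(EVEN_ZEROS, "000"))          # odd length, not in the language
print(in_language(ZERO_THEN_11, "0111011"))    # every 0 is followed by 11
```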
Conversion of RE to FA
To convert the RE to FA, we are going to use a method called the subset method. This
method is used to obtain FA from the given regular expression. This method is given
below:
Step 1: Design a transition diagram for given regular expression, using NFA with ε moves.
Solution: First we will construct the transition diagram for a given regular expression.
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Now we have got NFA without ε. Now we will convert it into required DFA for that, we will
first write a transition table for this NFA.
State 0 1
q1 qf Φ
q2 Φ q3
q3 q3 Qf
*qf Φ Φ
State 0 1
[q1] [qf] Φ
[q2] Φ [q3]
*[qf] Φ Φ
Example 2:
Design a NFA from given regular expression 1 (1* 01* 01*)*.
Step 1:
Step 2:
Step 3:
Example 3:
Construct the FA for regular expression 0*1 + 10.
Solution:
Step 1:
Step 2:
Step 3:
Step 4:
Moore Machine
Moore machine is a finite state machine in which the next state is decided by the current
state and current input symbol. The output symbol at a given time depends only on the
present state of the machine. Moore machine can be described by 6 tuples (Q, q0, ∑, O,
δ, λ) where,
Input: 010
Output: 1110(1 for q0, 1 for q1, again 1 for q1, 0 for q2)
Example 2:
Design a Moore machine to generate 1's complement of a given binary number.
Solution: To generate 1's complement of a given binary number the simple logic is that
if the input is 0 then the output will be 1 and if the input is 1 then the output will be 0.
That means there are three states. One state is start state. The second state is for taking
0's as input and produces output as 1. The third state is for taking 1's as input and
producing output as 0.
Input 1 0 1 1
State q0 q2 q1 q2 q2
Output 0 0 1 0 0
Thus we get 00100 as the output for 1011. We can neglect the initial 0, and the output
which we get is 0100, which is the 1's complement of 1011. The transition table is as follows:
Thus the Moore machine is M = (Q, q0, ∑, O, δ, λ), where Q = {q0, q1, q2}, ∑ = {0, 1}, O = {0, 1}.
The transition table shows the δ and λ functions.
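The trace above can be reproduced with a small Moore machine simulator. This is a sketch under the assumptions stated in the example: every state moves to q1 on input 0 and to q2 on input 1, and the outputs are λ(q0) = 0, λ(q1) = 1, λ(q2) = 0. The name `run_moore` is illustrative.

```python
def run_moore(delta, outputs, start, w):
    """Return the output string of a Moore machine: one symbol per visited state."""
    state = start
    out = [outputs[state]]              # the start state emits before any input
    for symbol in w:
        state = delta[(state, symbol)]
        out.append(outputs[state])      # output depends only on the state
    return "".join(out)


# 1's complement machine: q1 is entered on 0 (outputs 1), q2 on 1 (outputs 0).
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
}
OUTPUTS = {"q0": "0", "q1": "1", "q2": "0"}

full = run_moore(DELTA, OUTPUTS, "q0", "1011")
print(full)        # 00100, including the output of the start state
print(full[1:])    # 0100, the 1's complement of 1011
```

The output has length n + 1 for an input of length n, which is why the first symbol is dropped.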
Example 3:
Design a Moore machine for a binary input sequence such that if it has a substring 101,
the machine outputs A; if the input has the substring 110, it outputs B; otherwise it outputs C.
Solution: For designing such a machine, we will check two conditions, and those are 101
and 110. If we get 101, the output will be A, and if we recognize 110, the output will be B.
For other strings, the output will be C.
Example 4:
Construct a Moore machine that determines whether an input string contains an even or
odd number of 1's. The machine should give 1 as output if an even number of 1's are in
the string and 0 otherwise.
Solution:
Example 5:
Design a Moore machine with the input alphabet {0, 1} and output alphabet {Y, N} which
produces Y as output if input sequence contains 1010 as a substring otherwise, it produces
N as output.
Solution:
Mealy Machine
A Mealy machine is a machine in which output symbol depends upon the present input
symbol and present state of the machine. In the Mealy machine, the output is represented
with each input symbol for each state separated by /. The Mealy machine can be described
by 6 tuples (Q, q0, ∑, O, δ, λ') where
Example 1:
Design a Mealy machine for a binary input sequence such that if it has a substring 101,
the machine outputs A; if the input has the substring 110, it outputs B; otherwise it outputs C.
Solution: For designing such a machine, we will check two conditions, and those are 101
and 110. If we get 101, the output will be A. If we recognize 110, the output will be B. For
other strings the output will be C.
Now we will insert the possibilities of 0's and 1's for each state. Thus the Mealy machine
becomes:
Example 2:
Design a mealy machine that scans sequence of input of 0 and 1 and generates output
'A' if the input string terminates in 00, output 'B' if the string terminates in 11, and output
'C' otherwise.
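One possible design for this machine can be sketched in Python. The three states used here are an assumption, not the original solution: q0 is the start state, q1 means the last symbol read was 0, and q2 means the last symbol read was 1. Each transition carries its output, as in the input/output notation described above.

```python
def run_mealy(delta, start, w):
    """Return the output string of a Mealy machine: one symbol per input symbol."""
    state, out = start, []
    for symbol in w:
        state, emitted = delta[(state, symbol)]   # output is attached to the transition
        out.append(emitted)
    return "".join(out)


# (state, input) -> (next state, output)
MEALY = {
    ("q0", "0"): ("q1", "C"), ("q0", "1"): ("q2", "C"),
    ("q1", "0"): ("q1", "A"), ("q1", "1"): ("q2", "C"),   # 00 seen -> A
    ("q2", "0"): ("q1", "C"), ("q2", "1"): ("q2", "B"),   # 11 seen -> B
}

for w in ["0011", "100", "101"]:
    outputs = run_mealy(MEALY, "q0", w)
    print(w, outputs, "final output:", outputs[-1])
```

The last output symbol tells how the string terminates: A for 00, B for 11 and C otherwise.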
The following steps are used for converting Mealy machine to the Moore machine:
Step 1: For each state(Qi), calculate the number of different outputs that are available in
the transition table of the Mealy machine.
Step 2: Copy state Qi, if all the outputs of Qi are the same. Break qi into n states as Qin,
if it has n distinct outputs where n = 0, 1, 2..
Step 3: If the output of initial state is 0, insert a new initial state at the starting which gives
1 output.
Example 1: Convert the following Mealy machine into equivalent Moore machine.
Solution:
o For state q1, there is only one incident edge with output 0. So, we don't need to
split this state in Moore machine.
o For state q2, there are 2 incident edges, with outputs 0 and 1. So, we will split this state
into two states q20 (state with output 0) and q21 (state with output 1).
o For state q3, there are 2 incident edges, with outputs 0 and 1. So, we will split this state
into two states q30 (state with output 0) and q31 (state with output 1).
o For state q4, there is only one incident edge with output 0. So, we don't need to
split this state in Moore machine.
Example 2:
Convert the following Mealy machine into equivalent Moore machine.
Solution:
Transition table for above Mealy machine is as follows:
The state q1 has only one output. The state q2 and q3 have both output 0 and 1. So we
will create two states for these states. For q2, two states will be q20(with output 0) and
q21(with output 1). Similarly, for q3 two states will be q30(with output 0) and q31(with
output 1).
We cannot directly convert a Moore machine to its equivalent Mealy machine because the
length of the Moore machine's output is one longer than that of the Mealy machine for the given input.
To convert a Moore machine to a Mealy machine, the state output symbols are distributed onto the
input symbol paths. We are going to use the following method to convert the Moore
machine to a Mealy machine.
Example 1:
Convert the following Moore machine into its equivalent Mealy machine.
Solution:
q0 q0 q1 0
q1 q0 q1 1
Hence the transition table for the Mealy machine can be drawn as follows:
The equivalent Mealy machine will be,
Note: The length of output sequence is 'n+1' in Moore machine and is 'n' in the Mealy
machine.
Example 2:
Convert the given Moore machine into its equivalent Mealy machine.
Solution:
Q A B Output(λ)
q0 q1 q0 0
q1 q1 q2 0
q2 q1 q0 1
Hence the transition table for the Mealy machine can be drawn as follows:
The equivalent Mealy machine will be,
Example 3:
Convert the given Moore machine into its equivalent Mealy machine.
Q     a     b     Output(λ)
q0    q0    q1    0
q1    q2    q0    1
q2    q1    q2    2
Solution:
The transition diagram for the given problem can be drawn as:
Hence the transition table for the Mealy machine can be drawn as follows:
The equivalent Mealy machine will be,
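The conversion idea (move each state's output onto the transitions that enter it) can be sketched as below. The table is the Moore machine of Example 3, read under the assumption that its two input columns are a and b; the function name `moore_to_mealy` is illustrative.

```python
def moore_to_mealy(delta, outputs):
    """λ'(q, x) = λ(δ(q, x)): each transition emits the output of its target state."""
    return {(state, x): (target, outputs[target])
            for (state, x), target in delta.items()}


# Moore machine of Example 3 (input columns assumed to be a and b).
MOORE_DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q1", ("q2", "b"): "q2",
}
MOORE_OUT = {"q0": 0, "q1": 1, "q2": 2}

MEALY_TABLE = moore_to_mealy(MOORE_DELTA, MOORE_OUT)
for (state, x), (target, out) in sorted(MEALY_TABLE.items()):
    print(f"{state} --{x}/{out}--> {target}")
```

The resulting Mealy machine keeps the same states and transitions; only the place where the output is attached changes.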
• |y| > 0
• |xy| ≤ c
• For all k ≥ 0, the string xy^k z is also in L.
Type - 3 Grammar
Type-3 grammars generate regular languages. Type-3 grammars must have a single
non-terminal on the left-hand side and a right-hand side consisting of a single terminal
or single terminal followed by a single non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
X → ε
X → a | aY
Y → b
Type - 2 Grammar
Type-2 grammars generate context-free languages.
The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
The languages generated by these grammars are recognized by a
non-deterministic pushdown automaton.
Example
S → X a
X → a
X → aX
X → abc
X → ε
Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in
the form
αAβ→αγβ
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The
languages generated by these grammars are recognized by a linear bounded
automaton.
Example
AB → AbBc
A → bcA
B → b
Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have
no restrictions. They are any phrase structure grammar, including all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form of α → β where α is a string of terminals and
nonterminals with at least one non-terminal and α cannot be null. β is a string of
terminals and non-terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
Unit - 2
Context free grammar
Context free grammar is a formal grammar which is used to generate all possible strings
in a given formal language.
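1. G = (V, T, P, S)
Where,
V is the finite set of non-terminal symbols (variables),
T is the finite set of terminal symbols,
P is the set of production rules,
S is the start symbol.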
In CFG, the start symbol is used to derive the string. You can derive the string by
repeatedly replacing a non-terminal by the right hand side of the production, until all
non-terminal have been replaced by terminal symbols.
Example:
Production rules:
1. S → aSa
2. S → bSb
3. S → c
Now check that abbcbba string can be derived from the given CFG.
1. S ⇒ aSa
2. S ⇒ abSba
3. S ⇒ abbSbba
4. S ⇒ abbcbba
By applying the production S → aSa, S → bSb recursively and finally applying the
production S → c, we get the string abbcbba.
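For this particular grammar, checking whether a string can be derived is easy to sketch, because each production either wraps S in a matching pair of symbols or ends the derivation with c. The recursive function below is an illustrative sketch written for this grammar only; the name `derivable` is an assumption.

```python
def derivable(w):
    """True if w can be derived from S with S -> aSa | bSb | c."""
    if w == "c":                                  # S -> c ends the derivation
        return True
    if len(w) >= 3 and w[0] == w[-1] and w[0] in "ab":
        return derivable(w[1:-1])                 # undo S -> aSa or S -> bSb
    return False


for w in ["abbcbba", "c", "abcab", "abba"]:
    print(w, derivable(w))
```

The string abbcbba is accepted because stripping the outer pairs a...a, b...b, b...b leaves c, which mirrors the derivation shown above in reverse.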
Capabilities of CFG
There are the various capabilities of CFG:
Derivation
Derivation is a sequence of production rules. It is used to get the input string through
these production rules. During parsing we have to take two decisions. These are as follows:
We have two options to decide which non-terminal to be replaced with production rule.
Left-most Derivation
In the leftmost derivation, the leftmost non-terminal of the current sentential form is replaced by a production rule
at each step. So in a leftmost derivation the string is expanded from left to right.
Example:
Production rules:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input:
a - b + c
1. S = S + S
2. S = S - S + S
3. S = a - S + S
4. S = a - b + S
5. S = a - b + c
Right-most Derivation
In the rightmost derivation, the rightmost non-terminal of the current sentential form is replaced by a production rule
at each step. So in a rightmost derivation the string is expanded from right to left.
Example:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input:
a - b + c
1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
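The two derivations of a - b + c can be replayed mechanically. In this sketch a sentential form is a plain string, S is the only non-terminal, and each step replaces either the leftmost or the rightmost S with the chosen right-hand side. The helper names are assumptions for illustration.

```python
def expand_leftmost(form, rhs):
    """Replace the leftmost non-terminal S with rhs."""
    i = form.index("S")
    return form[:i] + rhs + form[i + 1:]


def expand_rightmost(form, rhs):
    """Replace the rightmost non-terminal S with rhs."""
    i = form.rindex("S")
    return form[:i] + rhs + form[i + 1:]


def derive(step, choices, start="S"):
    """Apply the chosen right-hand sides in order and record every sentential form."""
    forms = [start]
    for rhs in choices:
        forms.append(step(forms[-1], rhs))
    return forms


LEFT = derive(expand_leftmost, ["S + S", "S - S", "a", "b", "c"])
RIGHT = derive(expand_rightmost, ["S - S", "S + S", "c", "b", "a"])
print(" => ".join(LEFT))
print(" => ".join(RIGHT))
```

Both sequences end in the same string, but they pass through different sentential forms, matching the two step lists above.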
Parse tree
o A parse tree is the graphical representation of a derivation. Each node is labelled with a symbol, which can be a terminal or
non-terminal.
o In parsing, the string is derived using the start symbol. The root of the parse tree is that
start symbol.
o A parse tree follows the precedence of operators. The deepest sub-tree is traversed first. So,
the operator in the parent node has lower precedence than the operator in the sub-tree.
Example:
Production rules:
1. T= T + T | T * T
2. T = a|b|c
Input:
a * b + c
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost derivation,
more than one rightmost derivation, or more than one parse tree for a given input string.
If the grammar is not ambiguous, it is called unambiguous.
Example:
1. S = aSb | SS
2. S = ∈
For the string aabb, the above grammar generates two parse trees:
If a grammar is ambiguous, it is not good for compiler construction. No general method can
automatically detect and remove ambiguity, but you can remove it by rewriting the
grammar without ambiguity.
Parser
A parser is the part of a compiler that breaks the data coming from the
lexical analysis phase into smaller elements. A parser takes input in the form of a sequence of tokens and produces
output in the form of a parse tree.
Bottom up parsing
o Bottom up parsing is also known as shift-reduce parsing.
o Bottom up parsing is used to construct a parse tree for an input string.
o In bottom up parsing, the parser starts with the input symbols and constructs the parse
tree up to the start symbol by tracing out the rightmost derivation of the string in reverse.
Example
Production
1. E → T
2. T → T * F
3. T → id
4. F → T
5. F → id
Bottom up parsing techniques are classified as follows:
1. Shift-Reduce Parsing
2. Operator Precedence Parsing
3. Table Driven LR Parsing
a. LR( 1 )
b. SLR( 1 )
c. CLR ( 1 )
d. LALR( 1 )
Simplification of CFG
As we have seen, various languages can be represented efficiently by a context-free
grammar. Not every grammar is optimized: a grammar may contain extra symbols
(non-terminals) that unnecessarily increase its length. Simplification of a grammar means
reducing the grammar by removing useless symbols. The properties of a reduced grammar are
given below:
1. Each variable (i.e. non-terminal) and each terminal of G appears in the derivation of some
word in L.
2. There should not be any production of the form X → Y where X and Y are non-terminals.
3. If ε is not in the language L, then there need not be any production X → ε.
For Example:
1. T → aaB | abA | aaT
2. A → aA
3. B → ab | b
4. C → ad
In the above example, the variable 'C' will never occur in the derivation of any string,
because C can never be reached from the starting variable 'T'. So the production C → ad is
useless and we eliminate it.
The production A → aA is also useless, because it can never terminate and so A never derives
a string of terminals. To remove this useless production A → aA, we first find all the
variables which will never lead to a terminal string, such as variable 'A'. Then we remove
all the productions in which the variable 'A' occurs.
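The removal of useless symbols can be sketched in Python as two passes: first keep only the variables that can derive a terminal string, then keep only the variables reachable from the start variable. This is a minimal sketch under stated assumptions: each symbol is a single character, the keys of the dictionary are the variables, and the function name remove_useless is hypothetical.

```python
GRAMMAR = {
    "T": ["aaB", "abA", "aaT"],
    "A": ["aA"],
    "B": ["ab", "b"],
    "C": ["ad"],
}

def remove_useless(grammar, start):
    """Drop non-generating variables first, then unreachable ones."""
    def usable(body, generating):
        # A body is usable if every symbol is a terminal or a generating variable.
        return all(s in generating or s not in grammar for s in body)

    # Pass 1: variables that can derive a string of terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in generating and any(usable(b, generating) for b in bodies):
                generating.add(var)
                changed = True
    kept = {v: [b for b in bodies if usable(b, generating)]
            for v, bodies in grammar.items() if v in generating}

    # Pass 2: variables reachable from the start variable.
    reachable, todo = {start}, [start]
    while todo:
        for body in kept.get(todo.pop(), []):
            for s in body:
                if s in kept and s not in reachable:
                    reachable.add(s)
                    todo.append(s)
    return {v: b for v, b in kept.items() if v in reachable}

print(remove_useless(GRAMMAR, "T"))
```

For the example grammar, A is dropped in the first pass (it never terminates) and C in the second (it is unreachable), leaving T → aaB | aaT and B → ab | b.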
Elimination of ε Production
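The productions of type S → ε are called ε productions. These productions can be removed
completely only from grammars that do not generate ε.
Step 1: First find all nullable non-terminal variables, i.e. those which derive ε.
Step 2: For each production A → α, construct all productions A → x, where x is obtained from α
by removing one or more of the nullable non-terminals found in step 1.
Step 3: Now combine the result of step 2 with the original productions and remove the ε
productions.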
Example:
Remove the ε productions from the following CFG while preserving its meaning.
1. S → XYX
2. X → 0X | ε
3. Y → 1Y | ε
Solution:
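While removing the ε productions, we delete the rules X → ε and Y → ε. To preserve the
meaning of the CFG, we substitute ε for X and Y on the right-hand side wherever they
appear.
Let us take
1. S → XYX
If the first X on the right-hand side is ε, then
1. S → YX
Similarly, if the last X on the right-hand side is ε, then
1. S → XY
If Y = ε then
1. S → XX
If Y and one X are ε then
1. S → X
If both X are ε then
1. S → Y
Now,
1. S → XYX | XY | YX | XX | X | Y
Now consider X → 0X. Substituting ε for X on the right-hand side gives X → 0, so
1. X → 0X | 0
Similarly Y → 1Y | 1
Collectively, the CFG with the ε productions removed is:
1. S → XYX | XY | YX | XX | X | Y
2. X → 0X | 0
3. Y → 1Y | 1
Note that the original production S → XYX is kept, and that the new grammar no longer derives
ε itself, although the original grammar did.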
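The procedure can be sketched in Python. The sketch below is illustrative only: it assumes single-character symbols, writes ε as the empty string, and uses the hypothetical function names nullable_variables and eliminate_epsilon. It computes the nullable variables and then, for each right-hand side, emits every variant obtained by keeping or dropping each nullable symbol. Because every original right-hand side is also kept, the result for S contains XYX as well.

```python
from itertools import product

GRAMMAR = {"S": ["XYX"], "X": ["0X", ""], "Y": ["1Y", ""]}   # "" stands for ε

def nullable_variables(grammar):
    """Variables that can derive ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(var)
                changed = True
    return nullable

def eliminate_epsilon(grammar):
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be kept or dropped; other symbols are always kept.
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_bodies:   # skip ε and duplicates
                    new_bodies.append(candidate)
        result[var] = new_bodies
    return result

print(eliminate_epsilon(GRAMMAR))
```

Since S itself is nullable here, the resulting grammar generates the original language without the empty string.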
Removing Unit Productions
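Unit productions are productions in which one non-terminal gives another non-terminal.
Use the following steps to remove unit productions:
Step 1: To remove X → Y, add the production X → a to the grammar whenever Y → a occurs in
the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.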
For example:
1. S → 0A | 1B | C
2. A → 0S | 00
3. B → 1 | A
4. C → 01
Solution:
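S → C is a unit production. While removing S → C we have to consider what C gives, so we
add the right-hand side of C to S:
1. S → 0A | 1B | 01
Similarly, B → A is also a unit production, so we can rewrite it as
1. B → 1 | 0S | 00
Thus finally we can write the CFG without unit productions as
1. S → 0A | 1B | 01
2. A → 0S | 00
3. B → 1 | 0S | 00
4. C → 01
After this step C is no longer reachable from S, so C → 01 can be dropped as a useless
production.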
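The removal of unit productions can also be sketched in Python. This is a minimal illustration, not a definitive implementation: a right-hand side counts as a unit production when it is exactly the name of a variable, and the function name remove_unit_productions is hypothetical. For each variable it collects all variables reachable through unit productions and copies their non-unit right-hand sides.

```python
GRAMMAR = {
    "S": ["0A", "1B", "C"],
    "A": ["0S", "00"],
    "B": ["1", "A"],
    "C": ["01"],
}

def remove_unit_productions(grammar):
    result = {}
    for var in grammar:
        # Variables reachable from `var` through unit productions only.
        closure, todo = {var}, [var]
        while todo:
            for body in grammar[todo.pop()]:
                if body in grammar and body not in closure:
                    closure.add(body)
                    todo.append(body)
        # Copy the non-unit bodies, the variable's own bodies first.
        bodies = []
        for other in [var] + [v for v in grammar if v != var]:
            if other in closure:
                for body in grammar[other]:
                    if body not in grammar and body not in bodies:
                        bodies.append(body)
        result[var] = bodies
    return result

print(remove_unit_productions(GRAMMAR))
```

The sketch keeps C → 01 in its output, as the worked solution does; a later useless-symbol pass would remove it.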
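Chomsky Normal Form (CNF)
A CFG is in Chomsky normal form if every production has one of the following forms:
o The start symbol generating ε, for example S → ε.
o A non-terminal generating two non-terminals, for example S → AB.
o A non-terminal generating a terminal, for example S → a.
For example:
1. G1 = {S → AB, S → c, A → a, B → b}
2. G2 = {S → aA, A → a, B → c}
The production rules of grammar G1 satisfy the rules specified for CNF, so the grammar
G1 is in CNF. However, the production rules of grammar G2 do not satisfy the rules
specified for CNF, as S → aA contains a terminal followed by a non-terminal. So the grammar
G2 is not in CNF.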
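Steps for converting CFG into CNF
Step 1: Eliminate the start symbol from the RHS. If the start symbol S appears on the
right-hand side of any production, create a new start symbol S1 with the production:
1. S1 → S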
Step 2: In the grammar, remove the null, unit and useless productions. You can refer to
the Simplification of CFG.
Step 3: Eliminate terminals from the RHS of the production if they exist with other non-
terminals or terminals. For example, production S → aA can be decomposed as:
1. S → RA
2. R → a
Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB can be
decomposed as:
1. S → RB
2. R → AS
Example:
Convert the given CFG to CNF. Consider the given grammar G1:
1. S → a | aA | B
2. A → aBB | ε
3. B → Aa | b
Solution:
Step 1: We create a new start symbol S1 with the production S1 → S. (In this grammar S does
not appear on any RHS, so the step is not strictly required, but it is harmless.)
The grammar will be:
1. S1 → S
2. S → a | aA | B
3. A → aBB | ε
4. B → Aa | b
Step 2: As grammar G1 contains the null production A → ε, its removal from the grammar
yields:
1. S1 → S
2. S → a | aA | B
3. A → aBB
4. B → Aa | b | a
Now, as the grammar contains the unit production S → B, its removal yields:
1. S1 → S
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a
Also remove the unit production S1 → S; its removal from the grammar yields:
1. S1 → a | aA | Aa | b
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a
Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a
exists on the RHS together with non-terminals. So we replace terminal a with X:
1. S1 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → XBB
4. B → AX | b | a
5. X → a
Step 4: In the production rule A → XBB, the RHS has more than two symbols. Splitting it
yields:
1. S1 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → RB
4. B → AX | b | a
5. X → a
6. R → XB
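Steps 3 and 4 are mechanical, so they can be sketched in Python. The sketch below is a minimal illustration that assumes the grammar has already been through steps 1 and 2 (no ε or unit productions). The helper-variable names X_a and R1 are generated by the code and play the roles of X and R in the worked example; the function names to_cnf and is_cnf are hypothetical.

```python
def to_cnf(grammar, terminals):
    """Apply steps 3 and 4 to a grammar without ε or unit productions."""
    result, term_var = {}, {}
    # Step 3: in bodies of length >= 2, replace each terminal by a helper variable.
    for var, bodies in grammar.items():
        result[var] = []
        for body in bodies:
            if len(body) >= 2:
                body = [term_var.setdefault(s, "X_" + s) if s in terminals else s
                        for s in body]
            result[var].append(list(body))
    for terminal, helper in term_var.items():
        result[helper] = [[terminal]]
    # Step 4: split bodies longer than two symbols, two symbols at a time from the left.
    counter = 0
    for var in list(result):
        new_bodies = []
        for body in result[var]:
            while len(body) > 2:
                counter += 1
                helper = "R%d" % counter
                result[helper] = [body[:2]]
                body = [helper] + body[2:]
            new_bodies.append(body)
        result[var] = new_bodies
    return result

def is_cnf(grammar, terminals):
    """True if every body is a single terminal or exactly two variables."""
    for bodies in grammar.values():
        for body in bodies:
            single_terminal = len(body) == 1 and body[0] in terminals
            two_variables = len(body) == 2 and all(s in grammar for s in body)
            if not (single_terminal or two_variables):
                return False
    return True

# Grammar of the worked example after step 2.
AFTER_STEP_2 = {
    "S1": [["a"], ["a", "A"], ["A", "a"], ["b"]],
    "S": [["a"], ["a", "A"], ["A", "a"], ["b"]],
    "A": [["a", "B", "B"]],
    "B": [["A", "a"], ["b"], ["a"]],
}
cnf = to_cnf(AFTER_STEP_2, {"a", "b"})
print(cnf)
```

The check is_cnf ignores the special case S → ε for the start symbol, which does not occur in this example.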
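Greibach Normal Form (GNF)
A CFG is in Greibach normal form if every production has one of the following forms:
o The start symbol generating ε, for example S → ε.
o A non-terminal generating a terminal, for example A → a.
o A non-terminal generating a terminal followed by any number of non-terminals, for
example S → aASB.
For example:
The production rules of grammar G1 satisfy the rules specified for GNF, so the grammar
G1 is in GNF. However, the production rules of grammar G2 do not satisfy the rules
specified for GNF, as A → ε and B → ε contain ε (only the start symbol can generate ε). So the
grammar G2 is not in GNF.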
Steps for converting CFG into GNF
Step 1: Convert the grammar into CNF.
If the given grammar is not in CNF, convert it into CNF as described under Chomsky normal
form.
Step 2: Eliminate left recursion.
If the context free grammar contains left recursion, eliminate it (see the topic Left
Recursion).
Step 3: In the grammar, convert the given production rules into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
Example:
1. S → XB | AA
2. A → a | SA
3. B → b
4. X → a
Solution:
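As the given grammar G is already in CNF and has no direct left recursion, we can skip
step 1 and step 2 and go directly to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production
rule A → SA:
1. S → XB | AA
2. A → a | XBA | AAA
3. B → b
4. X → a
The production rules S → XB and A → XBA are not in GNF, so we substitute X → a:
1. S → aB | AA
2. A → a | aBA | AAA
3. B → b
4. X → a
Now we remove the left recursion (A → AAA) by introducing a new variable C:
1. S → aB | AA
2. A → aC | aBAC
3. C → AAC | ε
4. B → b
5. X → a
Now we remove the null production C → ε:
1. S → aB | AA
2. A → aC | aBAC | a | aBA
3. C → AAC | AA
4. B → b
5. X → a
The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA for the
leading A in S → AA:
1. S → aB | aCA | aBACA | aA | aBAA
2. A → aC | aBAC | a | aBA
3. C → AAC | AA
4. B → b
5. X → a
The production rules C → AAC and C → AA are not in GNF, so we substitute the leading A in
the same way:
1. S → aB | aCA | aBACA | aA | aBAA
2. A → aC | aBAC | a | aBA
3. C → aCAC | aBACAC | aAC | aBAAC | aCA | aBACA | aA | aBAA
4. B → b
5. X → a
Every production now starts with a terminal followed only by non-terminals, so the grammar
is in GNF.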
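The substitution used in step 3 can be sketched in Python: a leading non-terminal is replaced by each of its alternatives, and a small check then tests whether every production starts with a terminal followed only by non-terminals. The sketch is illustrative only; it assumes single-character symbols, starts from the grammar obtained after removing C → ε, and uses the hypothetical function names substitute_leading and is_gnf.

```python
def substitute_leading(grammar, var, target):
    """Replace a leading `target` in the bodies of `var` by each body of `target`."""
    new_bodies = []
    for body in grammar[var]:
        if body[0] == target:
            new_bodies.extend(alt + body[1:] for alt in grammar[target])
        else:
            new_bodies.append(body)
    return {**grammar, var: new_bodies}

def is_gnf(grammar):
    """True if every body is one terminal followed only by variables."""
    return all(
        len(body) >= 1
        and body[0] not in grammar
        and all(s in grammar for s in body[1:])
        for bodies in grammar.values()
        for body in bodies
    )

# Grammar after the null production C -> ε has been removed.
G = {
    "S": ["aB", "AA"],
    "A": ["aC", "aBAC", "a", "aBA"],
    "C": ["AAC", "AA"],
    "B": ["b"],
    "X": ["a"],
}
step = substitute_leading(G, "S", "A")      # fixes S -> AA
step = substitute_leading(step, "C", "A")   # fixes C -> AAC | AA
print(step["S"], step["C"], is_gnf(step))
```

The check does not model the special case of the start symbol generating ε, which does not arise in this example.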
Context-free languages are closed under −
• Union
• Concatenation
• Kleene Star operation
Union
Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also context free.
Example
Let L1 = { a^n b^n , n > 0}. Corresponding grammar G1 will have P: S1 → aS1b | ab
Let L2 = { c^m d^m , m ≥ 0}. Corresponding grammar G2 will have P: S2 → cS2d | ε
Union of L1 and L2, L = L1 ∪ L2 = { a^n b^n } ∪ { c^m d^m }
The corresponding grammar G will have the additional production S → S1 | S2
Concatenation
If L1 and L2 are context free languages, then L1L2 is also context free.
Example
Concatenation of the languages L1 and L2, L = L1L2 = { a^n b^n c^m d^m }
The corresponding grammar G will have the additional production S → S1 S2
Kleene Star
If L is a context free language, then L* is also context free.
Example
Let L = { a^n b^n , n ≥ 0}. Corresponding grammar G will have P: S → aSb | ε
Kleene Star L1 = { a^n b^n }*
The corresponding grammar G1 will have additional productions S1 → SS1 | ε
Context-free languages are not closed under −
• Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is not
necessarily context free.
• Complement − If L1 is a context free language, then its complement L1' may not be
context free.
However, context-free languages are closed under intersection with a regular language − if
L1 is a regular language and L2 is a context free language, then L1 ∩ L2 is a context free
language.
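The three closure constructions only add one new start production each, which the Python sketch below makes explicit. It is a minimal illustration under stated assumptions: grammars are dictionaries from a variable to a list of right-hand sides, the two grammars use disjoint variable names, and the function names union, concat, star and language are hypothetical. The helper language enumerates all strings up to a given length so that the results can be inspected; it is intended only for small grammars like these.

```python
def union(g1, s1, g2, s2, start="S"):
    return {**g1, **g2, start: [[s1], [s2]]}          # S -> S1 | S2

def concat(g1, s1, g2, s2, start="S"):
    return {**g1, **g2, start: [[s1, s2]]}            # S -> S1 S2

def star(g, s, start="S*"):
    return {**g, start: [[s, start], []]}             # S* -> S S* | ε

def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    found, seen, todo = set(), set(), [(start,)]
    while todo:
        form = todo.pop()
        if form in seen:
            continue
        seen.add(form)
        # Prune forms that already contain too many terminals.
        if sum(1 for s in form if s not in grammar) > max_len:
            continue
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add("".join(form))
            continue
        # Expand the leftmost variable with each of its right-hand sides.
        for body in grammar[form[idx]]:
            todo.append(form[:idx] + tuple(body) + form[idx + 1:])
    return found

G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}           # a^n b^n, n > 0
G2 = {"S2": [["c", "S2", "d"], []]}                   # c^m d^m, m >= 0

print(sorted(language(union(G1, "S1", G2, "S2"), "S", 4)))
print(sorted(language(concat(G1, "S1", G2, "S2"), "S", 4)))
print(sorted(language(star(G1, "S1"), "S*", 4)))
```

No comparable one-line construction exists for intersection or complement, which matches the non-closure results above.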
Pushdown Automata(PDA)
o A pushdown automaton is a way to implement a CFG in the same way we design a DFA for a
regular grammar. A DFA can remember only a finite amount of information, but a PDA can
remember an unbounded amount of information.
o A pushdown automaton is simply an NFA augmented with an "external stack memory". The
stack provides a last-in-first-out memory management capability. A PDA can store an
unbounded amount of information on the stack, but it can access only a limited amount of
it at a time: it can push an element onto the top of the stack and pop an element off the
top of the stack. To read an element deeper in the stack, the elements above it must be
popped off and are lost.
o A PDA is more powerful than an FA. Any language that can be accepted by an FA can also
be accepted by a PDA, and a PDA also accepts a class of languages that cannot be
accepted by any FA.
PDA Components:
Input tape: The input tape is divided into many cells, each holding one symbol. The input
head is read-only and may only move from left to right, one symbol at a time.
Finite control: The finite control has a pointer which points to the current symbol that
is to be read.
Stack: The stack is a structure in which we can push and remove items from one end
only. It has an infinite size. In a PDA, the stack is used to store items temporarily.
Formal definition of PDA:
A PDA can be defined as a collection of 7 components (Q, ∑, Γ, δ, q0, Z, F):
Q: the finite set of states
∑: the input alphabet
Γ: the set of stack symbols which can be pushed onto and popped from the stack
q0: the initial state
Z: the start symbol of the stack, which is in Γ
F: the set of final states
δ: the mapping function which is used for moving from the current state to the next state.
Turnstile Notation:
The ⊢ sign is the turnstile notation and represents one move. Each side of the sign is an
instantaneous description (ID) of the form (state, remaining input, stack contents).
For example,
(p, bw, Tβ) ⊢ (q, w, αβ)
In the above example, while taking a transition from state p to q, the input symbol 'b' is
consumed, and the top of the stack 'T' is replaced by a new string α.
Example 1:
Design a PDA for accepting the language {a^n b^(2n) | n >= 1}.
Solution: In this language, n a's are followed by 2n b's. Hence, for every 'a' we read we
push two a's onto the stack, and for every 'b' we read we pop one 'a':
1. δ(q0, a, Z) = (q0, aaZ)
2. δ(q0, a, a) = (q0, aaa)
Now when we read b, we change the state from q0 to q1 and start popping the
corresponding 'a'. Hence,
1. δ(q0, b, a) = (q1, ε)
This process of popping one 'a' for each 'b' is repeated until all the symbols are read.
Note that the remaining pops occur in state q1 only.
1. δ(q1, b, a) = (q1, ε)
After reading all the b's, all the corresponding a's should have been popped. Hence when we
read ε as the input symbol, only Z should be left on the stack. The move will be:
1. δ(q1, ε, Z) = (q2, ε)
Where
PDA = ({q0, q1, q2}, {a, b}, {a, Z}, δ, q0, Z, {q2})
Now we will simulate this PDA for the input string "aaabbbbbb".
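The simulation can be reproduced with a short Python sketch. It is illustrative only: the two push moves δ(q0, a, Z) = (q0, aaZ) and δ(q0, a, a) = (q0, aaa) are assumed (two a's pushed per input a), the stack is a string whose first character is the top, ε is written as the empty string, and the function name run is hypothetical. Each entry of the returned trace is one ID (state, remaining input, stack).

```python
# (state, input symbol or "" for ε, stack top) -> (next state, string replacing the top)
DELTA = {
    ("q0", "a", "Z"): ("q0", "aaZ"),   # assumed push move
    ("q0", "a", "a"): ("q0", "aaa"),   # assumed push move
    ("q0", "b", "a"): ("q1", ""),
    ("q1", "b", "a"): ("q1", ""),
    ("q1", "", "Z"): ("q2", ""),
}

def run(word, start="q0", bottom="Z", finals=("q2",)):
    """Run the deterministic PDA and return (accepted, list of IDs)."""
    state, stack, rest = start, bottom, word
    trace = [(state, rest, stack)]
    while True:
        top = stack[:1]
        if rest and (state, rest[0], top) in DELTA:
            state, push = DELTA[(state, rest[0], top)]   # consume one input symbol
            rest = rest[1:]
        elif (state, "", top) in DELTA:
            state, push = DELTA[(state, "", top)]        # ε-move
        else:
            break                                        # no move possible
        stack = push + stack[1:]
        trace.append((state, rest, stack))
    return state in finals and rest == "", trace

accepted, trace = run("aaabbbbbb")
for step in trace:
    print(step)
print("accepted" if accepted else "rejected")
```

Strings such as aabbb are rejected because an 'a' is still on the stack when the input ends, so the final ε-move to q2 is never possible.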
Example 2:
Design a PDA for accepting the language of strings in which n 0's are followed by any
number of 1's followed by n 0's.
Solution: The logic for the design of such a PDA is as follows:
Push all the leading 0's onto the stack. Then, while reading 1's, do nothing. Then, on each
subsequent 0 that is read, pop one 0 from the stack.
For instance:
This scenario can be written in the ID form as:
Now we will simulate this PDA for the input string "0011100".
PDA Acceptance
A language can be accepted by a pushdown automaton using two approaches:
1. Acceptance by Final State: The PDA is said to accept its input by final state if it
enters any final state in zero or more moves after reading the entire input.
Let P = (Q, ∑, Γ, δ, q0, Z, F) be a PDA. The language accepted by final state can be
defined as:
L(P) = {w | (q0, w, Z) ⊢* (p, ε, γ), p ∈ F, γ ∈ Γ*}
2. Acceptance by Empty Stack: The PDA is said to accept its input by empty stack if,
starting from the initial configuration, its stack is empty after reading the entire input.
Let P = (Q, ∑, Γ, δ, q0, Z, F) be a PDA. The language accepted by empty stack can be
defined as:
N(P) = {w | (q0, w, Z) ⊢* (p, ε, ε), p ∈ Q}
Here ⊢* denotes a sequence of zero or more moves.
Example:
Construct a PDA that accepts, by empty stack, the language L over {0, 1} of all strings of
0's and 1's in which the number of 0's is twice the number of 1's.
Solution:
We first design the part where a 1 comes before the 0's. The logic is to read a single
1 and push two 1's onto the stack. Thereafter, on reading two 0's, pop two 1's from the
stack. The δ can be
Now consider the second part, where 0's come before a 1. The logic is to read the first 0,
push it onto the stack and change state from q0 to q1. [Note that state q1 indicates that
the first 0 has been read and the second 0 has yet to be read.]
In q1, if a 1 is encountered then pop the 0. In q1, if a 0 is read then simply read that
second 0 and move ahead. The δ will be:
Example:
Design a PDA for palindrome strings.
Solution:
Suppose the language consists of the strings L = {aba, aa, bb, bab, bbabb, aabaa, ...}. A
string can be an odd-length or an even-length palindrome. The logic for constructing the
PDA is to push symbols onto the stack until the middle of the string, and then to read each
remaining symbol and perform a pop operation, comparing the popped symbol with the symbol
just read. When we reach the end of the input, we expect the stack to be empty.
This PDA is a non-deterministic PDA, because the middle of the string cannot be identified
while reading from left to right, so the PDA has to guess where it is. Here is the ID.
Simulation of abaaba
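The non-deterministic guess of the middle can be simulated by exploring every reachable configuration. The Python sketch below is a minimal illustration, not the transition table of the original notes: the state names push and pop, the bottom marker Z and the function name accepts_palindrome are assumptions. The stack is a string whose last character is the top, and the input is accepted when it is fully read and only Z remains.

```python
def accepts_palindrome(word):
    """Explore every configuration (state, input position, stack) of the PDA."""
    todo = [("push", 0, "Z")]
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, i, stack = config
        if state == "pop" and i == len(word) and stack == "Z":
            return True
        if state == "push":
            # Guess: the middle of an even-length palindrome is here.
            todo.append(("pop", i, stack))
            if i < len(word):
                # Keep pushing the current symbol.
                todo.append(("push", i + 1, stack + word[i]))
                # Guess: the current symbol is the middle of an odd-length palindrome.
                todo.append(("pop", i + 1, stack))
        elif i < len(word) and stack[-1] == word[i]:
            # Second half: the symbol read must match the symbol popped.
            todo.append(("pop", i + 1, stack[:-1]))
    return False

for w in ["abaaba", "aba", "abab"]:
    print(w, accepts_palindrome(w))
```

The sketch assumes the input alphabet does not contain the marker Z. It also accepts the empty string, which is an even-length palindrome.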
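CFG to PDA Conversion
Step 3: The start symbol of the CFG will be the initial stack symbol of the PDA.
For each production A → α of the grammar, add the following rule:
1. δ(q, ε, A) = (q, α)
Example 1:
Convert the following grammar to a PDA that accepts the same language.
1. S → 0S1 | A
2. A → 1A0 | S | ε
Solution:
The CFG can first be simplified by eliminating the unit productions:
1. S → 0S1 | 1S0 | ε
Now we convert this CFG to GNF:
1. S → 0SX | 1SY | ε
2. X → 1
3. Y → 0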
Example 2:
Construct a PDA for the given CFG, and test whether 010^4 (that is, 010000) is acceptable
by this PDA.
1. S → 0BB
2. B → 0S | 1S | 0
Solution:
Example 3:
Draw a PDA for the CFG given below:
1. S → aSb
2. S → a | b | ε
Solution:
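The one-state PDA obtained from a CFG can be simulated directly from the grammar. The Python sketch below is a minimal illustration under stated assumptions: besides the rule δ(q, ε, A) = (q, α) for each production A → α, it uses the usual matching move δ(q, a, a) = (q, ε) for each terminal a, it accepts by empty stack, symbols are single characters, and the function name cfg_pda_accepts is hypothetical. The pruning step assumes a grammar like the ones in these examples, where the stack cannot grow without consuming input.

```python
def cfg_pda_accepts(grammar, start, word):
    """Simulate the one-state PDA built from `grammar`, accepting by empty stack."""
    todo = [(0, (start,))]             # (input position, stack with the top first)
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        i, stack = config
        if not stack:
            if i == len(word):
                return True            # input consumed and stack empty
            continue
        top, rest = stack[0], stack[1:]
        if top in grammar:
            # δ(q, ε, A) = (q, α): replace variable A by a right-hand side α.
            for body in grammar[top]:
                new_stack = tuple(body) + rest
                terminals = sum(1 for s in new_stack if s not in grammar)
                if terminals <= len(word) - i:      # prune hopeless branches
                    todo.append((i, new_stack))
        elif i < len(word) and word[i] == top:
            # δ(q, a, a) = (q, ε): match a terminal against the input and pop it.
            todo.append((i + 1, rest))
    return False

G2 = {"S": ["0BB"], "B": ["0S", "1S", "0"]}        # Example 2
G3 = {"S": ["aSb", "a", "b", ""]}                  # Example 3, "" stands for ε

print(cfg_pda_accepts(G2, "S", "010000"))
print(cfg_pda_accepts(G3, "S", "aabb"))
```

For Example 2 the accepting run corresponds to the derivation S ⇒ 0BB ⇒ 01SB ⇒ 010BBB ⇒ 010000.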
LEX
o Lex is a program that generates a lexical analyzer. It is used with the YACC parser
generator.
o The lexical analyzer is a program that transforms an input stream into a sequence of
tokens.
o Lex reads a specification of the lexical analyzer and produces, as output, C source code
that implements the lexical analyzer.
A Lex program is separated into three sections by %% delimiters. The format of a Lex source
file is as follows:
1. { definitions }
2. %%
3. { rules }
4. %%
5. { user subroutines }
Rules are statements of the form p1 {action1} p2 {action2} ... pn {actionn},
where pi describes a regular expression and actioni describes what action the lexical
analyzer should take when pattern pi matches a lexeme.
User subroutines are auxiliary procedures needed by the actions. The subroutines can be
loaded with the lexical analyzer and compiled separately.
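The pattern and action idea behind the rules section can be imitated in a few lines of Python. This is only a sketch of the concept, not Lex itself: the token names, the RULES table and the function tokenize are assumptions made for illustration, and the code uses Python's re module rather than generating C. Unlike Lex, which prefers the longest match, this sketch takes the first alternative that matches at the current position.

```python
import re

# (token name, regular expression) pairs play the role of the rules section.
RULES = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % rule for rule in RULES))

def tokenize(text):
    """Turn the input stream into a list of (token name, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at position %d" % (text[pos], pos))
        if match.lastgroup != "SKIP":      # the action for whitespace is to discard it
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = a1 + 42"))
```

The list of tokens is what a parser, such as one generated by YACC, would then consume.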
YACC
o YACC stands for Yet Another Compiler Compiler.
o YACC provides a tool to produce a parser for a given grammar.
o YACC is a program designed to compile an LALR (1) grammar.
o It is used to produce the source code of the syntactic analyzer of the language
described by an LALR (1) grammar.
o The input of YACC is the rule or grammar and the output is a C program.
o A C compiler then compiles this C program into an executable file that will parse input
according to the grammar given in gram.Y.