Unit 1 - Finite Automata
Contents:-
Introduction- Basic Mathematical Notation and techniques- Finite State systems – Basic Definitions – Finite
Automaton – DFA & NDFA – Finite Automaton with ε-moves – Regular Languages – Regular Expression –
Equivalence of NFA and DFA
Introduction:-
Automata theory combines computer science and mathematics. It provides the fundamental knowledge required
to design new programming languages and is applied in compiler construction, string searching, pattern
matching, computer security and artificial intelligence.
The theory of computation also combines computer science and mathematics. It studies how efficiently a problem
can be solved using a model of computation or an algorithm. Automata theory is a branch of theoretical
computer science: the study of abstract machines and the computational problems they can solve. A machine takes
an input and, after processing it, produces an output.
A finite state machine is a mathematical model of computation that can be used to represent and control the
execution flow of a system. It is used in many fields such as mathematics, artificial intelligence and natural
language processing. It has a limited number of states, and exactly one state is active at a time. The machine
moves from one state to another to perform a specific task or action.
A finite state machine has limited memory: when it moves from one state to another on a specified input, it
provides the output and holds it for a specified time. Finite state machines are studied in automata theory and
capture the logic of simple machines.
A real-life example of a finite state machine is a traffic light. It has three states: Red, Green and Yellow. It
changes state at specified times: red to green, green to yellow and yellow to red. Another example is a video
game controller: when the user presses a button, the system performs an action based on it.
Typical applications: pattern matching and compiler construction.
Alphabet (Σ):
It is a finite set of symbols, denoted by the Greek letter sigma (Σ).
For example
Σ = {a, b, c} is an alphabet and ‘a’, ‘b’ and ‘c’ are its symbols.
Σ = {0, 1, 2, 3} is an alphabet and ‘0’, ‘1’, ‘2’ and ‘3’ are its symbols.
String:
A string is a finite sequence of symbols from the alphabet. For example, 11001 is a string over the binary
alphabet Σ = {0,1}.
The length of a string w is the number of symbols in it and is denoted by |w|. For example, |1011| = 4 and
|ε| = 0.
Proper Prefix – a prefix of a string other than the string itself is called a proper prefix of the string.
Proper Suffix – a suffix of a string other than the string itself is called a proper suffix of the string.
Formal Language:
It is a set of finite-length strings over an alphabet. A grammar can be used to generate its words.
Or
A formal language L over an alphabet Σ is a subset of Σ*, that is, a set of words over that alphabet.
For example
Describe the language of all strings that start with a over the alphabet Σ = {a}
L = {a, aa, aaa, aaaa, …………}
Describe the language of all strings that start with 0 over the alphabet Σ = {0,1}
L = {0, 00, 01, 000, 001, 010, 011, …………..}
Operations on Languages:
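Concatenation - If L1 and L2 are two languages, their concatenation L1L2 is the set of strings formed by a
string of L1 followed by a string of L2.
For example, if L1 = {a} and L2 = {b}, then L1L2 = {ab}.
Repetition (Kleene closure) - Σ^k denotes the set of all strings of length k over Σ, with Σ^0 = {ε}.
Σ* = Σ^0 U Σ^1 U Σ^2 U .....
Σ+ = Σ* - Σ^0 = Σ^1 U Σ^2 U .....
For example, if Σ = {a} then
Σ* = {ε, a, aa, aaa, .....}
Σ+ = {a, aa, aaa, .....}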
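The operations above can be tried out on small alphabets. The following is a minimal Python sketch, not part of the original notes; the helper names (sigma_power, sigma_star, sigma_plus, concat) are hypothetical, and since Σ* is infinite the sketch only lists strings up to a chosen maximum length.

```python
from itertools import product

def sigma_power(alphabet, k):
    """Sigma^k: all strings of length exactly k over the alphabet."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=k)]

def sigma_star(alphabet, max_len):
    """Strings of Sigma* with length <= max_len (Sigma* itself is infinite)."""
    out = []
    for k in range(max_len + 1):
        out.extend(sigma_power(alphabet, k))
    return out

def sigma_plus(alphabet, max_len):
    """Sigma+ = Sigma* - Sigma^0: the same strings without the empty string."""
    return [w for w in sigma_star(alphabet, max_len) if w != ""]

def concat(l1, l2):
    """Concatenation L1L2: every string of L1 followed by every string of L2."""
    return {x + y for x in l1 for y in l2}

print(sigma_star({"a"}, 3))   # ['', 'a', 'aa', 'aaa']
print(sigma_plus({"a"}, 3))   # ['a', 'aa', 'aaa']
print(concat({"a"}, {"b"}))   # {'ab'}
```

The empty Python string plays the role of ε here.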
A finite automaton (FA) is a machine that recognizes a pattern or language. It has a fixed amount of memory. It
reads input symbols one at a time and changes its state based on the current state and the symbol read. If the
input string is processed to the end and the automaton is in a final (accepting) state, the input is accepted;
otherwise it is rejected.
[Figure: Types of finite automata. Finite automata: Deterministic Finite Automata (DFA), Nondeterministic Finite
Automata (NFA), finite automata with ε-moves. Finite automata with output: Moore machine, Mealy machine.]
Finite means that the number of possible states and the number of symbols in the alphabet are both finite.
Automaton means that the machine changes state based on its input.
The term “deterministic” means that for each state and input symbol there is one and only one state to which
the automaton can move.
A (deterministic) FA is a 5-tuple (Q, Σ, δ, q0, F).
Terminology of DFA
Transition - The edges labelled with an input symbol show the transitions.
Final State - The final state is indicated by double circles. It is also called the terminal state or accepting state.
Transition Table – A table with one row per state and one column per input symbol.
Self-Loop – An edge that leads from a state back to the same state on some input symbol.
In this example the DFA accepts only the string “ab” and rejects all other strings. It always starts in the initial
state q0. On input “a” it moves from q0 to q1. The edge or arc drawn from q0 to q1 with label a is called a
transition, written δ(q0, a) = q1. After processing the first symbol, it continues from state q1, reads the next
symbol, “b”, and moves from q1 to q2. This continues until the input is exhausted; the string is accepted if the
automaton is then in a final state.
M = (Q, Σ, δ, q0, F)
Transition Table
δ a b
q0 q1 qerror
q1 qerror q2
q2 qerror qerror
It is a tabular representation of the transition function. The function takes two arguments, a state (from Q) and
a symbol (from Σ), and returns the “next state”.
• The start state is marked with an arrow and final (accepting) states are marked with a star (*).
A deterministic finite automaton is a 5-tuple (Q, Σ, δ, q0, F). It always starts in the initial state, reads one
symbol at a time and moves from one state to another as given by the transition function. After reading the
entire input, the string is accepted if the automaton is in a final (accepting) state and rejected otherwise. The
set of strings accepted by the automaton is the language recognized by it.
Formal Definition: The language accepted by an FA M, denoted L(M), is the set {x | δ(q0, x) is in F}, where δ
is extended from single symbols to strings.
The language recognized or accepted by a deterministic finite automaton is called a regular language.
[Figure: A grammar generates a language; an automaton recognizes it.]
Example
Start -> q0 --a--> q1 --b--> q2 (Final State) = Accept
Start -> q0 --b--> qerror (Non-Final State) = Reject
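The accept and reject runs above can be reproduced with a small table-driven simulation. This is only an illustrative Python sketch: the dictionary DELTA encodes the transition table for the “ab” automaton, and the row for qerror (a trap state that loops to itself) is an assumption, since the table in the notes has no row for it. The function names trace and accepts are hypothetical.

```python
# Transition function of the DFA that accepts only "ab".
DELTA = {
    ("q0", "a"): "q1",         ("q0", "b"): "qerror",
    ("q1", "a"): "qerror",     ("q1", "b"): "q2",
    ("q2", "a"): "qerror",     ("q2", "b"): "qerror",
    # Assumed: qerror is a trap state, it never leaves itself.
    ("qerror", "a"): "qerror", ("qerror", "b"): "qerror",
}
START, FINAL = "q0", {"q2"}

def trace(word):
    """Return the list of states visited while reading word."""
    state = START
    path = [state]
    for symbol in word:
        state = DELTA[(state, symbol)]
        path.append(state)
    return path

def accepts(word):
    """A word is accepted if the run ends in a final state."""
    return trace(word)[-1] in FINAL

for w in ["ab", "b", "abb", ""]:
    verdict = "Accept" if accepts(w) else "Reject"
    print(repr(w), " -> ".join(trace(w)), "=", verdict)
```

Only "ab" ends in q2; every other input either falls into qerror or stops in a non-final state.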
The language accepted by a finite automaton is called a regular language. An algebraic expression that describes
a regular language is called a regular expression.
A regular expression describes a set of strings according to syntax rules. Such expressions are used by text
editors and search engines for pattern matching. A DFA accepts exactly the strings described by a regular
expression, so it is also called a pattern recognizer.
For example: -
Construct a DFA which accepts the strings ending with 00 over Σ = {0, 1}.
RE = (0+1)* 00
Solution
The DFA accepts only the language L = {00, 100, 10100, 0100, 000, . . . . . . }
M = (Q, Σ, δ , q0, F)
Transition Table
δ 0 1
q0 q1 q0
q1 q2 q0
q2 q2 q0
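As a sanity check, the table above can be simulated and compared with the regular expression on all short inputs. The sketch below is illustrative only: it assumes q0 is the start state and q2 the only final state (the table does not mark them), and it writes (0+1)*00 in Python's `re` syntax as (0|1)*00.

```python
import re
from itertools import product

# Transition table of the DFA for strings ending with 00.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q2", ("q1", "1"): "q0",
    ("q2", "0"): "q2", ("q2", "1"): "q0",
}

def dfa_accepts(word, start="q0", finals=frozenset({"q2"})):
    """Run the DFA; start and final states are assumptions of this sketch."""
    state = start
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in finals

PATTERN = re.compile(r"(0|1)*00")  # Python spelling of (0+1)*00

def all_strings(max_len):
    """Every binary string of length 0..max_len."""
    for k in range(max_len + 1):
        for p in product("01", repeat=k):
            yield "".join(p)

# Strings on which the DFA and the regular expression disagree.
mismatches = [w for w in all_strings(8)
              if dfa_accepts(w) != bool(PATTERN.fullmatch(w))]
print(mismatches)  # []
```

An empty list means the automaton and the expression agree on every binary string of length up to 8; it is a finite check, not a proof of equivalence.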
DFA | NFA
DFA stands for Deterministic Finite Automata. | NFA stands for Nondeterministic Finite Automata.
For each input symbol there is exactly one state transition from each state. | There is no need to specify how the NFA reacts to every symbol.
DFA cannot use the empty string (ε) transition. | NFA can use the empty string (ε) transition.
DFA requires more space. | NFA requires less space than DFA.
e.g. if state q0 has a transition on input 0, it must also have one on input 1 (possibly a self-loop). | e.g. state q0 may have a transition on input 0 only; the next input 1 can be handled at q1, which moves to the next state.
δ: Q x Σ -> Q, i.e. the next state belongs to Q. | δ: Q x (Σ U {ε}) -> 2^Q, i.e. the set of next possible states belongs to the power set of Q.
DFA allows only one move for a single input symbol. | There can be a choice (more than one move) for a single input symbol.
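Definition –
The term nondeterministic means that, for each input symbol, the automaton may move to a set of zero or more
states, so a state can have zero, one or several transitions on the same symbol. A DFA is a special case of an NFA.
A (nondeterministic) FA is a 5-tuple (Q, Σ, δ, q0, F), where
• Q is a finite set of states;
• Σ is a finite input alphabet;
• δ: Q x Σ -> 2^Q is the transition function (Q x (Σ U {ε}) -> 2^Q when ε-moves are allowed);
• q0 is the initial state;
• F, a subset of Q, is the set of final states.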
Example1
Construct NFA for the language which starts with ab over Σ = {a, b}.
Solution –
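One possible NFA for this language, given here only as a hedged sketch: three states q0, q1, q2 with q0 --a--> q1 --b--> q2, and q2 looping on both a and b. The state names, the choice of q2 as the single final state and the function name nfa_accepts are assumptions of this sketch. Missing entries in the table mean "no move", which an NFA allows.

```python
# Each entry maps (state, symbol) to the SET of possible next states.
NFA_DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q2"},
    ("q2", "b"): {"q2"},
}

def nfa_accepts(word, start="q0", finals=frozenset({"q2"})):
    """Track the set of states the NFA could be in after each symbol."""
    current = {start}
    for symbol in word:
        nxt = set()
        for state in current:
            # A missing entry means no transition: that branch dies.
            nxt |= NFA_DELTA.get((state, symbol), set())
        current = nxt
    return bool(current & finals)

for w in ["ab", "abba", "a", "ba", ""]:
    print(repr(w), nfa_accepts(w))
```

A word is accepted when at least one of the possible runs ends in a final state; here "ab" and "abba" are accepted and the others are rejected.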
An NFA is allowed to make a transition without receiving an input symbol, i.e., a transition on ε, the empty
string.
Definition –
A finite automaton with ε-moves is an automaton in which the state can change without receiving any input
symbol.
An ε-move is a null transition from one state to another.
NFA with ε-moves: If an FA contains any ε-transition (move), it is called an NFA with ε-moves.
Examples 1
Construct NFA with ε which can accept language consisting of any number of 0’s followed by any number of 1’s.
Solution –
q0 = q0 , F = {q1}
Transition Table
δ 0 1 ε
q0 q0 ∅ q1
q1 ∅ q1 ∅
Examples 2
Construct NFA with ε which can accept language consisting of any number of 0’s followed by any number of 1’s
followed by any number of 2’s.
Solution –
L = {ε, 0, 00, 000, ……….., 1, 01, 011, 11, 111, 000111, …, 012, 2, 22, 02, 00122, ……..}
q0 = q0 , F = {q2}
Transition Table
δ 0 1 2 ε
q0 q0 ∅ ∅ q1
q1 ∅ q1 ∅ q2
q2 ∅ ∅ q2 ∅
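An ε-NFA is usually simulated with the help of the ε-closure: the set of states reachable from a given set using ε-moves alone. The Python sketch below illustrates this for the 0's, then 1's, then 2's example. It assumes the ε-moves q0 -> q1 and q1 -> q2, which the language requires since q2 must be reachable without reading input, and it uses the empty string as the dictionary key for ε. The names eps_closure and accepts are hypothetical.

```python
EPS = ""  # key used for ε-moves in this sketch

DELTA = {
    ("q0", "0"): {"q0"}, ("q0", EPS): {"q1"},
    ("q1", "1"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "2"): {"q2"},
}

def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(word, start="q0", finals=frozenset({"q2"})):
    """Alternate between reading a symbol and taking the ε-closure."""
    current = eps_closure({start})
    for symbol in word:
        moved = set()
        for s in current:
            moved |= DELTA.get((s, symbol), set())
        current = eps_closure(moved)
    return bool(current & finals)

print(sorted(eps_closure({"q0"})))        # ['q0', 'q1', 'q2']
print(accepts(""), accepts("00122"))      # True True
print(accepts("10"), accepts("21"))       # False False
```

Because the ε-closure of q0 already contains q2, the empty string is accepted, matching ε being a member of L.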
Regular Expressions (RE): Definition & Example
The language accepted by a finite automaton can be described by simple expressions called regular
expressions. They are a compact and effective way to represent such languages.
The languages described by regular expressions are referred to as regular languages.
A regular expression can also be described as a pattern that defines a set of strings.
Regular expressions are used to match character combinations in strings. String searching algorithms use these
patterns to find matches in a string.
For instance:
In a regular expression, a* means zero or more occurrences of a. It can generate {ε, a, aa, aaa, aaaa…}
In a regular expression, a+ means one or more occurrences of a. It can generate {a, aa, aaa, aaaa…}
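The difference between the two operators can be seen with Python's standard `re` module, where `*` and `+` have the same meaning as above. This is a small illustrative sketch; `re.fullmatch` is used so that the whole string, not just a part of it, has to match, and the sample strings are arbitrary.

```python
import re

samples = ["", "a", "aa", "aaa", "b", "ab"]

# a* : zero or more a's, so the empty string matches.
star_matches = [w for w in samples if re.fullmatch(r"a*", w)]
# a+ : one or more a's, so the empty string does not match.
plus_matches = [w for w in samples if re.fullmatch(r"a+", w)]

print(star_matches)  # ['', 'a', 'aa', 'aaa']
print(plus_matches)  # ['a', 'aa', 'aaa']
```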
There are various applications of regular expressions. They are given below.
1. Compilers use regular expressions to describe the rules for tokens such as variables, constants, keywords
and various user-defined entities.
2. Various operating systems use regular expressions in command-line interface commands, for example the
wildcard characters *, ? in DOS and metacharacters like *, ?, ., [] in UNIX.
3. Database management software uses regular expressions to find user-defined entities among existing
entities.
4. Spreadsheet software also uses regular expressions for filtering records.
5. Some utility software also uses regular expressions, for example LEX, which accepts regular expressions as
input and produces an equivalent C program as output.
External Clause: Nothing is a regular expression unless it is obtained from the above two clauses.
Some RE Examples
(a+b)* : Set of strings of a’s and b’s of any length, including the null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …….}
(a+b)*abb : Set of strings of a’s and b’s ending with the string abb. So L = {abb, aabb, babb, aaabb, ababb, …………..}
(11)* : Set of strings consisting of an even number of 1’s, including the empty string. So L = {ε, 11, 1111, 111111, ……….}
(aa)*(bb)*b : Set of strings consisting of an even number of a’s followed by an odd number of b’s. So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …………..}
(aa + ab + ba + bb)* : Strings of a’s and b’s of even length, obtained by concatenating any combination of the strings aa, ab, ba and bb, including the null string. So L = {ε, aa, ab, ba, bb, aaab, aaba, ...}
∅* = ε
ε* = ε
RR* = R*R
R*R* = R*
(R*)* = R*
(PQ)*P =P(QP)*
R + R = R (Idempotent law)
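Identities of this kind can be spot-checked by enumerating all short strings and comparing the sets matched by both sides. The Python sketch below does this for a few of the identities using the standard `re` module, where union is written `|` instead of `+`. The concrete choices for R, P and Q are arbitrary samples picked for illustration, and a finite check like this can reveal a wrong identity but does not prove a correct one.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=6):
    """Set of strings over `alphabet`, up to max_len, fully matched by pattern."""
    rx = re.compile(pattern)
    return {"".join(p)
            for k in range(max_len + 1)
            for p in product(alphabet, repeat=k)
            if rx.fullmatch("".join(p))}

# Arbitrary sample expressions (assumptions of this sketch).
R, P, Q = "(?:ab|b)", "(?:a)", "(?:ba|b)"

checks = {
    "RR* = R*R":       (f"{R}{R}*", f"{R}*{R}"),
    "R*R* = R*":       (f"{R}*{R}*", f"{R}*"),
    "(PQ)*P = P(QP)*": (f"(?:{P}{Q})*{P}", f"{P}(?:{Q}{P})*"),
    "R + R = R":       (f"{R}|{R}", R),
}

results = {name: language(lhs) == language(rhs)
           for name, (lhs, rhs) in checks.items()}
for name, ok in results.items():
    print(name, ok)
```

Each line should print True for the sample expressions chosen here.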
The set of regular languages over an alphabet Σ is defined recursively as below. Any language belonging to
this set is a regular language over Σ.
Basis Clause: ∅, {ε} and {a} for any symbol a ∈ Σ are regular languages.
Inductive Clause: If Lr and Ls are regular languages, then Lr ∪ Ls, LrLs and Lr* are regular languages.
External Clause: Nothing is a regular language unless it is obtained from the above two clauses.
For example, let Σ = {a, b}. Then since {a} and {b} are regular languages, {a, b} ( = {a} ∪ {b} ) and {ab}
( = {a}{b} ) are regular languages. Also since {a} is regular, {a}* is a regular language which is the set of
strings consisting of a's such as ε, a, aa, aaa, aaaa etc. Note also that Σ*, which is the set of strings
consisting of a's and b's, is a regular language because {a, b} is regular.
Regular languages are closed under the following operations, i.e. applying them to regular languages always
yields a regular language:
Union
Intersection
Concatenation
Kleene closure
Complement
Intersection : If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For
example,
L1 = {a^m b^n | n ≥ 0 and m ≥ 0} and L2 = {a^m b^n ∪ b^n a^m | n ≥ 0 and m ≥ 0}
L3 = L1 ∩ L2 = {a^m b^n | n ≥ 0 and m ≥ 0} is also regular.
Concatenation : If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular.
For example,
L1 = {a^n | n ≥ 0} and L2 = {b^n | n ≥ 0}
L3 = L1.L2 = {a^m . b^n | m ≥ 0 and n ≥ 0} is also regular.
Kleene Closure : If L1 is a regular language, its Kleene closure L1* will also be regular. For example,
L1 = (a ∪ b)
L1* = (a ∪ b)*
Complement : If L(G) is a regular language, its complement L’(G) will also be regular. The complement of a
language can be found by removing the strings which are in L(G) from the set of all possible strings. For example,
L(G) = {a^n | n > 3}
L’(G) = {a^n | n <= 3}
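Two of these closure properties have simple constructions on DFAs: the complement swaps final and non-final states, and the intersection runs two DFAs side by side (the product construction). The Python sketch below illustrates both on the complement example above, L(G) = {a^n | n > 3} over Σ = {a}. The dictionary layout, the helper names and the second language (an even number of a's) are assumptions made only for this illustration.

```python
from itertools import product

def make_dfa(states, alphabet, delta, start, finals):
    return {"states": set(states), "alphabet": set(alphabet),
            "delta": dict(delta), "start": start, "finals": set(finals)}

def accepts(dfa, word):
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["finals"]

def complement(dfa):
    """Swap final and non-final states (needs a transition for every symbol)."""
    return make_dfa(dfa["states"], dfa["alphabet"], dfa["delta"],
                    dfa["start"], dfa["states"] - dfa["finals"])

def intersection(d1, d2):
    """Product construction: a pair (p, q) tracks both DFAs at once."""
    alphabet = d1["alphabet"] & d2["alphabet"]
    states = set(product(d1["states"], d2["states"]))
    delta = {((p, q), a): (d1["delta"][(p, a)], d2["delta"][(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for (p, q) in states
              if p in d1["finals"] and q in d2["finals"]}
    return make_dfa(states, alphabet, delta,
                    (d1["start"], d2["start"]), finals)

# L(G) = {a^n | n > 3}: count a's up to 4, then stay in state 4.
more_than_3 = make_dfa(range(5), "a",
                       {(i, "a"): min(i + 1, 4) for i in range(5)}, 0, {4})
# Hypothetical second language for the demo: an even number of a's.
even = make_dfa(range(2), "a", {(i, "a"): 1 - i for i in range(2)}, 0, {0})

at_most_3 = complement(more_than_3)
both = intersection(more_than_3, even)

print([n for n in range(8) if accepts(at_most_3, "a" * n)])   # [0, 1, 2, 3]
print([n for n in range(10) if accepts(both, "a" * n)])       # [4, 6, 8]
```

The first line lists the lengths accepted by the complement, matching L’(G) = {a^n | n <= 3}.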
Example 1
Write the regular expression for the language accepting all combinations of a's, over the set ∑ = {a}
Solution:
All combinations of a's means that a may appear zero, one, two or more times. If a appears zero times, we get the
null string. So we expect the set {ε, a, aa, aaa, ....}, and the regular expression for it is:
R = a*
Example 2
Write the regular expression for the language accepting all combinations of a's except the null string, over the
set ∑ = {a}
Solution:
This set indicates that there is no null string. So we can denote regular expression as:
R = a+
Example 3
Write the regular expression for the language accepting all strings containing any number of a's and b's.
Solution:
r.e. = (a + b)*
This will give the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, i.e. any combination of a and b.
The expression (a + b)* denotes any combination of a and b, including the null string.
Example 4
Solution:
The language can be predicted from the regular expression by finding the meaning of it. We will first split the
regular expression as:
L = {The language consists of the strings in which a's appear in triples; there is no restriction on the number of b's}
Example 5
Write the regular expression for the language L over ∑ = {0, 1} such that no string contains the
substring 01.
Solution:
R = (1* 0*)
Example 6
Write the regular expression for the language containing the strings over {0, 1} in which there are at least two
occurrences of 1's between any two occurrences of 0's.
Solution: At least two 1's between two occurrences of 0's can be denoted by (0111*0)*.
Similarly, if there is no occurrence of 0's, then any number of 1's is also allowed. Hence the r.e. for the required
language is:
R = (1 + (0111*0))*
Example 7
Write the regular expression for the language containing the string in which every 0 is immediately followed by
11.
Solution:
R = (011 + 1)*
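An answer like this can be tested by comparing the regular expression with a direct statement of the property on all short strings. The Python sketch below is only an illustration: the helper names are hypothetical, the expressions are written in Python's `re` syntax with `|` for union, and the same check is applied to the answer of Example 5 (no substring 01).

```python
import re
from itertools import product

def binary_strings(max_len):
    """Every string over {0, 1} of length 0..max_len."""
    for k in range(max_len + 1):
        for p in product("01", repeat=k):
            yield "".join(p)

def counterexamples(pattern, predicate, max_len=10):
    """Strings on which the regular expression and the property disagree."""
    rx = re.compile(pattern)
    return [w for w in binary_strings(max_len)
            if bool(rx.fullmatch(w)) != predicate(w)]

def every_zero_followed_by_11(w):
    # For each position holding a 0, the next two symbols must be "11".
    return all(w[i + 1:i + 3] == "11" for i, c in enumerate(w) if c == "0")

def no_substring_01(w):
    return "01" not in w

print(counterexamples(r"(011|1)*", every_zero_followed_by_11))  # []
print(counterexamples(r"1*0*", no_substring_01))                # []
```

An empty list means no disagreement was found among strings of length up to 10.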
Example 8
Which one of the following languages over the alphabet {0,1} is described by the regular expression?
(0+1)*0(0+1)*0(0+1)*
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.
Solution : Option A says that every string must contain the substring 00. But 10101 is also in the language and
does not contain 00 as a substring. So it is not the correct option.
Option B says that a string can have at most two 0’s, but 00000 is also in the language. So it is not the correct
option.
Option C says that a string must contain at least two 0’s. The regular expression requires two 0’s, with any
strings before, between and after them. So this is the correct option.
Option D says that the language contains all strings that begin and end with either 0 or 1. But that set includes
strings such as 1 and 01, which the regular expression cannot generate because they have fewer than two 0’s.
So it is not correct.
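The four options can also be compared mechanically with the regular expression by searching for the shortest string on which they disagree. This is a hedged Python sketch: each option is restated as a Python predicate (an interpretation made for this illustration), and the expression is written in `re` syntax with `|` for union.

```python
import re
from itertools import product

RX = re.compile(r"(0|1)*0(0|1)*0(0|1)*")

# Each option restated as a test on a single string.
OPTIONS = {
    "A": lambda w: "00" in w,            # contains the substring 00
    "B": lambda w: w.count("0") <= 2,    # at most two 0's
    "C": lambda w: w.count("0") >= 2,    # at least two 0's
    "D": lambda w: len(w) >= 1,          # begins and ends with 0 or 1
}

def first_counterexample(predicate, max_len=8):
    """Shortest string where the expression and the option disagree, or None."""
    for k in range(max_len + 1):
        for p in product("01", repeat=k):
            w = "".join(p)
            if bool(RX.fullmatch(w)) != predicate(w):
                return w
    return None

result = {name: first_counterexample(pred) for name, pred in OPTIONS.items()}
print(result)  # {'A': '010', 'B': '', 'C': None, 'D': '0'}
```

Only option C has no counterexample among strings of length up to 8. For option B the shortest disagreement is the empty string, which has at most two 0’s but is not generated by the expression.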
Example 9
Solution : Option (A) says that it will have 0 or more a followed by 0 or more b. But S -> bS => baS => ba is
also a part of language. So (A) is not correct.
Option (B) says that it will have equal no. of a’s and b’s. But S -> bS => b is also a part of language. So (B)
is not correct.
Option (C) says either it will have 0 or more a’s or 0 or more b’s or a’s followed by b’s. But as shown in
option (A), ba is also part of language. So (C) is not correct.
Option (D) says it can have any number of a’s and any numbers of b’s in any order. So (D) is correct.
Example 10:
Solution : Two regular expressions are equivalent if languages generated by them are same.
Option (A) can generate all strings generated by 0*(10*)*. So they are equivalent.
Option (B) cannot generate the null string, but 0*(10*)* can. So they are not equivalent.
Option (C) will have 10 as substring but 0*(10*)* may or may not. So they are not equivalent.
Equivalence of NFA and DFA-