Lecture 5
Implementing Lexical Specifications
[Figure: NFA for ((ab)*|c)* with states A to K; its only non-ε edges are C -a-> E, G -b-> H, and D -c-> F]
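NFA to DFA
((ab)*|c)*
DFA from the subset construction. Each DFA state is named by the NFA states it contains. The start state is ABCDIJK, * marks an accepting state, and Φ means no transition.

State        a     b          c
*ABCDIJK     EG    Φ          FJKABCDI
 EG          Φ     HIJKABCD   Φ
*HIJKABCD    EG    Φ          FJKABCDI
*FJKABCDI    EG    Φ          FJKABCDI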
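The subset construction behind the DFA above can be sketched in Python. The NFA below is an assumption. Its three symbol edges (C -a-> E, G -b-> H, D -c-> F) follow the figure labels, but its ε-edges are a hypothetical choice that reproduces the ε-closures named in the DFA states, so they may differ from the exact edges in the lecture's figure. K is assumed to be the NFA's accepting state. DFA state names are printed with their NFA states sorted, so HIJKABCD prints as ABCDHIJK.

```python
from collections import deque

# Hypothetical NFA for ((ab)*|c)*: the ε-edges are chosen so that the ε-closures
# match the DFA state names on the slide; the figure's own edges may differ.
EPS = {
    "A": {"B", "D"}, "B": {"C", "I"}, "E": {"G"},
    "H": {"I"}, "I": {"J"}, "F": {"J"}, "J": {"K", "A"},
}
MOVES = {("C", "a"): {"E"}, ("G", "b"): {"H"}, ("D", "c"): {"F"}}
START, ACCEPT, ALPHABET = "A", "K", "abc"

def eps_closure(states):
    """All NFA states reachable from `states` using ε-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction():
    """Build DFA transitions; each DFA state is a frozenset of NFA states."""
    start = eps_closure({START})
    dfa, work = {}, deque([start])
    while work:
        d = work.popleft()
        if d in dfa:
            continue  # already expanded
        dfa[d] = {}
        for sym in ALPHABET:
            moved = set()
            for s in d:
                moved |= MOVES.get((s, sym), set())
            if moved:  # an empty move is the Φ state and is left out of the row
                target = eps_closure(moved)
                dfa[d][sym] = target
                work.append(target)
    return start, dfa

start, dfa = subset_construction()
for state, row in dfa.items():
    mark = "*" if ACCEPT in state else " "
    print(mark, "".join(sorted(state)), {k: "".join(sorted(v)) for k, v in row.items()})
```

The sketch finds the same four DFA states as the slide. Only EG lacks K and is non-accepting, which matches the fact that a lone a is not in the language.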
NFA to DFA
[Figure: NFA over the alphabet {a, b} with states A, B, C, and D]
Implementing DFAs
● Use the state transitions to construct a lookup table
○ Rows are states and columns are input symbols
○ Initialize every cell to the empty state (Φ)
○ For each input symbol, fill the cell with the state that the current state (the row) transitions to on that symbol
○ At run time, look up the cell for the current state and the next input symbol to determine the next state
Example DFA over {0, 1} and its lookup table:

State   0    1
s0      s1   s2
s1      s2   s0
s2      Φ    Φ
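A table-driven DFA that follows the bullets above might look like this in Python, with None standing in for Φ. The example transitions (s0 -0-> s1, s0 -1-> s2, s1 -0-> s2, s1 -1-> s0, and none out of s2) are read off the slide's example. Treating s0 as the start state is an assumption. `build_table` and `run_dfa` are hypothetical helper names.

```python
PHI = None  # the empty state Φ

def build_table(states, alphabet, transitions):
    """Rows are states and columns are input symbols; every cell starts as Φ, then gets filled."""
    table = {s: {a: PHI for a in alphabet} for s in states}
    for src, sym, dst in transitions:
        table[src][sym] = dst
    return table

def run_dfa(table, start, text):
    """For each input symbol, look up (current state, symbol); stop once we hit Φ."""
    state = start
    for ch in text:
        if state is PHI or ch not in table[state]:
            return PHI  # stuck in Φ, or the symbol is outside the alphabet
        state = table[state][ch]
    return state

# Transitions from the slide's example; s0 as start state is an assumption.
TABLE = build_table(
    ["s0", "s1", "s2"], "01",
    [("s0", "0", "s1"), ("s0", "1", "s2"), ("s1", "0", "s2"), ("s1", "1", "s0")],
)
```

For example, `run_dfa(TABLE, "s0", "01")` ends in s0 (s0 -0-> s1 -1-> s0), and `run_dfa(TABLE, "s0", "10")` returns None because s2 has no transitions. Each step is a single table lookup, so the cost per input symbol does not depend on how many states the DFA has.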
Distinguishing lexemes
Suppose we have the regex (break | [a-z]+).
How do we distinguish break from breaker using the DFA?
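[Figure: NFA for (break | [a-z]+): from the start state, ε-edges lead into two branches, one spelling b r e a k and ending in accepting state K, the other matching [a-z]+ and ending in accepting state I]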
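[Figure: DFA for (break | [a-z]+). Reading b, r, e, a, k follows a chain of states. The states after b, br, bre, and brea are labeled I (identifier accepting), and the state after break is labeled KI because it contains both accepting states. Any other letter leaves the chain for the identifier state I, which loops on [a-z]: [ac-z] from the start, [a-qs-z] after b, [a-df-z] after br, [b-z] after bre, [a-jl-z] after brea, and [a-z] after break.]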
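One common answer to the question combines two rules. The first is maximal munch: keep reading while the DFA can still move, and return the longest prefix that ended in an accepting state. The second is rule priority: when a state such as KI contains accepting states from both rules, prefer the rule listed first, which here is break. The sketch below hand-codes the DFA for (break | [a-z]+) under those assumptions. The state encoding, the token names BREAK and ID, and the helpers `step`, `token_kind`, and `next_token` are all hypothetical.

```python
KEYWORD = "break"

# Hypothetical state encoding: ("kw", i) means the input so far equals KEYWORD[:i]
# (also a valid identifier when i > 0); "id" is the identifier state I looping on [a-z].
START = ("kw", 0)

def step(state, ch):
    """One DFA move; returns None (the dead state) on anything outside [a-z]."""
    if not ("a" <= ch <= "z"):
        return None
    if state != "id":
        i = state[1]
        if i < len(KEYWORD) and ch == KEYWORD[i]:
            return ("kw", i + 1)  # still on the b-r-e-a-k chain
    return "id"  # any other letter falls into the identifier state

def token_kind(state):
    """Token for an accepting state, or None; the keyword rule wins in state KI."""
    if state == "id":
        return "ID"
    i = state[1]
    if i == len(KEYWORD):
        return "BREAK"  # K and I both present: priority picks the break rule
    return "ID" if i > 0 else None

def next_token(text, pos=0):
    """Maximal munch: run until the DFA gets stuck, keep the last accepting prefix."""
    state, i, last = START, pos, None
    while i < len(text):
        state = step(state, text[i])
        if state is None:
            break
        i += 1
        kind = token_kind(state)
        if kind is not None:
            last = (kind, text[pos:i])
    return last
```

Under this policy, `next_token("break")` returns `("BREAK", "break")`. `next_token("breaker")` keeps reading past k into the identifier state and returns `("ID", "breaker")`.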
Lexical Errors
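● A lexical analyzer typically reports errors for lexically invalid input, such as:
○ Numbers, strings, identifiers, etc. that exceed a length limit
○ Incomplete numbers (e.g., 1. in a language that requires digits after the decimal point)
○ Missing or unmatched quotes on string and character literals
○ …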
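A toy scanner that flags three of the listed error kinds might look like the following. It is only a sketch. The 31-character identifier limit is a hypothetical value. The scanner assumes the language requires at least one digit after a decimal point, and it does not handle escape sequences inside string literals. `lexical_errors` is a hypothetical name.

```python
MAX_IDENT_LEN = 31  # hypothetical limit; real languages pick their own (or none)

def lexical_errors(src):
    """Return (position, message) pairs for a few illustrative lexical errors."""
    errors, i, n = [], 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isalpha() or ch == "_":
            # Identifier: letters, digits, underscores
            j = i
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            if j - i > MAX_IDENT_LEN:
                errors.append((i, "identifier too long"))
            i = j
        elif ch.isdigit():
            # Number: digits, optionally followed by '.' and at least one digit
            j = i
            while j < n and src[j].isdigit():
                j += 1
            if j < n and src[j] == ".":
                j += 1
                if j == n or not src[j].isdigit():
                    errors.append((i, "incomplete number"))
                while j < n and src[j].isdigit():
                    j += 1
            i = j
        elif ch in "\"'":
            # String or character literal: look for the matching closing quote
            close = src.find(ch, i + 1)
            if close == -1:
                errors.append((i, "unterminated quote"))
                i = n
            else:
                i = close + 1
        else:
            i += 1  # whitespace, operators, etc. are not checked here
    return errors
```

For instance, `lexical_errors("x = 1.")` reports an incomplete number at position 4. A real lexer would usually report the error, skip past the bad lexeme, and keep scanning, as this sketch does, rather than stop at the first problem.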
Lexical Analysis
How would you represent the set of strings of balanced parentheses using a regular expression?
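For arbitrary nesting, no regular expression can do this. The language of balanced parentheses is not regular: a DFA has finitely many states, so it cannot remember an unbounded number of unmatched ( characters (the standard pumping-lemma argument). An RE does exist for any fixed maximum nesting depth, but it grows with that depth. Recognizing the unbounded language needs extra memory, such as a single counter, as in this minimal Python sketch. The function name `balanced` is hypothetical, and the function ignores every character other than ( and ).

```python
def balanced(s):
    """Check balanced parentheses with one unbounded counter (more memory than any DFA has)."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' with no matching '('
    return depth == 0  # every '(' must have been closed
```

This is why nested structures like parentheses are usually checked by the parser, which has a stack, rather than by the lexer.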