Chapter 3 - Syntax Analyzer
Instructor: Mohammed O.
Email: momoumer90@gmail.com
Samara University
Chapter Three
This Chapter Covers:
Syntax Analyzer
Top-Down Parsing
Predictive Parsing
Regular Expression vs Context Free Grammar
Recursive Descent Parsing
Non Recursive Predictive Parsing
Syntax Analyzer
Syntax Analyzer creates the syntactic structure of the given
source program.
This syntactic structure is mostly a parse tree.
Syntax Analyzer is also known as parser.
The syntax of a programming language is described by a
context-free grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a context-free
grammar or not.
If it satisfies, the parser creates the parse tree of that
program.
Otherwise, the parser reports error messages.
Parser
A context-free grammar
gives a precise (accurate) syntactic specification of a
programming language.
the design of the grammar is an initial phase of the design
of a compiler.
a grammar can be directly converted into a parser by
some tools.
Parser works on a stream of tokens.
The smallest item is a token.
Parsers (cont.)
We categorize the parsers into two groups:
Top-Down Parser
the parse tree is created top to bottom, starting from the
root.
Bottom-Up Parser
the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from
left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be
implemented only for sub-classes of context-free grammars.
LL for top-down parsing
LR for bottom-up parsing
Context-Free Grammars
A CFG is a formal grammar that is used to generate all
possible strings in a given formal language.
A CFG G is defined by a four-tuple: G = (V, T, P, S)
T is a finite set of terminal symbols.
V is a finite set of non-terminal symbols.
P is a finite set of production rules of the form
A → α, where A is a non-terminal and
α is a string of terminals and non-terminals (including the
empty string).
S is the start symbol (one of the non-terminal symbols).
Example: E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
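As a minimal sketch, the four-tuple for the expression grammar could be written down as plain Python data. The names V, T, P, S follow the definition; the dictionary layout and the helper is_well_formed are assumptions made for illustration, not part of the course material.

```python
# Hypothetical encoding of G = (V, T, P, S) for the expression grammar.
V = {"E"}                                   # non-terminals
T = {"+", "-", "*", "/", "(", ")", "id"}    # terminals
P = {                                       # productions: left side -> alternatives
    "E": [
        ["E", "+", "E"],
        ["E", "-", "E"],
        ["E", "*", "E"],
        ["E", "/", "E"],
        ["-", "E"],
        ["(", "E", ")"],
        ["id"],
    ]
}
S = "E"                                     # start symbol


def is_well_formed(V, T, P, S):
    """Check the basic conditions from the definition of a CFG."""
    # S must be a non-terminal, and V and T must not overlap.
    if S not in V or V & T:
        return False
    for lhs, alternatives in P.items():
        # Every left-hand side must be a single non-terminal.
        if lhs not in V:
            return False
        # Every right-hand side symbol must be a terminal or a non-terminal.
        for rhs in alternatives:
            if any(sym not in V | T for sym in rhs):
                return False
    return True


print(is_well_formed(V, T, P, S))
```

An empty list as an alternative would stand for the empty string in this layout.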
Derivations
In a CFG, the start symbol is used to derive the string. You can
derive the string by repeatedly replacing a non-terminal by
the right-hand side of one of its productions, until all non-terminals
have been replaced by terminal symbols.
E ⇒ E+E
E+E derives from E
we can replace E by E+E
to be able to do this, we have to have a production rule
E → E+E in our grammar.
We will see that the top-down parsers try to find the left-
most derivation of the given source program.
We will see that the bottom-up parsers try to find the right-
most derivation of the given source program in the reverse
order.
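A leftmost derivation can be sketched in a few lines: at every step, the leftmost non-terminal of the current sentential form is replaced by a chosen right-hand side. The function name leftmost_derivation and the list-of-symbols representation are assumptions for this illustration, and the sketch does not check that each chosen right-hand side really is a production of the grammar.

```python
def leftmost_derivation(start, nonterminals, steps):
    """Apply each right-hand side in `steps` to the leftmost non-terminal.

    Returns every sentential form, each as a list of symbols.
    """
    form = [start]
    forms = [form]
    for rhs in steps:
        # Locate the leftmost non-terminal in the current sentential form.
        idx = next(i for i, sym in enumerate(form) if sym in nonterminals)
        # Replace it by the chosen right-hand side.
        form = form[:idx] + list(rhs) + form[idx + 1:]
        forms.append(form)
    return forms


# Productions used, in order: E -> -E, E -> (E), E -> E+E, E -> id, E -> id
steps = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
forms = leftmost_derivation("E", {"E"}, steps)
print(" => ".join("".join(f) for f in forms))
# prints: E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
```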
Parse Tree
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.
A parse tree can be seen as a graphical representation of a
derivation.
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree for -(id+id), grown one step at a time alongside this derivation.]
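The parse tree for -(id+id) could be held as nested data, with inner nodes labelled by non-terminals and terminals at the leaves. The tuple layout (label, children) and the helper leaves are assumptions chosen for this sketch. Reading the leaves from left to right gives back the derived string.

```python
def leaves(tree):
    """Return the terminal symbols at the leaves, left to right."""
    # A bare string is a leaf, i.e. a terminal symbol.
    if isinstance(tree, str):
        return [tree]
    # An inner node is a pair: (non-terminal label, list of children).
    _label, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


# Parse tree for -(id+id) under the expression grammar.
tree = ("E", ["-",
              ("E", ["(",
                     ("E", [("E", ["id"]), "+", ("E", ["id"])]),
                     ")"])])

print("".join(leaves(tree)))
# prints: -(id+id)
```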
Ambiguity
A grammar that produces more than one parse tree for some
sentence is called an ambiguous grammar.
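E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree 1: the root is E + E; the left E is id and the right E is E * E, i.e. id+(id*id).]
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree 2: the root is E * E; the left E is E + E and the right E is id, i.e. (id+id)*id.]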
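One way to see the ambiguity concretely is to count the parse trees of a token string. The sketch below does this for the sub-grammar E → E + E | E * E | id only; the function name count_parse_trees and the span-based counting are assumptions for illustration, not a general parser.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees in which E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the operator at the root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
# prints: 2
```

A result greater than 1 for any sentence shows that the grammar is ambiguous.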
Ambiguity (cont.)
For most parsers, the grammar must be unambiguous.
An unambiguous grammar gives a unique selection of the
parse tree for each sentence.
A grammar is left-recursive if it has a non-terminal A such that
A ⇒+ Aα for some string α.
For example, S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes left recursion.
So, we have to eliminate all left-recursions from our
grammar
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
Cont.
So, the resulting equivalent grammar which is not left-
recursive is:
S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
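The steps of this example can be sketched in Python. The function names, the dictionary representation (an empty list stands for ε) and the choice of A' as the new non-terminal's name are assumptions for illustration. The sketch also assumes the input grammar has no cycles and no ε-productions, and that the name A' is not already in use.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | ε."""
    # Tails of the left-recursive alternatives (the part after the leading A).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}
    new_nt = nt + "'"  # hypothetical fresh name, assumed unused
    return {
        nt: [alt + [new_nt] for alt in others],
        new_nt: [alt + [new_nt] for alt in recursive] + [[]],  # [] is ε
    }


def eliminate_left_recursion(grammar, order):
    """Process non-terminals in `order`, substituting earlier ones first."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            # Replace each A_i -> A_j γ by A_i -> δ γ for every A_j -> δ.
            expanded = []
            for alt in g[a_i]:
                if alt and alt[0] == a_j:
                    expanded.extend(delta + alt[1:] for delta in g[a_j])
                else:
                    expanded.append(alt)
            g[a_i] = expanded
        g.update(eliminate_immediate_left_recursion(a_i, g[a_i]))
    return g


grammar = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for nt, alts in result.items():
    print(nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```

For the grammar above this yields S → Aa | b, A → bdA' | fA', A' → cA' | adA' | ε.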
Left-Factoring
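A predictive parser (a top-down parser without
backtracking) insists that the grammar must be left-factored.
grammar → a new equivalent grammar suitable for
predictive parsing
For each non-terminal A with two or more alternatives that share
a common non-empty prefix α, say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn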
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b
A → aA’ | b
A’ → d | ε | b | bc
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
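Left-factoring can be sketched as a small loop: group the alternatives of a non-terminal by their first symbol, pull out the longest common prefix of each group, and repeat on the newly created non-terminals. The function names, the dictionary representation (an empty list stands for ε) and the priming scheme for new names are assumptions for illustration.

```python
def common_prefix(alts):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for symbols in zip(*alts):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(grammar):
    """Left-factor until no two alternatives of a non-terminal share a first symbol."""
    g = {nt: [list(a) for a in alts] for nt, alts in grammar.items()}
    work = list(g)
    while work:
        nt = work.pop(0)
        # Group alternatives by first symbol; ε (empty list) gets the key None.
        groups = {}
        for alt in g[nt]:
            groups.setdefault(alt[0] if alt else None, []).append(alt)
        new_alts = []
        for key, group in groups.items():
            if key is None or len(group) == 1:
                new_alts.extend(group)
                continue
            prefix = common_prefix(group)
            # Pick a fresh primed name (hypothetical naming scheme).
            new_nt = nt + "'"
            while new_nt in g:
                new_nt += "'"
            g[new_nt] = [alt[len(prefix):] for alt in group]
            new_alts.append(prefix + [new_nt])
            work.append(new_nt)  # the new non-terminal may need factoring too
        g[nt] = new_alts
    return g


ex1 = left_factor({"A": [list("abB"), list("aB"), list("cdg"), list("cdeB"), list("cdfB")]})
ex2 = left_factor({"A": [list("ad"), list("a"), list("ab"), list("abc"), list("b")]})
for result in (ex1, ex2):
    for nt, alts in result.items():
        print(nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```

Under these assumptions the two runs reproduce the final grammars of Example1 and Example2.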