Syntax Analysis
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
Start with the start symbol.
At each step, replace a non-terminal by the body of
one of its productions.
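Consider the grammar
E → E + E | E * E | - E | ( E ) | id
A derivation of -(id) from E, with the production used at each step:
E ⇒ -E      [ E → -E ]
  ⇒ -(E)    [ E → (E) ]
  ⇒ -(id)   [ E → id ]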
Leftmost Derivations
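The leftmost non-terminal in each sentential form is always chosen for replacement.
Defined as: α ⇒lm β, where "lm" is a subscript on the arrow marking a leftmost step.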
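Rightmost Derivations
The rightmost non-terminal in each sentential form is always chosen for replacement.
Defined as: α ⇒rm β, where "rm" is a subscript on the arrow marking a rightmost step.
Also called canonical derivations.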
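The difference between the two orders can be sketched in a few lines of Python. This is only an illustration: the helper names `derive` and `run` are hypothetical, sentential forms are modeled as lists of symbols, and the productions used (E → -E, E → (E), E → E + E, E → id) are assumed from the expression grammar in these slides.

```python
NONTERMINALS = {"E"}

def derive(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal in `form` by `body`."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no non-terminal left to expand")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + body + form[i + 1:]

def run(bodies, leftmost=True):
    """Apply the production bodies in order, starting from the start symbol E."""
    form = ["E"]
    steps = [form]
    for body in bodies:
        form = derive(form, body, leftmost)
        steps.append(form)
    return [" ".join(step) for step in steps]

# The same production bodies, applied in leftmost and in rightmost order.
bodies = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
leftmost_steps = run(bodies, leftmost=True)
rightmost_steps = run(bodies, leftmost=False)
```

Both runs end in `- ( id + id )`; they differ only in the intermediate sentential form after the first `id` is introduced (`- ( id + E )` versus `- ( E + id )`).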
Parse Tree
A graphical representation of a derivation.
Each interior node represents the application of a
production.
The interior node is labeled with the non-terminal A in the
head of the production.
The children of the node are labeled from left to right,
by the symbols in the body of the production.
Parse tree for the derivation of -(id + id):
E
├─ -
└─ E
   ├─ (
   ├─ E
   │  ├─ E
   │  │  └─ id
   │  ├─ +
   │  └─ E
   │     └─ id
   └─ )
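As a rough sketch of how such a tree might be represented in code, the following Python uses a hypothetical `Node` class (not something defined in these slides). Interior nodes carry a non-terminal and leaves carry terminals, so reading the leaves from left to right gives back the derived sentence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # non-terminal or terminal symbol
    children: list = field(default_factory=list) # empty for leaves

def leaf_yield(node):
    """Collect the leaf labels from left to right."""
    if not node.children:
        return [node.label]
    out = []
    for child in node.children:
        out.extend(leaf_yield(child))
    return out

# Parse tree for -(id + id): each interior node applies one production.
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [
            Node("E", [Node("id")]),
            Node("+"),
            Node("E", [Node("id")]),
        ]),
        Node(")"),
    ]),
])

sentence = " ".join(leaf_yield(tree))
```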
Ambiguity
A grammar that produces more than one parse tree
for some sentence is said to be ambiguous.
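Consider the two leftmost derivations for the sentence id + id * id:
E ⇒ E + E                E ⇒ E * E
  ⇒ id + E                 ⇒ E + E * E
  ⇒ id + E * E             ⇒ id + E * E
  ⇒ id + id * E            ⇒ id + id * E
  ⇒ id + id * id           ⇒ id + id * id
The corresponding parse trees:
E                        E
├─ E                     ├─ E
│  └─ id                 │  ├─ E
├─ +                     │  │  └─ id
└─ E                     │  ├─ +
   ├─ E                  │  └─ E
   │  └─ id              │     └─ id
   ├─ *                  ├─ *
   └─ E                  └─ E
      └─ id                 └─ id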
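The ambiguity can also be checked mechanically. The sketch below assumes the sub-grammar E → E + E | E * E | id and uses a hypothetical function `parse_trees` that enumerates every parse tree of a token list by trying each operator as the top-level split. It is a brute-force illustration, not a practical parsing method.

```python
def parse_trees(tokens):
    """Return all parse trees of `tokens` under E -> E + E | E * E | id.

    A tree is either the string "id" or a tuple (left, operator, right).
    """
    if tokens == ["id"]:
        return ["id"]
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            # Treat this operator as the root and parse both sides.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tok, right))
    return trees

trees = parse_trees(["id", "+", "id", "*", "id"])
```

For id + id * id the list contains two trees, one with + at the root and one with * at the root, which is exactly what makes the grammar ambiguous.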
CFG vs Regular Expression
CFGs are more powerful than regular expressions.
Every construct that can be described by a regular
expression can also be described by a grammar.
The converse does not hold.
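A standard illustration, not taken from these slides: the language of strings a^n b^n (n ≥ 0) is generated by the grammar S → a S b | ε, but no regular expression describes it, because matching the counts needs unbounded memory. The hypothetical recognizer below mirrors the grammar directly, with one recursive call per use of S → a S b.

```python
def matches_anbn(s):
    """Recognize the language of S -> a S b | epsilon."""
    if s == "":
        return True  # S -> epsilon
    # S -> a S b: peel one 'a' from the front and one 'b' from the back.
    return s[0] == "a" and s[-1] == "b" and matches_anbn(s[1:-1])
```

For example, `matches_anbn("aabb")` holds while `matches_anbn("aab")` does not.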
Lexical vs Syntactic Analysis
Why use both regular expressions and CFGs?
Separation modularizes the front end of a compiler
into two manageable-sized components.
Lexical rules are quite simple,
so they do not need the full power of a CFG.
REs provide a more concise and easier-to-understand
notation for tokens than grammars.
Eliminating Ambiguity
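An ambiguous grammar can sometimes be rewritten to
eliminate the ambiguity.
Consider the grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Parse tree for if E1 then if E2 then S1 else S2:
stmt
├─ if
├─ E1
├─ then
└─ stmt
   ├─ if
   ├─ E2
   ├─ then
   ├─ S1
   ├─ else
   └─ S2
Another parse tree for if E1 then if E2 then S1 else S2:
stmt
├─ if
├─ E1
├─ then
├─ stmt
│  ├─ if
│  ├─ E2
│  ├─ then
│  └─ S1
├─ else
└─ S2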
Dangling else
Which parse tree should we consider the correct one?
The first parse tree is preferred in programming
languages.
The rule is “Match each else with the closest
unmatched then”.
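One way to see the rule in action is a small greedy recursive-descent sketch. It is an assumption-laden illustration rather than the grammar rewriting itself: the function name `parse_stmt` is hypothetical, conditions and simple statements are single tokens such as E1 and S1, and an else is attached as soon as the innermost open if can take it.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at `pos`; return (tree, next_pos).

    An 'else' is consumed by the innermost 'if' that is still open,
    which implements "match each else with the closest unmatched then".
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise ValueError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    # Anything else is treated as a simple statement token.
    return tokens[pos], pos + 1

tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
```

The resulting tree is `("if", "E1", ("if", "E2", "S1", "S2"))`: the else branch S2 belongs to the inner if, as in the preferred parse tree.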
Eliminating Ambiguity
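We can convert the grammar into an unambiguous
one. The idea is that a statement appearing between a then and an
else must be matched, that is, it must not end with an unmatched then.
One standard rewriting:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt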
Left Recursion
A grammar is left recursive if it has a non-terminal A
such that there is a derivation A ⇒+ Aα for some string α.
Immediate left recursion occurs when there is a
production of the form A → Aα.
Top-down parsing methods cannot handle left-recursive grammars.
How do we resolve it?
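Before turning to the fix, the failure itself can be made concrete. The sketch below is a deliberately naive recursive-descent procedure for the production E → E + T; the names `parse_E` and `loops_forever` are hypothetical. Because the body begins with E, the procedure calls itself before consuming any input, and Python eventually raises `RecursionError`.

```python
calls = 0

def parse_E(tokens, pos):
    """Naive procedure for E -> E + T (intentionally broken)."""
    global calls
    calls += 1
    # The body starts with E, so the first action is to parse an E
    # at the same position, with no input token consumed in between.
    pos = parse_E(tokens, pos)
    # Never reached: matching '+' and parsing T would go here.
    return pos

def loops_forever(tokens):
    """Return True if parsing recursed until Python's recursion limit."""
    try:
        parse_E(tokens, 0)
    except RecursionError:
        return True
    return False

stuck = loops_forever(["id", "+", "id"])
```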
Immediate Left Recursion Elimination
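Any pair of productions A → Aα | β can be replaced with
A → β A'
A' → α A' | ε
In general, group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A.
Replace the A-productions by
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε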
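The transformation for productions of the form A → Aα1 | ... | Aαm | β1 | ... | βn can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, each production body is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe to the head.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Returns a dict mapping each non-terminal to its list of bodies.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # tails of A -> A alpha
    betas = [b for b in bodies if not b or b[0] != head]     # bodies not starting with A
    if not alphas:
        return {head: bodies}  # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],                # A  -> beta A'
        new: [alpha + [new] for alpha in alphas] + [[]],       # A' -> alpha A' | epsilon
    }

# E -> E + T | E - T | T
result = eliminate_immediate_left_recursion(
    "E", [["E", "+", "T"], ["E", "-", "T"], ["T"]]
)
```

Here `result` maps `E` to the single body `T E'` and `E'` to `+ T E'`, `- T E'`, and the empty body (ε).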
Consider the example.
Original grammar          After eliminating left recursion
E → E + T | E - T | T     E  → T E'
                          E' → + T E' | - T E' | ε
T → T * F | T / F | F     T  → F T'
                          T' → * F T' | / F T' | ε
F → ( E ) | id            F  → ( E ) | id
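With left recursion removed, the grammar E → T E', E' → + T E' | - T E' | ε, T → F T', T' → * F T' | / F T' | ε, F → ( E ) | id can be handled top-down. The following recognizer is a sketch only: the class `Parser` and the function `accepts` are hypothetical names, input is a list of tokens, and `$` is assumed as an end marker. Each non-terminal becomes one method, and the ε alternatives correspond to simply returning.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | - T E' | epsilon
        if self.peek() in ("+", "-"):
            self.eat(self.peek())
            self.T()
            self.E_prime()

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | / F T' | epsilon
        if self.peek() in ("*", "/"):
            self.eat(self.peek())
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.peek() == "(":
            self.eat("(")
            self.E()
            self.eat(")")
        else:
            self.eat("id")

def accepts(tokens):
    """Return True if `tokens` is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.eat("$")
        return True
    except SyntaxError:
        return False
```

Every recursive call is now preceded by consuming at least one token or by entering a different non-terminal, so the parser terminates on any finite input.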
Left Recursion Problem
Look at the following grammar
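Left Factoring Algorithm
Input: Grammar G.
Output: An equivalent left-factored grammar.
Method:
For each non-terminal A, find the longest prefix α common to two or more of its
alternatives. If α ≠ ε, that is, there is a nontrivial common prefix, replace all of the
A-productions A → αβ1 | αβ2 | ... | αβn | γ, where γ represents all alternatives that do
not begin with α, by
A → α A' | γ
A' → β1 | β2 | ... | βn
Here A' is a new non-terminal.
Repeatedly apply this transformation until no two alternatives for a non-terminal have a
common prefix.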
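A single step of the method can be sketched in Python as below. The function names are hypothetical, bodies are lists of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe. A complete implementation would repeat the step (with fresh names) until no common prefix remains. The usage example assumes the usual dangling-else grammar stmt → if expr then stmt | if expr then stmt else stmt | other.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor_once(head, bodies):
    """Apply one left-factoring step to the productions of `head`."""
    # Longest prefix shared by at least two alternatives.
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            prefix = common_prefix(bodies[i], bodies[j])
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {head: bodies}  # already left-factored
    new = head + "'"
    k = len(best)
    with_prefix = [b for b in bodies if b[:k] == best]
    without = [b for b in bodies if b[:k] != best]
    return {
        head: [best + [new]] + without,        # A  -> alpha A' | gamma
        new: [b[k:] for b in with_prefix],     # A' -> beta1 | ... | betan
    }

factored = left_factor_once("stmt", [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["other"],
])
```

The result has stmt → if expr then stmt stmt' | other and stmt' → ε | else stmt, so the choice about else is deferred until the common prefix has been read.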
Left Factoring Example
Consider the dangling-else example