Chapter 4 - Syntax Analysis
Outline
Role of parser
Context free grammars
Top down parsing
Bottom up parsing
Parser generators
[Figure: role of the parser. Labels: token, getNextToken, Parser, Parse tree, Symbol table]
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id
Error handling
Common programming errors
Lexical errors
Syntactic errors
Semantic errors
Logical errors
Error-recovery strategies
Panic mode recovery: Discard input symbols one at a time until one of a designated set of synchronizing tokens is found
Phrase level recovery: Replace a prefix of the remaining input by some string that allows the parser to continue
Error productions: Augment the grammar with productions that generate the erroneous constructs
Global correction: Choose a minimal sequence of changes to obtain a globally least-cost correction
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Derivations
Productions are treated as rewriting rules to generate a string
Rightmost and leftmost derivations
Parse trees
-(id+id)
Ambiguity
For some strings there exists more than one parse tree
Or more than one leftmost derivation
Or more than one rightmost derivation
Example: id+id*id
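The ambiguity of id+id*id can be checked by counting parse trees. The sketch below assumes the ambiguous grammar E -> E + E | E * E | id (the slide does not spell out which grammar it uses); the function name count_parse_trees is illustrative only.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id (assumed grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct trees deriving tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Every operator strictly inside the span can be the root of a tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


# More than one tree for the same string means the grammar is ambiguous.
print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than 1 corresponds to more than one leftmost (or rightmost) derivation of the same string.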
Elimination of ambiguity
Idea: a statement appearing between a then and an else must be matched
A grammar is left recursive if it has a non-terminal A such that there is a derivation A =>+ Aα for some string α
Top down parsing methods can't handle left-recursive grammars
A simple rule for direct left recursion elimination: replace a pair of productions A -> Aα | β by
A -> βA'
A' -> αA' | ε
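Arrange the nonterminals in some order A1, A2, ..., An
for (each i from 1 to n) {
    for (each j from 1 to i-1) {
        replace each production of the form Ai -> Ajγ by the productions
        Ai -> δ1γ | δ2γ | ... | δkγ, where Aj -> δ1 | δ2 | ... | δk are all current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
}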
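The rule for direct (immediate) left recursion can be sketched as follows. This is a minimal illustration, not a full implementation of the general algorithm: the function name and the list-of-symbols representation are assumptions, and the empty list stands for ε.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon.

    `bodies` is a list of production bodies, each a list of symbols.
    The empty list stands for epsilon.
    """
    # Bodies of the form A alpha: keep only alpha.
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # Bodies beta that do not start with A.
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: bodies}  # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```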
Example
Ques1: Consider the following grammar
S->aBDh
B->Bb|c
B->cC
D->EF
C->bC|ε
E->g|ε
F->f|ε
D->dD|eD|ε
Left factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing.
Consider the following grammar:
stmt -> if expr then stmt else stmt | if expr then stmt
On seeing the input token if, it is not clear to the parser which production to use.
We can easily perform left factoring: if we have A -> αβ1 | αβ2 then we replace it with
A -> αA'
A' -> β1 | β2
Algorithm
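For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, then replace all of the A-productions A -> αβ1 | αβ2 | ... | αβn | γ, where γ stands for the alternatives that do not begin with α, by
A -> αA' | γ
A' -> β1 | β2 | ... | βn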
Example:
S -> i E t S | i E t S e S | a
E -> b
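One pass of left factoring on this example can be sketched as below. It is a simplification under stated assumptions: alternatives are grouped by their first symbol and the prefix common to the whole group is factored out, which is enough for this grammar but is not the full algorithm; all names are illustrative.

```python
def common_prefix(bodies):
    """Longest prefix shared by every body in `bodies`."""
    prefix = []
    for symbols in zip(*bodies):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor_once(head, bodies):
    """Factor out common prefixes of the alternatives of `head` (single pass)."""
    groups = {}
    for body in bodies:
        key = body[0] if body else None
        groups.setdefault(key, []).append(body)
    result = {head: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) < 2:
            result[head].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new_head = head + "'" * primes
        result[head].append(prefix + [new_head])
        # The empty list stands for epsilon.
        result[new_head] = [body[len(prefix):] for body in group]
    return result


# S -> iEtS | iEtSeS | a  becomes  S -> iEtSS' | a,  S' -> epsilon | eS
print(left_factor_once("S", [list("iEtS"), list("iEtSeS"), ["a"]]))
```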
Types of Parser
Parser
Bottom Up Parsers Top Down Parsers
LR parser
SLR parser
CLR parser
LALR parser
Predictive Parser
LL(k) Parser
Example
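S -> cAd
A -> ab | a
Input: cad
[Figure: three parse trees for the input cad: S expanded to c A d; A expanded with A -> ab, which fails to match d; after backtracking, A expanded with A -> a]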
void A() {
    choose an A-production, A -> X1 X2 ... Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
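A recursive-descent procedure that may need backtracking can be sketched with generators, using the grammar S -> cAd, A -> ab | a from the example. This is only one way to organize the backtracking, and the function names are assumptions; it works here because the grammar is not left recursive.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after deriving `symbol` from text[pos:]."""
    if symbol not in GRAMMAR:  # terminal: must match the current input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:  # try each alternative in turn
        yield from parse_sequence(body, text, pos)


def parse_sequence(symbols, text, pos):
    """Yield every position reachable after deriving the whole symbol sequence."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos):
        # If the rest fails, the loop resumes with the next choice (backtracking).
        yield from parse_sequence(symbols[1:], text, mid)


def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("cad"))
```

On cad the alternative A -> ab is tried first and fails at d, after which A -> a succeeds.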
[Figure: successive parse trees of a top-down parse, each step following a leftmost derivation (lm), built from the symbols E, T, F, + and id]
LL(1) Grammars
Predictive parsers are those recursive descent parsers needing no backtracking
Grammars for which we can create predictive parsers are called LL(1)
The first L means scanning input from left to right
The second L means leftmost derivation
And 1 stands for using one input symbol for lookahead
Follow(A).
[Table: moves made by a predictive parser on the input id + id * id $, listing the stack contents, the remaining input and the production applied or terminal matched at each step]
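First(α) is the set of terminals that begin strings derived from α; if α =>* ε then ε is also in First(α)
In predictive parsing, when we have A -> α | β and First(α) and First(β) are disjoint sets, we can select the appropriate A-production by looking at the next input symbol
Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately after A in some sentential form
If we have S =>* αAaβ for some α and β then a is in Follow(A)
If A can be the rightmost symbol in some sentential form, then $ is in Follow(A)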
Computing First
To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X -> Y1Y2...Yk is a production for some k>=1, then place a in First(X) if for some i a is in First(Yi) and ε is in all of First(Y1),...,First(Yi-1), that is Y1...Yi-1 =>* ε. If ε is in First(Yj) for all j=1,...,k then add ε to First(X).
3. If X -> ε is a production then add ε to First(X).
Example!
Computing follow
To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A -> αBβ then everything in First(β) except ε is in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B)
Example!
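The First and Follow rules can be sketched as two fixed-point loops. The grammar below is the expression grammar of this chapter after left-recursion removal; the dictionary representation, the helper first_of_string and the use of an empty list for ε are assumptions made for the illustration.

```python
EPS = "ε"
END = "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],  # [] stands for epsilon
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """First of a sequence of grammar symbols, given the First sets so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive epsilon
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until nothing can be added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # Follow is defined for nonterminals only
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}  # rule 2
                    if EPS in rest:
                        new |= follow[head]  # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```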
Examples
S -> AaA | BbB
A -> bB
B -> ε
Computing First
1. A -> bB: First(A) = {b}
2. S -> AaA: First(AaA) = First(A) = {b}
3. S -> BbB: First(BbB) = (First(B) - {ε}) U First(bB) = ({ε} - {ε}) U {b} = {b}
So First(S) = {b}, First(A) = {b} and First(B) = {ε}
Computing Follow
Follow(S) = {$}
1. S -> AaA: a is in Follow(A) because the first A is followed by a; everything in Follow(S) is in Follow(A) because the second A ends the body
2. S -> BbB: b is in Follow(B); everything in Follow(S) is in Follow(B)
3. A -> bB: everything in Follow(A) is in Follow(B)
Result:
Follow(S) = {$}
Follow(A) = {$, a}
Follow(B) = {$, a, b}
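For each production A -> α of the grammar:
For each terminal a in First(α), add A -> α to M[A,a]
If ε is in First(α), then for each terminal b in Follow(A) add A -> α to M[A,b]. If ε is in First(α) and $ is in Follow(A), add A -> α to M[A,$] as well
If, after performing the above, there is no production in M[A,a], then set M[A,a] to error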
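The table-construction rules can be sketched as follows. To keep the block short, the First and Follow sets of the expression grammar are written out by hand rather than computed; the names PRODUCTIONS, first_of and build_table are illustrative, and an empty tuple stands for ε.

```python
EPS = "ε"

FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}
PRODUCTIONS = [
    ("E", ("T", "E'")),
    ("E'", ("+", "T", "E'")), ("E'", ()),  # () stands for epsilon
    ("T", ("F", "T'")),
    ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]


def first_of(body):
    """First of a production body, using the hand-written FIRST sets."""
    result = set()
    for sym in body:
        sym_first = FIRST.get(sym, {sym})  # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table():
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of(body)
        targets = body_first - {EPS}
        if EPS in body_first:
            targets |= FOLLOW[head]  # FOLLOW already contains $ where needed
        for terminal in targets:
            if (head, terminal) in table:
                # Two productions in one cell: the grammar is not LL(1).
                raise ValueError(f"conflict at M[{head}, {terminal}]")
            table[(head, terminal)] = body
    return table  # missing keys are the error entries


M = build_table()
print(len(M), M[("T'", "*")], M[("E'", ")")])
```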
Building a parser
Original grammar:
This grammar is left-recursive, ambiguous and requires left-factoring. It needs to be modified before we build a predictive parser for it:
Remove left recursion: Remove ambiguity:
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {$, )}
FOLLOW(T) = FOLLOW(T') = {+, $, )}
FOLLOW(F) = {*, +, $, )}
Now, we can either build a table or design a recursive descent parser.
Parsing table
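Nonterminal   id        +           *           (         )        $
E             E->TE'                            E->TE'
E'                      E'->+TE'                          E'->ε    E'->ε
T             T->FT'                            T->FT'
T'                      T'->ε       T'->*FT'              T'->ε    T'->ε
F             F->id                             F->(E)
A terminal on top of the stack (+, *, (, ), id) is matched against the same input symbol (match); $ on the stack with $ on the input means accept.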
Parsing table
Parse the input id*id using the parse table and a stack
Step  Stack      Input    Next Action
1     $E         id*id$   E->TE'
2     $E'T       id*id$   T->FT'
3     $E'T'F     id*id$   F->id
4     $E'T'id    id*id$   match id
5     $E'T'      *id$     T'->*FT'
6     $E'T'F*    *id$     match *
7     $E'T'F     id$      F->id
8     $E'T'id    id$      match id
9     $E'T'      $        T'->ε
10    $E'        $        E'->ε
11    $          $        accept
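A table-driven predictive parser for this trace can be sketched as below. The table is written out by hand from the LL(1) table of the expression grammar (an empty list stands for ε), and the function name predictive_parse is an assumption.

```python
# M[nonterminal, terminal] -> production body ([] stands for epsilon).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top, lookahead = stack[-1], tokens[pos]
        if top == "$" and lookahead == "$":
            return output  # accept
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            stack.pop()
            stack.extend(reversed(body))  # leftmost body symbol ends on top
            output.append((top, body))
        elif top == lookahead:
            stack.pop()  # match a terminal
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {lookahead}")


for head, body in predictive_parse(["id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The productions printed are those of the "Next Action" column, without the match steps.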
Eprime() {
    if (token == '+') then
        token = get_next_token()
        if (T()) then return Eprime()
        else return false
    else if (token == ')' or token == '$') then
        return true
    else
        return false
}
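The Eprime() procedure is one of five mutually recursive procedures. A complete recursive-descent recognizer in the same style might look like the sketch below; the class layout, the token list and the helper accepts are assumptions, and the ε-productions are chosen when the lookahead is in the Follow set of the nonterminal.

```python
class Parser:
    """Recursive-descent recognizer for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of the input
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def E(self):
        return self.T() and self.Eprime()

    def Eprime(self):
        if self.token == "+":
            self.advance()
            return self.T() and self.Eprime()
        # E' -> ε is chosen only when the lookahead is in Follow(E').
        return self.token in (")", "$")

    def T(self):
        return self.F() and self.Tprime()

    def Tprime(self):
        if self.token == "*":
            self.advance()
            return self.F() and self.Tprime()
        # T' -> ε is chosen only when the lookahead is in Follow(T').
        return self.token in ("+", ")", "$")

    def F(self):
        if self.token == "id":
            self.advance()
            return True
        if self.token == "(":
            self.advance()
            if self.E() and self.token == ")":
                self.advance()
                return True
        return False


def accepts(tokens):
    parser = Parser(tokens)
    return parser.E() and parser.token == "$"


print(accepts(["id", "+", "id", "*", "id"]))
```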
Example
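E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

Non terminal   First     Follow
F              {(, id}   {+, *, ), $}
T              {(, id}   {+, ), $}
E              {(, id}   {), $}
E'             {+, ε}    {), $}
T'             {*, ε}    {+, ), $}

Non terminal   Input Symbol
               id         +            *            (          )         $
E              E -> TE'                             E -> TE'
E'                        E' -> +TE'                           E' -> ε   E' -> ε
T              T -> FT'                             T -> FT'
T'                        T' -> ε      T' -> *FT'              T' -> ε   T' -> ε
F              F -> id                              F -> (E)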
Another example
S -> iEtSS' | a
S' -> eS | ε
E -> b

Non terminal   Input Symbol
               a        b        e                   i              t    $
S              S -> a                                S -> iEtSS'
S'                               S' -> ε, S' -> eS                       S' -> ε
E                       E -> b
The entry M[S', e] holds two productions, so this grammar is not LL(1).
[Figure: model of a table-driven predictive parser: a stack holding X Y Z $, the Parsing Table M, and the output]
Example
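Non terminal   id          +            *            (          )         $
E              E -> TE'                              E -> TE'   synch     synch
E'                         E' -> +TE'                           E' -> ε   E' -> ε
T              T -> FT'    synch                     T -> FT'   synch     synch
T'                         T' -> ε      T' -> *FT'              T' -> ε   T' -> ε
F              F -> id     synch        synch        F -> (E)   synch     synch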
Stack       Input       Action
E$          )id*+id$    Error, Skip )
E$          id*+id$     id is in First(E)
TE'$        id*+id$
FT'E'$      id*+id$
idT'E'$     id*+id$
T'E'$       *+id$
*FT'E'$     *+id$
FT'E'$      +id$        Error, M[F,+] = synch
T'E'$       +id$        F has been popped
Introduction
Constructs a parse tree for an input string beginning at the leaves (the bottom) and working towards the root (the top)
Example: id*id
[Figure: snapshots of a bottom-up parse of id*id, with the tree roots reading id*id, F*id, T*id, T*F, T and finally E]
Shift-reduce parser
The general idea is to shift some symbols of input to the stack until a reduction can be applied
At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of the production
The key decisions during bottom-up parsing are about when to reduce and about what production to apply
A reduction is the reverse of a step in a derivation
The goal of a bottom-up parser is to construct a derivation in reverse:
E=>T=>T*F=>T*id=>F*id=>id*id
Handle pruning
A handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a rightmost derivation
Right sentential form   Handle   Reducing production
id*id                   id       F->id
F*id                    F        T->F
T*id                    id       F->id
T*F                     T*F      T->T*F
A stack is used to hold grammar symbols
The handle always appears on top of the stack
Initial configuration: Stack $, Input w$
Acceptance configuration: Stack $S, Input $
Basic operations: shift, reduce, accept, error
Example: id*id
Stack     Input     Action
$         id*id$    shift
$id       *id$      reduce by F->id
$F        *id$      reduce by T->F
$T        *id$      shift
$T*       id$       shift
$T*id     $         reduce by F->id
$T*F      $         reduce by T->T*F
$T        $         reduce by E->T
$E        $         accept
Input xyz$ z$
$ $Bxy
Example:
Input
else $
Reduce/reduce conflict
stmt -> id(parameter_list)
stmt -> expr:=expr
parameter_list -> parameter_list, parameter
parameter_list -> parameter
parameter -> id
expr -> id(expr_list)
expr -> id
expr_list -> expr_list, expr
expr_list -> expr
Stack: ... id ( id
Input: , id ) ... $
LR Parsing
The most prevalent type of bottom-up parser
LR(k), mostly interested in parsers with k<=1
Why LR parsers?
Table driven
Can be constructed to recognize all programming language constructs
Most general non-backtracking shift-reduce parsing method
Can detect a syntactic error as soon as it is possible to do so
The class of grammars for which we can construct LR parsers is a superset of those for which we can construct LL parsers
States of an LR parser
States represent sets of items
An LR(0) item of G is a production of G with the dot at some position of the body:
A->.XYZ   A->X.YZ   A->XY.Z   A->XYZ.
In a state having A->.XYZ we hope to see a string derivable from XYZ next on the input. What about A->X.YZ?
If I is a set of items, closure(I) is the set of items constructed from I by the following rules:
Add every item in I to closure(I)
If A->α.Bβ is in closure(I) and B->γ is a production then add the item B->.γ to closure(I).
Example:
Goto(I,X), where I is an item set and X is a grammar symbol, is the closure of the set of all items [A -> αX.β] where [A -> α.Xβ] is in I
Example (transitions from I0 on E, T and "("):
I1: E'->E.  E->E.+T
I2: E->T.  T->T.*F
I4: F->(.E)  E->.E+T  E->.T  T->.T*F  T->.F  F->.(E)  F->.id
Closure algorithm
SetOfItems CLOSURE(I) {
    J = I;
    repeat
        for (each item A -> α.Bβ in J)
            for (each production B -> γ of G)
                if (B -> .γ is not in J)
                    add B -> .γ to J;
    until no more items are added to J on one round;
    return J;
}
GOTO algorithm
SetOfItems GOTO(I,X) {
    J = empty;
    for (each item A -> α.Xβ in I)
        add CLOSURE(A -> αX.β) to J;
    return J;
}
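CLOSURE and GOTO can be sketched for the augmented expression grammar as below. An item is represented as a (head, body, dot position) tuple; that representation and the lowercase function names are assumptions made for the illustration.

```python
GRAMMAR = {
    "E'": [("E",)],  # augmented start production E' -> E
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """LR(0) closure of a set of (head, body, dot) items."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        # If the dot stands before a nonterminal B, add every item B -> .γ
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {
        (head, body, dot + 1)
        for (head, body, dot) in items
        if dot < len(body) and body[dot] == symbol
    }
    return closure(moved)


def show(item):
    head, body, dot = item
    return f"{head}->{''.join(body[:dot])}.{''.join(body[dot:])}"


I0 = closure({("E'", ("E",), 0)})
print(sorted(show(item) for item in goto(I0, "(")))
```

The set printed for goto(I0, "(") corresponds to I4 in the example.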
Example
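I0 = closure({[E'->.E]}): E'->.E  E->.E+T  E->.T  T->.T*F  T->.F  F->.(E)  F->.id
I3: T->F.
I4: F->(.E)  E->.E+T  E->.T  T->.T*F  T->.F  F->.(E)  F->.id
I5 = goto(I0,id): F->id.
I8: E->E.+T  F->(E.)
I11: F->(E).
[Figure: LR(0) automaton for the expression grammar; the edges between the item sets are labelled with the grammar symbols E, T, id, (, + and *, and $ leads to acc]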
Example: id*id
Line   Stack    Symbols   Input     Action
(1)    0        $         id*id$    Shift to 5
(2)    05       $id       *id$      Reduce by F->id
(3)    03       $F        *id$      Reduce by T->F
(4)    02       $T        *id$      Shift to 7
(5)    027      $T*       id$       Shift to 5
(6)    0275     $T*id     $         Reduce by F->id
(7)    02710    $T*F      $         Reduce by T->T*F
(8)    02       $T        $         Reduce by E->T
(9)    01       $E        $         accept
LR-Parsing model
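[Figure: model of an LR parser: the INPUT a1 ... ai ... an $, a stack of states sm, sm-1, ..., $, the LR Parsing Program with its Output, and the ACTION and GOTO tables]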
LR parsing algorithm
let a be the first symbol of w$;
while (1) { /* repeat forever */
    let s be the state on top of the stack;
    if (ACTION[s,a] = shift t) {
        push t onto the stack;
        let a be the next input symbol;
    } else if (ACTION[s,a] = reduce A -> β) {
        pop |β| symbols off the stack;
        let state t now be on top of the stack;
        push GOTO[t,A] onto the stack;
        output the production A -> β;
    } else if (ACTION[s,a] = accept) break; /* parsing is done */
    else call error-recovery routine;
}
If [A->α.aβ] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to shift j
If [A->α.] is in Ii, then set ACTION[i,a] to reduce A->α for all a in Follow(A); here A may not be S'
If [S'->S.] is in Ii, then set ACTION[i,$] to Accept
If any conflicts appear then we say that the grammar is not SLR(1).
If GOTO(Ii,A) = Ij then GOTO[i,A]=j
All entries not defined by the above rules are made error
The initial state of the parser is the one constructed from the set of items containing [S'->.S]
Example
STATE   ACTION                                  GOTO
        id    +     *     (     )     $         E    T    F
0       S5                S4                    1    2    3
1             S6                      Acc
2             R2    S7          R2    R2
3             R4    R4          R4    R4
4       S5                S4                    8    2    3
5             R6    R6          R6    R6
6       S5                S4                         9    3
7       S5                S4                              10
8             S6                S11
9             R1    S7          R1    R1
10            R3    R3          R3    R3
11            R5    R5          R5    R5

(0) E' -> E
(1) E -> E + T
(2) E -> T
(3) T -> T * F
(4) T -> F
(5) F -> (E)
(6) F -> id
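The LR parsing algorithm driven by this table can be sketched as below. The ACTION and GOTO dictionaries transcribe the SLR table for the expression grammar, with productions numbered (1) to (6) as listed; the string encoding of actions ("s5", "r2", "acc") and the function name lr_parse are assumptions of this sketch.

```python
# Production number -> (head, length of body), numbered as in the list above.
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> (E)
    6: ("F", 1),  # F -> id
}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the list of production numbers used in reductions, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states; the initial state is 0
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]} in state {stack[-1]}")
        if action == "acc":
            return reductions
        if action[0] == "s":  # shift: push the new state, advance the input
            stack.append(int(action[1:]))
            pos += 1
        else:  # reduce by production n
            n = int(action[1:])
            head, length = PRODUCTIONS[n]
            del stack[-length:]  # pop one state per body symbol
            stack.append(GOTO[stack[-1]][head])
            reductions.append(n)


print(lr_parse(["id", "*", "id", "+", "id"]))
```

The reductions come out in the order of a reversed rightmost derivation.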
Example: id*id+id
Line    Stack    Symbols   Input        Action
(1)     0                  id*id+id$    Shift to 5
(2)     05       id        *id+id$      Reduce by F->id
(3)     03       F         *id+id$      Reduce by T->F
(4)     02       T         *id+id$      Shift to 7
(5)     027      T*        id+id$       Shift to 5
(6)     0275     T*id      +id$         Reduce by F->id
(7)     02710    T*F       +id$         Reduce by T->T*F
(8)     02       T         +id$         Reduce by E->T
(9)     01       E         +id$         Shift to 6
(10)    016      E+        id$          Shift to 5
(11)    0165     E+id      $            Reduce by F->id
(12)    0163     E+F       $            Reduce by T->F
(13)    0169     E+T       $            Reduce by E->E+T
(14)    01       E         $            accept
I3: S -> R.
I4: L -> *.R  R -> .L  L -> .*R  L -> .id
I5: L -> id.
I6: S -> L=.R  R -> .L  L -> .*R  L -> .id
I7: L -> *R.
I8: R -> L.
I9: S -> L=R.
Use lookahead symbols for items: LR(1) items
Results in a large collection of items
=
Example:
*
rm
Example
S' -> S
S -> CC
C -> cC
C -> d
If [A->α.aβ, b] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to shift j
If [A->α., a] is in Ii, then set ACTION[i,a] to reduce A->α; here A may not be S'
If [S'->S., $] is in Ii, then set ACTION[i,$] to Accept
If any conflicts appear then we say that the grammar is not LR(1).
If GOTO(Ii,A) = Ij then GOTO[i,A]=j
All entries not defined by the above rules are made error
The initial state of the parser is the one constructed from the set of items containing [S'->.S, $]
Example
S' -> S
S -> CC
C -> cC
C -> d
I4 C->d. , c/d
I7 C->d. , $
different states
Make lists of (terminal-symbol, action) pairs for each state
Implement the Goto table with a linked list for each nonterminal, in the form (current state, next state)
GO TO E 1
4
5
S3
S3 S4 R1 R2 R3 S5 S5 R2 R3
S2
S2
7
8
I2: E->(.E)  E->.E+E  E->.E*E  E->.(E)  E->.id
I5: E->E*.E  E->.E+E  E->.E*E  E->.(E)  E->.id
6 7 8 9
R1 R2 R3
R1 R2 R3
Readings