
Chapter 4 - Syntax Analysis

The document discusses syntax analysis in compiler design. It covers topics like the role of parsers, context free grammars, top-down and bottom-up parsing, parser generators, and error handling strategies. Specifically, it outlines different types of parsers like LR, SLR, LALR parsers and top-down parsers like recursive descent and LL(k) parsers. It also discusses concepts like ambiguity, left factoring, left recursion elimination, computing first and follow sets, and construction of predictive parsing tables.

Uploaded by

Vinay Dubasi

Compiler Design

Chapter 4 Syntax Analysis

Outline

Role of parser
Context free grammars
Top down parsing
Bottom up parsing
Parser generators

The role of parser


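[Figure: position of the parser in the compiler. Source program -> Lexical Analyzer -> token -> Parser -> parse tree -> Rest of Front End -> intermediate representation. The parser requests each token with getNextToken; all phases consult the symbol table.]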

Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

Error handling
Common programming errors
Lexical errors
Syntactic errors
Semantic errors
Logical errors

Error handler goals

Report the presence of errors clearly and accurately
Recover from each error quickly enough to detect subsequent errors
Add minimal overhead to the processing of correct programs

Error-recovery strategies
Panic mode recovery: discard input symbols one at a time until one of a designated set of synchronization tokens is found
Phrase level recovery: replace a prefix of the remaining input by some string that allows the parser to continue
Error productions: augment the grammar with productions that generate the erroneous constructs
Global correction: choose a minimal sequence of changes to obtain a globally least-cost correction

Context free grammars


Terminals
Nonterminals
Start symbol
Productions

expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id

Derivations

Productions are treated as rewriting rules to generate a string
Rightmost and leftmost derivations

E -> E + E | E * E | -E | (E) | id
Derivation for -(id+id):

E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)

Parse trees

-(id+id)

E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)

Ambiguity
For some strings there exists more than one parse tree
Or more than one leftmost derivation
Or more than one rightmost derivation
Example: id+id*id

Elimination of ambiguity

Elimination of ambiguity (cont.)

Idea:

A statement appearing between a then and an else must be matched

Elimination of left recursion


A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α

Top-down parsing methods can't handle left-recursive grammars
A simple rule for direct left recursion elimination:

For a rule like A -> Aα | β we may replace it with
A -> βA'
A' -> αA' | ε
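The direct left recursion rule can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name, the list-of-symbols representation of production bodies, and the convention that the new nonterminal is the old name plus a prime are all assumptions made for the example.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon.

    `bodies` is a list of alternatives, each a list of symbols.
    The empty list [] stands for epsilon.
    """
    new_head = head + "'"  # assumed not to be used elsewhere in the grammar
    # The "alpha" parts: what follows the leading A in left-recursive alternatives.
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # The "beta" parts: alternatives that do not start with A.
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to do
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


# E -> E + T | T
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
```

Applied to E -> E + T | T, the sketch produces E -> TE' and E' -> +TE' | ε, the grammar used for predictive parsing later in the chapter.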

Left recursion elimination (cont.)


There are cases like the following, where the left recursion is indirect:
S -> Aa | b
A -> Ac | Sd | ε
Left recursion elimination algorithm:
Arrange the nonterminals in some order A1, A2, ..., An.
for (each i from 1 to n) {
    for (each j from 1 to i-1) {
        replace each production of the form Ai -> Ajγ by the productions
        Ai -> δ1γ | δ2γ | ... | δkγ, where Aj -> δ1 | δ2 | ... | δk are all current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
}

Example
Ques1: Consider the following grammar
S -> aBDh
B -> Bb | c
D -> EF
E -> g | ε
F -> f | ε

After eliminating left recursion:
S -> aBDh
B -> cC
C -> bC | ε
D -> EF
E -> g | ε
F -> f | ε

Ques2: Consider the following grammar
S -> A
A -> Ad | Ae | aB | aC
B -> bBC | f
C -> g

After eliminating left recursion:
S -> A
A -> aBD | aCD
D -> dD | eD | ε
B -> bBC | f
C -> g

Left factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing.
Consider the following grammar:
stmt -> if expr then stmt else stmt | if expr then stmt
On seeing the input if, it is not clear to the parser which production to use
We can easily perform left factoring: if we have A -> αβ1 | αβ2 then we replace it with

A -> αA'
A' -> β1 | β2

Left factoring (cont.)

Algorithm

For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α <> ε, then replace all of the A-productions A -> αβ1 | αβ2 | ... | αβn | γ by
A -> αA' | γ
A' -> β1 | β2 | ... | βn

Example:

S -> i E t S | i E t S e S | a
E -> b
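One step of the left-factoring algorithm can be sketched in Python as below, applied to the example grammar. The helper names and the list-of-symbols representation are assumptions for illustration. The sketch performs a single factoring step, so it would need to be repeated until no two alternatives share a prefix.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(head, bodies):
    """Factor out the longest prefix shared by two or more alternatives of `head`."""
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            prefix = common_prefix(bodies[i], bodies[j])
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {head: bodies}  # no common prefix: already left factored
    new_head = head + "'"      # assumed to be an unused name
    k = len(best)
    factored = [body[k:] for body in bodies if body[:k] == best]  # the betas
    rest = [body for body in bodies if body[:k] != best]          # the gammas
    return {head: [best + [new_head]] + rest, new_head: factored}


# S -> iEtS | iEtSeS | a
grammar = left_factor_once("S", [list("iEtS"), list("iEtSeS"), ["a"]])
print(grammar)
```

For the example this yields S -> iEtSS' | a and S' -> ε | eS, where the empty list stands for ε.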

Types of Parser
Parser
    Top Down Parsers
    Bottom Up Parsers (Shift Reduce parsers)
        Operator Precedence Parser
        LR parser
            SLR parser
            CLR parser
            LALR parser

Types of Top Down Parsing


Top Down Parsing: With Backtracking or Without Backtracking

Recursive Descent Parsing

Predictive Parser

LL(k) Parser

Top Down Parsing with Backtracking


A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right.

In top-down parsing with backtracking, the parser tries different rules or productions to find a match for the input string, backtracking at each step of the derivation. If the applied production does not yield the required input string, the parser undoes that move and tries another alternative.

Example
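S -> cAd
A -> ab | a
Input: cad

[Figure: three parse-tree snapshots. S is expanded to c A d; A -> ab is tried first and fails to match the input at b; the parser backtracks and A -> a succeeds.]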
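The backtracking behaviour in this example can be sketched with Python generators, where trying the next alternative after a failure plays the role of undoing a move. The function names and the dictionary encoding of the grammar are assumptions for illustration. Like any top-down method, the sketch would loop forever on a left-recursive grammar.

```python
# Grammar of the example: S -> cAd, A -> ab | a
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def parse(symbol, text, pos):
    """Yield every input position reachable after deriving `symbol` from text[pos:]."""
    if symbol not in GRAMMAR:  # terminal: must match the input exactly
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:  # try each alternative in order
        yield from parse_sequence(body, text, pos)


def parse_sequence(symbols, text, pos):
    """Yield every position reachable after deriving all of `symbols` in order."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos):
        # If the rest fails from `middle`, the loop resumes with the next
        # way of deriving symbols[0]: this is the backtracking step.
        yield from parse_sequence(symbols[1:], text, middle)


def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("cad"), accepts("cabd"), accepts("cab"))
```

On the input cad, the alternative A -> ab is tried first and fails at b, after which A -> a is tried and the parse succeeds.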

Top Down Parsing


Consists of a set of procedures, one for each nonterminal
Execution begins with the procedure for the start symbol
A typical procedure for a nonterminal:

void A() {
    choose an A-production, A -> X1 X2 ... Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}

Recursive descent parser

It can also be viewed as finding a leftmost derivation for an input string
Example: id+id*id

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

[Figure: sequence of parse trees for id+id*id built top-down; each step corresponds to one step of a leftmost derivation (lm).]

Recursive descent parsing (cont)


General recursive descent may require backtracking
The previous code needs to be modified to allow backtracking
In its general form it can't choose an A-production easily, so we need to try all alternatives
If one alternative fails, the input pointer needs to be reset and another alternative should be tried

Recursive descent parsers can't be used for left-recursive grammars

LL(1) Grammars
Predictive parsers are those recursive descent parsers needing no backtracking
Grammars for which we can create predictive parsers are called LL(1)
The first L means scanning the input from left to right
The second L means leftmost derivation
And 1 stands for using one input symbol of lookahead

A grammar G is LL(1) if and only if whenever A -> α | β are two distinct productions of G, the following conditions hold:

For no terminal a do α and β both derive strings beginning with a
At most one of α or β can derive the empty string
If β =>* ε then α does not derive any string beginning with a terminal in Follow(A) (and likewise with α and β exchanged)

Model of a table-driven predictive parser

Predictive parsing algorithm


set input pointer ip to the first token; let a be the token at ip;
push $ and the start symbol onto the stack;
set X to the top stack symbol;
while (X != $) { /* stack is not empty */
    if (X is token a) pop the stack and advance ip;
    else if (X is another token) error();
    else if (M[X,a] is an error entry) error();
    else if (M[X,a] = X -> Y1 Y2 ... Yk) {
        output the production X -> Y1 Y2 ... Yk; /* leftmost derivation */
        pop the stack; /* pop X */
        push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
} // end while
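The table-driven algorithm above can be sketched in Python. The parsing table is the one for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id. The dictionary encoding, the function name `ll1_parse`, and the use of `SyntaxError` for error entries are assumptions for illustration.

```python
EPS = ()  # empty production body, standing for epsilon

# M[X, a] for the expression grammar; missing keys are error entries.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]
    ip = 0
    stack = ["$", start]  # top of the stack is the end of the list
    output = []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[ip]
        if x == a:                      # terminal on top matches the input
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:     # some other terminal on top
            raise SyntaxError(f"expected {x!r}, found {a!r}")
        elif (x, a) not in TABLE:       # error entry
            raise SyntaxError(f"no entry M[{x}, {a}]")
        else:
            body = TABLE[(x, a)]
            output.append((x, body))
            stack.pop()
            stack.extend(reversed(body))  # first symbol of the body ends on top
    if tokens[ip] != "$":
        raise SyntaxError(f"unexpected input {tokens[ip]!r}")
    return output


steps = ll1_parse(["id", "+", "id", "*", "id"])
for head, body in steps:
    print(head, "->", " ".join(body) or "ε")
```

The printed sequence of productions is the leftmost derivation of id+id*id.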

LL(1) Parsers (Cont.)


Stack      Input          Output
$E         id+id*id$
$E'T       id+id*id$      E -> TE'
$E'T'F     id+id*id$      T -> FT'
$E'T'id    id+id*id$      F -> id
$E'T'      +id*id$        match id
$E'        +id*id$        T' -> ε
$E'T+      +id*id$        E' -> +TE'

LL(1) Parsers (Cont.)


Stack      Input     Output
$E'T       id*id$    match +
$E'T'F     id*id$    T -> FT'
$E'T'id    id*id$    F -> id
$E'T'      *id$      match id
$E'T'F*    *id$      T' -> *FT'
$E'T'F     id$       match *
$E'T'id    id$       F -> id
$E'T'      $         match id
$E'        $         T' -> ε
$          $         E' -> ε

First and Follow


First(α) is the set of terminals that begin strings derived from α
If α =>* ε then ε is also in First(α)
In predictive parsing when we have A -> α | β, if First(α) and First(β) are disjoint sets then we can select the appropriate A-production by looking at the next input symbol
Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately after A in some sentential form
If we have S =>* αAaβ for some α and β then a is in Follow(A)
If A can be the rightmost symbol in some sentential form, then $ is in Follow(A)

Computing First
To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X -> Y1 Y2 ... Yk is a production for some k >= 1, then place a in First(X) if for some i, a is in First(Yi) and ε is in all of First(Y1), ..., First(Yi-1), that is, Y1 ... Yi-1 =>* ε. If ε is in First(Yj) for all j = 1, ..., k then add ε to First(X).
3. If X -> ε is a production then add ε to First(X).

Example!

Computing follow
To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S), where S is the start symbol.
2. If there is a production A -> αBβ then everything in First(β) except ε is in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
Example!

Examples
S -> AaA | BbB
A -> bB
B -> ε

Computing First
1. A -> bB: First(A) = {b}
2. S -> AaA: First(S) = First(AaA) = {b}
3. S -> BbB: First(S) = First(BbB) = (First(B) - {ε}) U First(bB) = ({ε} - {ε}) U {b} = {b}
So First(S) = {b}

Computing Follow
Follow(S) = {$}
1. S -> AaA: the first A is followed by a, so First(aA) = {a} is in Follow(A); the last A ends the body, so Follow(S) is in Follow(A)
2. S -> BbB: the first B is followed by b, so b is in Follow(B); the last B ends the body, so Follow(S) is in Follow(B)
3. A -> bB: B ends the body, so Follow(A) is in Follow(B)

Result
Follow(S) = {$}
Follow(A) = {$, a}
Follow(B) = {$, a, b}
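The First and Follow rules can be checked mechanically. The following Python sketch applies them as a fixed-point iteration to the example grammar S -> AaA | BbB, A -> bB, B -> ε. The function names and the grammar encoding (a dictionary of alternatives, with the empty list standing for ε) are assumptions for illustration.

```python
EPS = "ε"
END = "$"


def first_of_string(symbols, first):
    """First set of a string of symbols, given First sets of single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # every symbol can vanish (or the string is empty)
    return result


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}            # rule 1
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                                 # repeat until nothing is added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:                    # rules 2 and 3
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                         # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                   # Follow is only for nonterminals
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}             # rule 2
                    if EPS in tail:
                        new |= follow[head]        # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


GRAMMAR = {"S": [["A", "a", "A"], ["B", "b", "B"]], "A": [["b", "B"]], "B": [[]]}
first = compute_first(GRAMMAR, {"a", "b"})
follow = compute_follow(GRAMMAR, first, "S")
print(first["S"], follow["A"], follow["B"])
```

The computed sets agree with the hand calculation: First(S) = {b}, Follow(A) = {$, a}, Follow(B) = {$, a, b}.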

Construction of predictive parsing table


For each production A -> α in the grammar do the following:

For each terminal a in First(α), add A -> α to M[A,a]
If ε is in First(α), then for each terminal b in Follow(A), add A -> α to M[A,b]. If ε is in First(α) and $ is in Follow(A), add A -> α to M[A,$] as well

If after performing the above there is no production in M[A,a], then set M[A,a] to error
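A minimal Python sketch of this construction follows, using the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id as input. The First and Follow sets are written in by hand rather than computed, and the names `build_table` and `first_of_string` are assumptions for illustration. Missing dictionary keys play the role of error entries.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First and Follow sets of the nonterminals, entered by hand for this grammar.
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # the First set of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    table = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            first = first_of_string(body)
            lookaheads = first - {EPS}
            if EPS in first:
                # Follow(head) already contains $ when $ can follow head.
                lookaheads |= FOLLOW[head]
            for a in lookaheads:
                if (head, a) in table:
                    raise ValueError(f"not LL(1): conflict at M[{head}, {a}]")
                table[(head, a)] = body
    return table


table = build_table()
for (head, a), body in sorted(table.items()):
    print(f"M[{head}, {a}] = {head} -> {' '.join(body) or EPS}")
```

A conflict (two productions competing for the same entry) is reported as an exception, which corresponds to the grammar not being LL(1).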

Building a parser
Original grammar:

E -> E+E
E -> E*E
E -> (E)
E -> id

This grammar is left-recursive, ambiguous and requires left-factoring. It needs to be modified before we build a predictive parser for it:
Remove ambiguity:

E -> E+T | T
T -> T*F | F
F -> (E)
F -> id

Remove left recursion:

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E)
F -> id

Building a parser
The grammar:

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E)
F -> id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {$, )}
FOLLOW(T) = FOLLOW(T') = {+, $, )}
FOLLOW(F) = {*, +, $, )}
Now we can either build a table or design a recursive descent parser.

Parsing table
      +           *           (           )           id          $
E                             E -> TE'                E -> TE'
E'    E' -> +TE'                          E' -> ε                 E' -> ε
T                             T -> FT'                T -> FT'
T'    T' -> ε     T' -> *FT'              T' -> ε                 T' -> ε
F                             F -> (E)                F -> id
+     match
*                 match
(                             match
)                                         match
id                                                    match
$                                                                 accept

Parsing an input
Parse the input id*id using the parse table and a stack

Step  Stack      Input    Next action
1     $E         id*id$   E -> TE'
2     $E'T       id*id$   T -> FT'
3     $E'T'F     id*id$   F -> id
4     $E'T'id    id*id$   match id
5     $E'T'      *id$     T' -> *FT'
6     $E'T'F*    *id$     match *
7     $E'T'F     id$      F -> id
8     $E'T'id    id$      match id
9     $E'T'      $        T' -> ε
10    $E'        $        E' -> ε
11    $          $        accept

Recursive descent parser

parse() {
    token = get_next_token();
    if (E() and token == '$') then return true
    else return false
}

E() {
    if (T()) then return Eprime()
    else return false
}

Eprime() {
    if (token == '+') then
        token = get_next_token()
        if (T()) then return Eprime()
        else return false
    else if (token == ')' or token == '$') then return true
    else return false
}

The remaining procedures are similar.
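For completeness, here is one way all five procedures might look in Python, following the same pattern as parse(), E() and Eprime(). The class layout and method names are assumptions for illustration. The ε-alternatives are accepted only when the lookahead is in the Follow set of the nonterminal, as in Eprime() above.

```python
class Parser:
    """Recursive-descent recognizer for
    E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of the input
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def parse(self):
        return self.E() and self.token == "$"

    def E(self):                      # E -> T E'
        return self.T() and self.Eprime()

    def Eprime(self):                 # E' -> + T E' | ε
        if self.token == "+":
            self.advance()
            return self.T() and self.Eprime()
        return self.token in (")", "$")       # Follow(E')

    def T(self):                      # T -> F T'
        return self.F() and self.Tprime()

    def Tprime(self):                 # T' -> * F T' | ε
        if self.token == "*":
            self.advance()
            return self.F() and self.Tprime()
        return self.token in ("+", ")", "$")  # Follow(T')

    def F(self):                      # F -> ( E ) | id
        if self.token == "id":
            self.advance()
            return True
        if self.token == "(":
            self.advance()
            if self.E() and self.token == ")":
                self.advance()
                return True
        return False


print(Parser(["id", "+", "id", "*", "id"]).parse())
print(Parser(["id", "+"]).parse())
```

Because the grammar is LL(1), no procedure ever needs to back up the input position.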

Example
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

Nonterminal   First      Follow
F             {(, id}    {+, *, ), $}
T             {(, id}    {+, ), $}
E             {(, id}    {), $}
E'            {+, ε}     {), $}
T'            {*, ε}     {+, ), $}

Nonterminal   Input symbol
      id          +           *           (           )           $
E     E -> TE'                            E -> TE'
E'                E' -> +TE'                          E' -> ε     E' -> ε
T     T -> FT'                            T -> FT'
T'                T' -> ε     T' -> *FT'              T' -> ε     T' -> ε
F     F -> id                             F -> (E)

Another example
S -> iEtSS' | a
S' -> eS | ε
E -> b

Nonterminal   Input symbol
      a         b         e                  i              t     $
S     S -> a                                 S -> iEtSS'
S'                        S' -> ε, S' -> eS                       S' -> ε
E               E -> b

The entry M[S', e] holds two productions, so this grammar is not LL(1).

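Non-recursive predictive parsing

[Figure: model of a table-driven predictive parser. An input buffer (a + b $) and a stack (X Y Z $) feed the predictive parsing program, which consults the parsing table M and produces the output.]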

Error recovery in predictive parsing


Panic mode
Place all symbols in Follow(A) into the synchronization set for nonterminal A: skip tokens until an element of Follow(A) is seen and pop A from the stack.
Add to the synchronization set of a lower-level construct the symbols that begin higher-level constructs
Add the symbols in First(A) to the synchronization set of nonterminal A
If a nonterminal can generate the empty string, then the production deriving ε can be used as a default
If a terminal on top of the stack cannot be matched, pop the terminal and issue a message saying that the terminal was inserted

Example

Nonterminal   Input symbol
      id          +           *           (           )           $
E     E -> TE'                            E -> TE'    synch       synch
E'                E' -> +TE'                          E' -> ε     E' -> ε
T     T -> FT'    synch                   T -> FT'    synch       synch
T'                T' -> ε     T' -> *FT'              T' -> ε     T' -> ε
F     F -> id     synch       synch       F -> (E)    synch       synch

Stack (top at left)   Input       Action
E$                    )id*+id$    Error, skip )
E$                    id*+id$     id is in First(E)
TE'$                  id*+id$
FT'E'$                id*+id$
idT'E'$               id*+id$
T'E'$                 *+id$
*FT'E'$               *+id$
FT'E'$                +id$        Error, M[F,+] = synch
T'E'$                 +id$        F has been popped

Introduction

Constructs a parse tree for an input string beginning at the leaves (the bottom) and working towards the root (the top)
Example: id*id

E -> E + T | T
T -> T * F | F
F -> (E) | id

[Figure: snapshots of the bottom-up parse of id*id. The forest of subtrees corresponds in turn to id*id, F*id, T*id, T*F, T and finally E.]

Shift-reduce parser

The general idea is to shift some symbols of the input onto the stack until a reduction can be applied
At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of the production
The key decisions during bottom-up parsing are about when to reduce and about what production to apply
A reduction is the reverse of a step in a derivation
The goal of a bottom-up parser is to construct a derivation in reverse:

E => T => T*F => T*id => F*id => id*id

Handle pruning

A handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a rightmost derivation

Right sentential form   Handle   Reducing production
id*id                   id       F -> id
F*id                    F        T -> F
T*id                    id       F -> id
T*F                     T*F      T -> T*F

Shift reduce parsing


A stack is used to hold grammar symbols
The handle always appears on top of the stack
Initial configuration:

Stack: $      Input: w$

Acceptance configuration:

Stack: $S     Input: $

Shift reduce parsing (cont.)

Basic operations:

Shift
Reduce
Accept
Error

Example: id*id

Stack     Input     Action
$         id*id$    shift
$id       *id$      reduce by F -> id
$F        *id$      reduce by T -> F
$T        *id$      shift
$T*       id$       shift
$T*id     $         reduce by F -> id
$T*F      $         reduce by T -> T*F
$T        $         reduce by E -> T
$E        $         accept

Handle will appear on top of the stack


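[Figure: the two possible cases for two successive steps of a rightmost derivation, shown as parse trees over the input xyz with the stack and input contents in each case. In both cases the next handle ends up on top of the stack, never inside it.]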

Conflicts during shift-reduce parsing

Two kinds of conflicts

Shift/reduce conflict
Reduce/reduce conflict

Example (shift/reduce conflict):

Stack: ... if expr then stmt
Input: else ... $

Reduce/reduce conflict
stmt -> id(parameter_list)
stmt -> expr := expr
parameter_list -> parameter_list, parameter
parameter_list -> parameter
parameter -> id
expr -> id(expr_list)
expr -> id
expr_list -> expr_list, expr
expr_list -> expr

Stack: ... id ( id
Input: , id ) ... $

LR Parsing
The most prevalent type of bottom-up parser
LR(k), mostly interested in parsers with k <= 1
Why LR parsers?
Table driven
Can be constructed to recognize virtually all programming language constructs
Most general non-backtracking shift-reduce parsing method
Can detect a syntactic error as soon as it is possible to do so
The class of grammars for which we can construct LR parsers is a superset of those for which we can construct LL parsers

States of an LR parser

States represent sets of items
An LR(0) item of G is a production of G with a dot at some position of the body:

For A -> XYZ we have the following items

A -> .XYZ
A -> X.YZ
A -> XY.Z
A -> XYZ.
In a state having A -> .XYZ we hope to see a string derivable from XYZ next on the input.
What about A -> X.YZ?

Constructing canonical LR(0) item sets


Augmented grammar:
G with the addition of a production S' -> S

Closure of item sets:

If I is a set of items, closure(I) is a set of items constructed from I by the following rules:
Add every item in I to closure(I)
If A -> α.Bβ is in closure(I) and B -> γ is a production, then add the item B -> .γ to closure(I).
Example:

E' -> E
E -> E + T | T
T -> T * F | F
F -> (E) | id

I0 = closure({[E' -> .E]})
E' -> .E
E -> .E+T
E -> .T
T -> .T*F
T -> .F
F -> .(E)
F -> .id

Constructing canonical LR(0) item sets (cont.)

Goto(I,X), where I is an item set and X is a grammar symbol, is the closure of the set of all items [A -> αX.β] such that [A -> α.Xβ] is in I

Example

I0 = closure({[E' -> .E]}):  E' -> .E   E -> .E+T   E -> .T   T -> .T*F   T -> .F   F -> .(E)   F -> .id

Goto(I0, E) = I1:  E' -> E.   E -> E.+T
Goto(I0, T) = I2:  E -> T.   T -> T.*F
Goto(I0, () = I4:  F -> (.E)   E -> .E+T   E -> .T   T -> .T*F   T -> .F   F -> .(E)   F -> .id

Closure algorithm
SetOfItems CLOSURE(I) {
    J = I;
    repeat
        for (each item A -> α.Bβ in J)
            for (each production B -> γ of G)
                if (B -> .γ is not in J)
                    add B -> .γ to J;
    until no more items are added to J on one round;
    return J;
}

GOTO algorithm
SetOfItems GOTO(I,X) {
    J = empty;
    for (each item A -> α.Xβ in I)
        add A -> αX.β to J;
    return CLOSURE(J);
}

Canonical LR(0) items

void items(G') {
    C = { CLOSURE({[S' -> .S]}) };
    repeat
        for (each set of items I in C)
            for (each grammar symbol X)
                if (GOTO(I,X) is not empty and not in C)
                    add GOTO(I,X) to C;
    until no new sets of items are added to C on a round;
}
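The three procedures can be sketched in Python for the augmented expression grammar. Representing an item as a pair (production index, dot position) and using a worklist instead of the repeat-until loop are implementation choices assumed for this illustration.

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = {sym for _, body in GRAMMAR for sym in body}


def closure(items):
    """items: a set of (production index, dot position) pairs."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # The dot is before nonterminal B: add B -> .γ for every B-production.
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then close the result."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)  # the empty set when no item has the dot before symbol


def canonical_collection():
    collection = [closure({(0, 0)})]
    for state in collection:  # the list grows while it is being traversed
        for symbol in sorted(SYMBOLS):
            target = goto(state, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection


states = canonical_collection()
print(len(states))
```

For this grammar the collection has twelve sets of items, corresponding to the states I0 to I11 of the LR(0) automaton (the order in which the sketch discovers them may differ from the numbering on the slides).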

Example
E' -> E
E -> E + T | T
T -> T * F | F
F -> (E) | id

I0 = closure({[E' -> .E]}):  E' -> .E   E -> .E+T   E -> .T   T -> .T*F   T -> .F   F -> .(E)   F -> .id
I1 = goto(I0, E):   E' -> E.   E -> E.+T   (accept on $)
I2 = goto(I0, T):   E -> T.   T -> T.*F
I3 = goto(I0, F):   T -> F.
I4 = goto(I0, ():   F -> (.E)   E -> .E+T   E -> .T   T -> .T*F   T -> .F   F -> .(E)   F -> .id
I5 = goto(I0, id):  F -> id.
I6 = goto(I1, +):   E -> E+.T   T -> .T*F   T -> .F   F -> .(E)   F -> .id
I7 = goto(I2, *):   T -> T*.F   F -> .(E)   F -> .id
I8 = goto(I4, E):   E -> E.+T   F -> (E.)
I9 = goto(I6, T):   E -> E+T.   T -> T.*F
I10 = goto(I7, F):  T -> T*F.
I11 = goto(I8, )):  F -> (E).

Use of LR(0) automaton

Example: id*id

Line   Stack      Symbols   Input     Action
(1)    0          $         id*id$    Shift to 5
(2)    0 5        $id       *id$      Reduce by F -> id
(3)    0 3        $F        *id$      Reduce by T -> F
(4)    0 2        $T        *id$      Shift to 7
(5)    0 2 7      $T*       id$       Shift to 5
(6)    0 2 7 5    $T*id     $         Reduce by F -> id
(7)    0 2 7 10   $T*F      $         Reduce by T -> T*F
(8)    0 2        $T        $         Reduce by E -> T
(9)    0 1        $E        $         accept

LR-Parsing model
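[Figure: model of an LR parser. The input buffer a1 ... ai ... an $ and a stack of states Sm, Sm-1, ..., $ feed the LR parsing program, which consults the ACTION and GOTO tables and produces the output.]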

LR parsing algorithm
let a be the first symbol of w$;
while (1) { /* repeat forever */
    let s be the state on top of the stack;
    if (ACTION[s,a] = shift t) {
        push t onto the stack;
        let a be the next input symbol;
    } else if (ACTION[s,a] = reduce A -> β) {
        pop |β| symbols off the stack;
        let state t now be on top of the stack;
        push GOTO[t,A] onto the stack;
        output the production A -> β;
    } else if (ACTION[s,a] = accept) break; /* parsing is done */
    else call error-recovery routine;
}
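The driver loop can be sketched in Python. The ACTION and GOTO tables below are the SLR(1) tables for the expression grammar with productions numbered (1) E -> E+T, (2) E -> T, (3) T -> T*F, (4) T -> F, (5) F -> (E), (6) F -> id. The dictionary encoding, the function name `lr_parse`, and raising `SyntaxError` in place of an error-recovery routine are assumptions for illustration.

```python
# production number -> (head, length of body)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {}
for state in (0, 4, 6, 7):
    ACTION[state, "id"] = ("shift", 5)
    ACTION[state, "("] = ("shift", 4)
ACTION[1, "+"] = ("shift", 6)
ACTION[1, "$"] = ("accept",)
ACTION[2, "*"] = ("shift", 7)
ACTION[8, "+"] = ("shift", 6)
ACTION[8, ")"] = ("shift", 11)
ACTION[9, "*"] = ("shift", 7)
# (state, production to reduce by, lookaheads on which to reduce)
for state, prod, lookaheads in [(2, 2, "+)$"), (3, 4, "+*)$"), (5, 6, "+*)$"),
                                (9, 1, "+)$"), (10, 3, "+*)$"), (11, 5, "+*)$")]:
    for a in lookaheads:
        ACTION[state, a] = ("reduce", prod)

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """Return the production numbers used, in the order of the reductions."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states; the top is the end of the list
    pos = 0
    output = []
    while True:
        s, a = stack[-1], tokens[pos]
        action = ACTION.get((s, a))
        if action is None:  # blank table entry
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if action[0] == "shift":
            stack.append(action[1])
            pos += 1
        elif action[0] == "reduce":
            head, length = PRODUCTIONS[action[1]]
            del stack[len(stack) - length:]      # pop |β| states
            stack.append(GOTO[stack[-1], head])  # push GOTO[t, A]
            output.append(action[1])
        else:
            return output                        # accept


print(lr_parse(["id", "*", "id", "+", "id"]))
```

The reductions come out in the reverse order of a rightmost derivation of the input.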

Constructing SLR parsing table


Method
Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'
State i is constructed from Ii:

If [A -> α.aβ] is in Ii and Goto(Ii,a) = Ij, then set ACTION[i,a] to shift j (a must be a terminal)
If [A -> α.] is in Ii, then set ACTION[i,a] to reduce A -> α for all a in Follow(A) (A may not be S')
If [S' -> S.] is in Ii, then set ACTION[i,$] to accept

If any conflicts appear, then we say that the grammar is not SLR(1).
If GOTO(Ii,A) = Ij then GOTO[i,A] = j
All entries not defined by the above rules are made error
The initial state of the parser is the one constructed from the set of items containing [S' -> .S]

Example
STATE             ACTION                    GOTO
        id    +     *     (     )     $     E   T   F
0       S5                S4                1   2   3
1             S6                      Acc
2             R2    S7          R2    R2
3             R4    R4          R4    R4
4       S5                S4                8   2   3
5             R6    R6          R6    R6
6       S5                S4                    9   3
7       S5                S4                        10
8             S6                S11
9             R1    S7          R1    R1
10            R3    R3          R3    R3
11            R5    R5          R5    R5

(0) E' -> E
(1) E -> E + T
(2) E -> T
(3) T -> T * F
(4) T -> F
(5) F -> (E)
(6) F -> id

Example: id*id+id

Line   Stack      Symbols   Input        Action
(1)    0                    id*id+id$    Shift to 5
(2)    0 5        id        *id+id$      Reduce by F -> id
(3)    0 3        F         *id+id$      Reduce by T -> F
(4)    0 2        T         *id+id$      Shift to 7
(5)    0 2 7      T*        id+id$       Shift to 5
(6)    0 2 7 5    T*id      +id$         Reduce by F -> id
(7)    0 2 7 10   T*F       +id$         Reduce by T -> T*F
(8)    0 2        T         +id$         Reduce by E -> T
(9)    0 1        E         +id$         Shift to 6
(10)   0 1 6      E+        id$          Shift to 5
(11)   0 1 6 5    E+id      $            Reduce by F -> id
(12)   0 1 6 3    E+F       $            Reduce by T -> F
(13)   0 1 6 9    E+T       $            Reduce by E -> E+T
(14)   0 1        E         $            accept

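Example grammar which is not SLR(1)

S -> L=R | R
L -> *R | id
R -> L

I0: S' -> .S   S -> .L=R   S -> .R   L -> .*R   L -> .id   R -> .L
I1: S' -> S.
I2: S -> L.=R   R -> L.
I3: S -> R.
I4: L -> *.R   R -> .L   L -> .*R   L -> .id
I5: L -> id.
I6: S -> L=.R   R -> .L   L -> .*R   L -> .id
I7: L -> *R.
I8: R -> L.
I9: S -> L=R.

In state 2, on input =, the table calls for two actions: Shift 6 and Reduce R -> L (because = is in Follow(R)). This is a shift/reduce conflict.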

More powerful LR parsers

Canonical-LR or just LR method

Uses lookahead symbols for items: LR(1) items
Results in a large collection of items

LALR: lookaheads are introduced into the LR(0) items

Canonical LR(1) items


In LR(1) items each item has the form [A -> α.β, a]
An LR(1) item [A -> α.β, a] is valid for a viable prefix γ if there is a rightmost derivation S =>* δAw => δαβw, where

γ = δα
Either a is the first symbol of w, or w is ε and a is $

Example:

S -> BB
B -> aB | b

S =>* aaBab => aaaBab (rightmost derivation)
Item [B -> a.B, a] is valid for γ = aaa and w = ab

Constructing LR(1) sets of items


SetOfItems Closure(I) {
    repeat
        for (each item [A -> α.Bβ, a] in I)
            for (each production B -> γ in G')
                for (each terminal b in First(βa))
                    add [B -> .γ, b] to set I;
    until no more items are added to I;
    return I;
}

SetOfItems Goto(I,X) {
    initialize J to be the empty set;
    for (each item [A -> α.Xβ, a] in I)
        add item [A -> αX.β, a] to set J;
    return Closure(J);
}

void items(G') {
    initialize C to { Closure({[S' -> .S, $]}) };
    repeat
        for (each set of items I in C)
            for (each grammar symbol X)
                if (Goto(I,X) is not empty and not in C)
                    add Goto(I,X) to C;
    until no new sets of items are added to C;
}

Example
S' -> S
S -> CC
C -> cC
C -> d

Canonical LR(1) parsing table


Method
Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'
State i is constructed from Ii:

If [A -> α.aβ, b] is in Ii and Goto(Ii,a) = Ij, then set ACTION[i,a] to shift j (a must be a terminal)
If [A -> α., a] is in Ii, then set ACTION[i,a] to reduce A -> α (A may not be S')
If [S' -> S., $] is in Ii, then set ACTION[i,$] to accept

If any conflicts appear, then we say that the grammar is not LR(1).
If GOTO(Ii,A) = Ij then GOTO[i,A] = j
All entries not defined by the above rules are made error
The initial state of the parser is the one constructed from the set of items containing [S' -> .S, $]

Example
S' -> S
S -> CC
C -> cC
C -> d

LALR Parsing Table

For the previous example we had:

I4: C -> d., c/d
I7: C -> d., $

Merging the two states, which have the same core, gives:

I47: C -> d., c/d/$

State merges can't produce shift-reduce conflicts. Why?
But they may produce reduce-reduce conflicts

Example of RR conflict in state merging


S' -> S
S -> aAd | bBd | aBe | bAe
A -> c
B -> c

An easy but space-consuming LALR table construction


Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
2. For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets. The parsing actions for state i are constructed from Ji as before. If there is a conflict, the grammar is not LALR(1).
4. If J is the union of one or more sets of LR(1) items, that is J = I1 U I2 U ... U Ik, then the cores of Goto(I1,X), ..., Goto(Ik,X) are the same. Let K be the union of all sets of items having that core; then we set Goto(J,X) = K.
This method is not efficient; a more efficient one is discussed in the book
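Steps 1 and 2 of this method can be sketched in Python for the grammar S' -> S, S -> CC, C -> cC | d. An item is represented as (production index, dot position, lookahead). The First sets are written in by hand, and the closure takes only the first symbol of βa, which is valid here only because no symbol of this grammar derives ε. Both simplifications, and all function names, are assumptions for illustration.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
# First sets entered by hand; this grammar has no ε-productions.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, "$": {"$"}}


def closure(items):
    """items: a set of (production index, dot position, lookahead) triples."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            rest = body[dot + 1:] + (look,)   # the string βa
            for b in FIRST[rest[0]]:          # First(βa), since nothing derives ε
                for i, (head, _) in enumerate(GRAMMAR):
                    item = (i, 0, b)
                    if head == body[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1, a) for p, d, a in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def lr1_collection():
    collection = [closure({(0, 0, "$")})]
    for state in collection:  # the list grows while it is being traversed
        for symbol in ("S", "C", "c", "d"):
            target = goto(state, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection


def merge_by_core(collection):
    """Union all LR(1) item sets that have the same core (items without lookaheads)."""
    merged = {}
    for state in collection:
        core = frozenset((p, d) for p, d, _ in state)
        merged.setdefault(core, set()).update(state)
    return merged


lr1_states = lr1_collection()
lalr_states = merge_by_core(lr1_states)
print(len(lr1_states), len(lalr_states))
```

For this grammar the ten LR(1) sets collapse to seven LALR sets; the merged set with core C -> d. carries the lookaheads c, d and $, matching I47 above.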

Compaction of LR parsing table

Many rows of the action table are identical

Store those rows separately and have pointers to them from the different states
Make lists of (terminal symbol, action) pairs for each state
Implement the Goto table by having a linked list for each nonterminal, with entries of the form (current state, next state)

Using ambiguous grammars


(1) E -> E+E
(2) E -> E*E
(3) E -> (E)
(4) E -> id

STATE             ACTION                    GOTO
        id    +     *     (     )     $     E
0       S3                S2                1
1             S4    S5                Acc
2       S3                S2                6
3             R4    R4          R4    R4
4       S3                S2                7
5       S3                S2                8
6             S4    S5          S9
7             R1    S5          R1    R1
8             R2    R2          R2    R2
9             R3    R3          R3    R3

I0: E' -> .E   E -> .E+E   E -> .E*E   E -> .(E)   E -> .id
I1: E' -> E.   E -> E.+E   E -> E.*E
I2: E -> (.E)   E -> .E+E   E -> .E*E   E -> .(E)   E -> .id
I3: E -> id.
I4: E -> E+.E   E -> .E+E   E -> .E*E   E -> .(E)   E -> .id
I5: E -> E*.E   E -> .E+E   E -> .E*E   E -> .(E)   E -> .id
I6: E -> (E.)   E -> E.+E   E -> E.*E
I7: E -> E+E.   E -> E.+E   E -> E.*E
I8: E -> E*E.   E -> E.+E   E -> E.*E
I9: E -> (E).

Readings

Chapter 4 of the book
