Intermediate Code Generation 1

The document discusses intermediate code generation during compilation. It describes translating source code into an intermediate language that is machine-independent yet similar to machine code. This facilitates code optimization and retargeting to different machines. Specific intermediate languages discussed include syntax trees, postfix notation, and three-address code using quadruples or triples to represent statements in a linear form. It also covers generating three-address code through syntax-directed translation and implementing declarations.

Uploaded by

shashwat2010
Copyright
© Attribution Non-Commercial (BY-NC)

Intermediate code generation

Shashwat Shriparv
InfinitySoft
Intermediate Code Generation
• Translating the source program into an “intermediate language” that is:
  - Simple
  - CPU independent,
  - …yet close in spirit to machine language.
• Benefits:
  1. Retargeting is facilitated.
  2. Machine-independent code optimization can be applied.
Intermediate Code Generation
• Intermediate codes are machine-independent codes, but they are close to machine instructions.
• The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.
• The intermediate language can be one of many different languages; the designer of the compiler decides which one to use.
  - Syntax trees can be used as an intermediate language.
  - Postfix notation can be used as an intermediate language.
  - Three-address code (quadruples) can be used as an intermediate language.
    - We will use quadruples to discuss intermediate code generation.
    - Quadruples are close to machine instructions, but they are not actual machine instructions.
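Types of Intermediate Languages
• Graphical representations.
  - Consider the assignment a := b * -c + b * -c:

[Figure: syntax tree (left) and DAG (right) for the assignment. In the syntax tree, assign has children a and +, and + has two separate * subtrees, each with children b and uminus(c). In the DAG, both operands of + are the same shared * node, with children b and uminus(c).]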
Syntax Dir. Definition to produce syntax trees for Assignment Statements.
PRODUCTION        Semantic Rule
S → id := E       { S.nptr = mknode(‘assign’, mkleaf(id, id.entry), E.nptr) }
E → E1 + E2       { E.nptr = mknode(‘+’, E1.nptr, E2.nptr) }
E → E1 * E2       { E.nptr = mknode(‘*’, E1.nptr, E2.nptr) }
E → - E1          { E.nptr = mknode(‘uminus’, E1.nptr) }
E → ( E1 )        { E.nptr = E1.nptr }
E → id            { E.nptr = mkleaf(id, id.entry) }
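The semantic rules above can be sketched in Python as follows. This is only an illustration: the Node class, the use of a plain name in place of a symbol-table pointer, and the helper functions tree_size and dag_nodes are assumptions made for the example, not part of the slides. Making nodes immutable lets structurally equal subtrees compare equal, which is one simple way to see how many nodes a DAG would need.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A syntax-tree node; frozen so equal subtrees compare and hash equal."""
    op: str
    children: tuple = ()

def mkleaf(kind, entry):
    # 'entry' stands in for a symbol-table pointer; here it is just the name.
    return Node(f"{kind}:{entry}")

def mknode(op, *children):
    return Node(op, children)

def term():
    # b * -c
    return mknode('*', mkleaf('id', 'b'), mknode('uminus', mkleaf('id', 'c')))

# a := b * -c + b * -c
tree = mknode('assign', mkleaf('id', 'a'), mknode('+', term(), term()))

def tree_size(n):
    """Number of nodes when every occurrence is counted (syntax tree)."""
    return 1 + sum(tree_size(c) for c in n.children)

def dag_nodes(n, seen=None):
    """Set of structurally distinct nodes (what a DAG would store)."""
    seen = set() if seen is None else seen
    seen.add(n)
    for c in n.children:
        dag_nodes(c, seen)
    return seen

print(tree_size(tree), len(dag_nodes(tree)))  # prints "11 7"
```

The syntax tree has 11 nodes, while only 7 distinct nodes remain once the common subexpression b * -c is shared.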
Three Address Code
• Statements of general form x := y op z, where x, y, z are names, constants or compiler-generated temporaries.
• No built-up arithmetic expressions are allowed.
• As a result, x := y + z * w should be represented as
    t1 := z * w
    t2 := y + t1
    x := t2
  where t1, t2 are compiler-generated temporary names.
• Observe that, given the syntax tree or the dag of the graphical representation, we can easily derive three-address code for assignments as above.
• In fact, three-address code is a linearization of the syntax tree.
• Three-address code is useful: it is related to machine language, simple, and optimizable.
3 address code for the syntax tree and the dag
a := b * -c + b * -c

[Figure: syntax tree and DAG for a := b * -c + b * -c]

3-address codes are

Syntax tree          Dag
t1 := - c            t1 := - c
t2 := b * t1         t2 := b * t1
t3 := - c            t5 := t2 + t2
t4 := b * t3         a := t5
t5 := t2 + t4
a := t5
Types of Three-Address Statements.

Assignment Statement: x := y op z
Assignment Statement: x := op z
Copy Statement: x := z
Unconditional Jump: goto L
Conditional Jump: if x relop y goto L
Stack Operations: Push/pop

More Advanced
Procedure:
    param x1
    param x2
    …
    param xn
    call p, n
  (generated as part of a call of proc. p(x1, x2, ……, xn))
Index Assignments:
    x := y[ i ]
    x[ i ] := y
Address and Pointer Assignments:
    x := &y
    x := *y
    *x := y
Syntax-Directed Translation into 3-address code.
Syntax-Directed Translation for 3-address code for assignment statements
• Use attributes
  - E.place to hold the name of the “place” that will hold the value of E
    - Identifier will be assumed to already have the place attribute defined.
    - For example, the place attribute will be of the form t0, t1, t2, … for identifiers and v0, v1, v2 etc. for the rest.
  - E.code to hold the three-address code statements that evaluate E (this is the ‘translation’ attribute).
• Use function newtemp that returns a new temporary variable that we can use.
• Use function gen to generate a single three-address statement given the necessary information (variable names and operations).
Syntax-Dir. Definition for 3-address code
‘||’: string concatenation
PRODUCTION        Semantic Rule
S → id := E       { S.code = E.code || gen(id.place ‘:=’ E.place) }
E → E1 + E2       { E.place = newtemp ;
                    E.code = E1.code || E2.code
                             || gen(E.place ‘:=’ E1.place ‘+’ E2.place) }
E → E1 * E2       { E.place = newtemp ;
                    E.code = E1.code || E2.code
                             || gen(E.place ‘:=’ E1.place ‘*’ E2.place) }
E → - E1          { E.place = newtemp ;
                    E.code = E1.code
                             || gen(E.place ‘:=’ ‘uminus’ E1.place) }
E → ( E1 )        { E.place = E1.place ; E.code = E1.code }
E → id            { E.place = id.place ; E.code = ‘’ }

e.g. a := b * - (c+d)
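A minimal Python sketch of this definition, applied to the example a := b * -(c+d). The tuple encoding of expressions and the Translator class are assumptions made for illustration; place and code are returned as a pair rather than stored as attributes, and temporaries are allocated after the operands have been translated so that they are numbered in the order they are computed.

```python
import itertools

class Translator:
    """Expressions are a name (str), ('uminus', e) or (op, left, right)."""

    def __init__(self):
        self._temps = itertools.count(1)

    def newtemp(self):
        return f"t{next(self._temps)}"

    def expr(self, e):
        """Return (E.place, E.code) for expression e."""
        if isinstance(e, str):                 # E -> id
            return e, []
        if e[0] == 'uminus':                   # E -> - E1
            p1, c1 = self.expr(e[1])
            place = self.newtemp()
            return place, c1 + [f"{place} := uminus {p1}"]
        op, left, right = e                    # E -> E1 + E2 | E1 * E2
        p1, c1 = self.expr(left)
        p2, c2 = self.expr(right)
        place = self.newtemp()
        return place, c1 + c2 + [f"{place} := {p1} {op} {p2}"]

    def assign(self, name, e):                 # S -> id := E
        place, code = self.expr(e)
        return code + [f"{name} := {place}"]

# a := b * -(c + d)
code = Translator().assign('a', ('*', 'b', ('uminus', ('+', 'c', 'd'))))
print("\n".join(code))
# t1 := c + d
# t2 := uminus t1
# t3 := b * t2
# a := t3
```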
while statements
• E.g. while statements of the form “while E do S” (interpreted as: while the value of E is not 0, do S)

PRODUCTION
S → while E do S1

Code layout:
S.begin:    E.code
            if E.place = 0 goto S.after
            S1.code
            goto S.begin
S.after:    ……………….

Semantic Rule
S.begin = newlabel ;      (to mark the 1st stmt. in the code for E)
S.after = newlabel ;      (to mark the stmt. following the code for S)
S.code = gen(S.begin ‘:’)
         || E.code
         || gen(‘if’ E.place ‘=’ ‘0’ ‘goto’ S.after)
         || S1.code
         || gen(‘goto’ S.begin)
         || gen(S.after ‘:’)
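The semantic rule for the while statement amounts to list concatenation. The sketch below assumes that E.code and S1.code are already available as lists of strings; make_newlabel and while_code are hypothetical helper names chosen for this example.

```python
import itertools

def make_newlabel():
    """Return a newlabel() function that yields L1, L2, ... on each call."""
    counter = itertools.count(1)
    return lambda: f"L{next(counter)}"

def while_code(e_code, e_place, s1_code, newlabel):
    """Build S.code for S -> while E do S1 from the code of its parts."""
    begin = newlabel()   # S.begin: marks the first statement of E.code
    after = newlabel()   # S.after: marks the statement following S
    return ([f"{begin}:"]
            + e_code
            + [f"if {e_place} = 0 goto {after}"]
            + s1_code
            + [f"goto {begin}", f"{after}:"])

# while (a - b) do a := a - 1
example = while_code(["t1 := a - b"], "t1", ["a := a - 1"], make_newlabel())
print("\n".join(example))
```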
Implementation of 3 address code
• Quadruples
• Triples
• Indirect triples
Quadruples
• A quadruple is a record structure with four fields: op, arg1, arg2, and result
  - The op field contains an internal code for an operator
  - Statements with unary operators do not use arg2
  - Operators like param use neither arg2 nor result
  - The target label for conditional and unconditional jumps is in result
• The contents of fields arg1, arg2, and result are typically pointers to symbol table entries
Implementations of 3-address statements
a := b * -c + b * -c

• Quadruples
                       op       arg1   arg2   result
t1 := - c        (0)   uminus   c             t1
t2 := b * t1     (1)   *        b      t1     t2
t3 := - c        (2)   uminus   c             t3
t4 := b * t3     (3)   *        b      t3     t4
t5 := t2 + t4    (4)   +        t2     t4     t5
a := t5          (5)   :=       t5            a
Triples
• Triples refer to a temporary value by the position of the statement that computes it
  - Statements can be represented by a record with only three fields: op, arg1, and arg2
  - Avoids the need to enter temporary names into the symbol table
• Contents of arg1 and arg2:
  - Pointer into symbol table (for programmer-defined names)
  - Pointer into triple structure (for temporaries)
Implementations of 3-address statements, II
a := b * -c + b * -c

• Triples
                       op       arg1   arg2
t1 := - c        (0)   uminus   c
t2 := b * t1     (1)   *        b      (0)
t3 := - c        (2)   uminus   c
t4 := b * t3     (3)   *        b      (2)
t5 := t2 + t4    (4)   +        (1)    (3)
a := t5          (5)   assign   a      (4)
Implementations of 3-address statements, III
a := b * -c + b * -c

• Indirect Triples
                       stmt             op       arg1   arg2
t1 := - c        (0)   (14)      (14)   uminus   c
t2 := b * t1     (1)   (15)      (15)   *        b      (14)
t3 := - c        (2)   (16)      (16)   uminus   c
t4 := b * t3     (3)   (17)      (17)   *        b      (16)
t5 := t2 + t4    (4)   (18)      (18)   +        (15)   (17)
a := t5          (5)   (19)      (19)   assign   a      (18)
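The three representations can be compared side by side with a small Python sketch. The Quad record and the to_triples helper are illustrative assumptions; names are kept as strings instead of symbol-table pointers, and triple positions start at 0 rather than at 14 as in the indirect-triple table.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str

# Quadruples for a := b * -c + b * -c
QUADS = [
    Quad('uminus', 'c', None, 't1'),
    Quad('*', 'b', 't1', 't2'),
    Quad('uminus', 'c', None, 't3'),
    Quad('*', 'b', 't3', 't4'),
    Quad('+', 't2', 't4', 't5'),
    Quad(':=', 't5', None, 'a'),
]

def to_triples(quads):
    """Replace each temporary by the index of the triple that computes it."""
    position = {}            # temporary name -> index of its defining triple
    triples = []

    def ref(arg):
        # int for temporaries, unchanged name (or None) otherwise
        return position.get(arg, arg)

    for q in quads:
        if q.op == ':=':
            triples.append(('assign', q.result, ref(q.arg1)))
        else:
            triples.append((q.op, ref(q.arg1), ref(q.arg2)))
            position[q.result] = len(triples) - 1
    return triples

TRIPLES = to_triples(QUADS)

# Indirect triples: a separate statement list points at the triples, so
# statements can be reordered without renumbering the triples themselves.
STATEMENTS = list(range(len(TRIPLES)))

for i in STATEMENTS:
    print(i, TRIPLES[i])
```

No temporary names appear in TRIPLES, which is why triples avoid entering temporaries into the symbol table.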
DECLARATIONS
• Declarations in a procedure
  - Languages like C and Pascal allow the declarations in a single procedure to be processed as a group
  - A global variable offset keeps track of the next available relative address
  - Before the first declaration is considered, the value of offset is set to 0.
  - When a new name is seen, the name is entered in the symbol table with the current value of offset, and offset is incremented by the width of the data object denoted by the name.
  - Procedure enter(name, type, offset) creates a symbol table entry for name, gives it type type, and relative address offset in its data area
  - Type, width: width denotes the number of memory units taken by objects of that type
SDT to generate ICode for Declarations
Using a global variable offset
PRODUCTION                    Semantic Rule
P → D                         { }
D → D ; D
D → id : T                    { enter(id.name, T.type, offset);
                                offset := offset + T.width }
T → integer                   { T.type = integer ; T.width = 4 }
T → real                      { T.type = real ; T.width = 8 }
T → array [ num ] of T1       { T.type = array(1..num.val, T1.type) ;
                                T.width = num.val * T1.width }
T → ^T1                       { T.type = pointer(T1.type) ; T.width = 4 }
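As a rough illustration of how offset advances, the following Python sketch lays out a list of declarations. The tuple encoding of types and the function names width and layout are assumptions for the example; the widths (4 for integer and pointer, 8 for real) follow the rules above.

```python
def width(t):
    """T.width for a type given as 'integer', 'real',
    ('array', n, elem_type) or ('pointer', target_type)."""
    if t == 'integer':
        return 4
    if t == 'real':
        return 8
    if t[0] == 'array':
        return t[1] * width(t[2])
    if t[0] == 'pointer':
        return 4
    raise ValueError(f"unknown type: {t!r}")

def layout(decls):
    """Process declarations as a group; return (symbol table, final offset)."""
    symtab = {}
    offset = 0                       # set to 0 before the first declaration
    for name, t in decls:
        symtab[name] = (t, offset)   # enter(name, type, offset)
        offset += width(t)           # offset := offset + T.width
    return symtab, offset

decls = [
    ('i', 'integer'),
    ('x', 'real'),
    ('a', ('array', 10, 'integer')),
    ('p', ('pointer', 'real')),
]
symtab, total = layout(decls)
for name, (t, off) in symtab.items():
    print(name, off)
print("total width:", total)
```

Here i, x, a and p receive relative addresses 0, 4, 12 and 52, and the data area is 56 units wide.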
Nested Procedure Declarations
• For each procedure we should create a symbol table.
  - mktable(previous): create a new symbol table, where previous is the parent symbol table of this new symbol table
  - enter(symtable, name, type, offset): create a new entry for a variable in the given symbol table.
  - enterproc(symtable, name, newsymbtable): create a new entry for the procedure in the symbol table of its parent.
  - addwidth(symtable, width): put the total width of all entries in the symbol table into the header of that table.
• We will have two stacks:
  - tblptr: holds the pointers to the symbol tables
  - offset: holds the current offsets in the symbol tables on the tblptr stack.
SDT to generate ICode for Nested Procedures
P → M D                   { addwidth(top(tblptr), top(offset)); pop(tblptr);
                            pop(offset) }
M → ε                     { t := mktable(null); push(t, tblptr); push(0, offset) }
D → D1 ; D2               ...
D → proc id ; N D ; S     { t := top(tblptr); addwidth(t, top(offset));
                            pop(tblptr); pop(offset);
                            enterproc(top(tblptr), id.name, t) }
N → ε                     { t := mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
D → id : T                { enter(top(tblptr), id.name, T.type, top(offset));
                            top(offset) := top(offset) + T.width }

Example: proc func1; D; proc func2 D; S; S
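The two stacks can be sketched in Python as below. The SymTable class and the wrapper functions begin_scope, declare, end_proc and end_program are hypothetical names that group the semantic actions; a real parser would call them when it reduces by the corresponding productions. The sample program at the end is made up for illustration.

```python
class SymTable:
    def __init__(self, previous=None):
        self.previous = previous   # parent symbol table
        self.entries = {}
        self.width = 0             # header field filled in by addwidth

def mktable(previous):
    return SymTable(previous)

def enter(table, name, type_, offset):
    table.entries[name] = ('var', type_, offset)

def enterproc(table, name, newtable):
    table.entries[name] = ('proc', newtable)

def addwidth(table, width):
    table.width = width

tblptr, offset = [], []            # the two stacks

def begin_scope():                 # actions of M -> ε and N -> ε
    parent = tblptr[-1] if tblptr else None
    tblptr.append(mktable(parent))
    offset.append(0)

def declare(name, type_, width):   # D -> id : T
    enter(tblptr[-1], name, type_, offset[-1])
    offset[-1] += width

def end_proc(name):                # D -> proc id ; N D ; S
    t = tblptr.pop()
    addwidth(t, offset.pop())
    enterproc(tblptr[-1], name, t)

def end_program():                 # P -> M D
    t = tblptr.pop()
    addwidth(t, offset.pop())
    return t

# Outer program declares x, then procedure func1 (with y and z), then w.
begin_scope()
declare('x', 'integer', 4)
begin_scope()
declare('y', 'real', 8)
declare('z', 'integer', 4)
end_proc('func1')
declare('w', 'integer', 4)
root = end_program()
print(root.width, root.entries['func1'][1].width)  # prints "8 12"
```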


Boolean Expressions
• Boolean expressions have two purposes:
  - To compute Boolean values
  - To serve as conditional expressions in statements
• Methods of translating boolean expressions (2 methods to represent the value of a Boolean expression):
  - Numerical method:
    - True is represented as 1 and false is represented as 0
    - Nonzero values are considered true and zero values are considered false
  - By flow of control:
    - Represent the value of a boolean by the position reached in a program
    - Often it is not necessary to evaluate the entire expression
SDT for Numerical Representation for booleans
• Expressions are evaluated left to right, using 1 to denote true and 0 to denote false
• Example: a or b and not c
    t1 := not c
    t2 := b and t1
    t3 := a or t2
• Another example: a < b
    100: if a < b goto 103
    101: t := 0
    102: goto 104
    103: t := 1
    104: …
Emit & nextstat
• emit function: places 3-address statements into an output file in the right format
• nextstat function: gives the index of the next 3-address statement in the output sequence
• E.place holds the name of the “place” that will hold the value of E
SDT for Numerical Representation for booleans

Production            Semantic Rules
E → E1 or E2          E.place := newtemp;
                      emit(E.place ':=' E1.place 'or' E2.place)
E → E1 and E2         E.place := newtemp;
                      emit(E.place ':=' E1.place 'and' E2.place)
E → not E1            E.place := newtemp;
                      emit(E.place ':=' 'not' E1.place)
E → (E1)              E.place := E1.place;
SDT for Numerical Representation for booleans

Production            Semantic Rules
E → id1 relop id2     E.place := newtemp;
                      emit('if' id1.place relop.op id2.place 'goto' nextstat+3);
                      emit(E.place ':=' '0');
                      emit('goto' nextstat+2);
                      emit(E.place ':=' '1');
E → true              E.place := newtemp;
                      emit(E.place ':=' '1')
E → false             E.place := newtemp;
                      emit(E.place ':=' '0')

nextstat function: gives the index of the next 3-address statement in the output sequence
Example: a<b or c<d and e<f

100: if a < b goto 103
101: t1 := 0
102: goto 104
103: t1 := 1
104: if c < d goto 107
105: t2 := 0
106: goto 108
107: t2 := 1
108: if e < f goto 111
109: t3 := 0
110: goto 112
111: t3 := 1
112: t4 := t2 and t3
113: t5 := t1 or t4
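The listing above can be reproduced with a short Python sketch of emit and nextstat. The Emitter class and its method names relop and binary are assumptions for illustration; statements are numbered from 100 to match the example, and nextstat is read at the moment each statement is emitted.

```python
class Emitter:
    def __init__(self, start=100):
        self.start = start
        self.code = []
        self.ntemps = 0

    @property
    def nextstat(self):
        """Index of the next three-address statement to be emitted."""
        return self.start + len(self.code)

    def emit(self, stmt):
        self.code.append(stmt)

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def relop(self, a, op, b):          # E -> id1 relop id2
        place = self.newtemp()
        self.emit(f"if {a} {op} {b} goto {self.nextstat + 3}")
        self.emit(f"{place} := 0")
        self.emit(f"goto {self.nextstat + 2}")
        self.emit(f"{place} := 1")
        return place

    def binary(self, op, p1, p2):       # E -> E1 or E2 | E1 and E2
        place = self.newtemp()
        self.emit(f"{place} := {p1} {op} {p2}")
        return place

# a < b or c < d and e < f   ('and' binds tighter than 'or')
em = Emitter()
t1 = em.relop('a', '<', 'b')
t2 = em.relop('c', '<', 'd')
t3 = em.relop('e', '<', 'f')
t4 = em.binary('and', t2, t3)
t5 = em.binary('or', t1, t4)
for i, stmt in enumerate(em.code, start=em.start):
    print(f"{i}: {stmt}")
```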
Flow of control Stmts
• S → if E then S1 | if E then S1 else S2 | while E do S1
• Here E is the boolean expression to be translated
• We assume that 3-address code can be labeled
• newlabel returns a new symbolic label each time it is called.
• E is associated with 2 labels
  1. E.true: the label to which control flows if E is true
  2. E.false: the label to which control flows if E is false
• S.next is a label that is attached to the first 3-address instruction to be executed after the code for S
1. Code for if - then
S → if E then S1

Code layout:
            E.code        (jumps to E.true or to E.false)
E.true:     S1.code
E.false:    ………..

Semantic rules
E.true := newlabel;
E.false := S.next;
S1.next := S.next;
S.code := E.code ||
          gen(E.true ':') ||
          S1.code
2. Code for if-then-else
S → if E then S1 else S2

Code layout:
            E.code        (jumps to E.true or to E.false)
E.true:     S1.code
            goto S.next
E.false:    S2.code
S.next:     ………..

Semantic rules
E.true := newlabel;
E.false := newlabel;
S1.next := S.next;
S2.next := S.next;
S.code := E.code ||
          gen(E.true ':') ||
          S1.code ||
          gen('goto' S.next) ||
          gen(E.false ':') ||
          S2.code
3. Code for while-do
S → while E do S1

Code layout:
S.begin:    E.code        (jumps to E.true or to E.false)
E.true:     S1.code
            goto S.begin
E.false:    ………..

Semantic rules
S.begin := newlabel;
E.true := newlabel;
E.false := S.next;
S1.next := S.begin;
S.code := gen(S.begin ':') ||
          E.code ||
          gen(E.true ':') ||
          S1.code ||
          gen('goto' S.begin)
Jumping code/Short Circuit code for boolean expression
• Boolean expressions are translated into a sequence of conditional and unconditional jumps to either E.true or E.false.
• a < b. The code is of the form:
    if a < b goto E.true
    goto E.false
• E1 or E2. If E1 is true then E is true, so E1.true = E.true. Otherwise, E2 must be evaluated, so E1.false is set to the label of the first statement in the code for E2.
• E1 and E2. Analogous considerations apply.
• not E1. We just interchange the true and false exits of E to get those of E1.
Control flow translation of boolean expression
We will now see the code produced for the boolean expression E

Production            Semantic Rules
E → E1 or E2          E1.true := E.true;
                      E1.false := newlabel;
                      E2.true := E.true;
                      E2.false := E.false;
                      E.code := E1.code || gen(E1.false ':') || E2.code
E → E1 and E2         E1.true := newlabel;
                      E1.false := E.false;
                      E2.true := E.true;
                      E2.false := E.false;
                      E.code := E1.code || gen(E1.true ':') || E2.code
E → not E1            E1.true := E.false;
                      E1.false := E.true;
                      E.code := E1.code
E → (E1)              E1.true := E.true;
                      E1.false := E.false;
                      E.code := E1.code
E → id1 relop id2     E.code := gen('if' id1.place relop.op id2.place 'goto' E.true) ||
                                gen('goto' E.false)
E → true              E.code := gen('goto' E.true)
E → false             E.code := gen('goto' E.false)
Example

while a < b do
    if c < d then
        x := y + z
    else
        x := y - z

Example

Three-address code for the statement above:

Lbegin: if a < b goto L1
        goto Lnext
L1:     if c < d goto L2
        goto L3
L2:     t1 := y + z
        x := t1
        goto Lbegin
L3:     t2 := y - z
        x := t2
        goto Lbegin
Lnext:
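The inherited labels E.true, E.false and S.next can be modelled as function parameters. The Python sketch below is one possible reading of the semantic rules for control-flow translation; the tuple encoding of statements and conditions and the Gen class are assumptions for the example. Labels are numbered L1, L2, … in order of creation, so they are named differently from the slide (L1 here plays the role of Lbegin) while the control flow is the same.

```python
import itertools

class Gen:
    def __init__(self):
        self._labels = itertools.count(1)
        self._temps = itertools.count(1)

    def newlabel(self):
        return f"L{next(self._labels)}"

    def newtemp(self):
        return f"t{next(self._temps)}"

    def cond(self, e, true, false):
        """Jumping code for boolean e with exits 'true' and 'false'."""
        kind = e[0]
        if kind == 'rel':                       # ('rel', a, op, b)
            _, a, op, b = e
            return [f"if {a} {op} {b} goto {true}", f"goto {false}"]
        if kind == 'or':
            mid = self.newlabel()               # E1.false
            return (self.cond(e[1], true, mid) + [f"{mid}:"]
                    + self.cond(e[2], true, false))
        if kind == 'and':
            mid = self.newlabel()               # E1.true
            return (self.cond(e[1], mid, false) + [f"{mid}:"]
                    + self.cond(e[2], true, false))
        if kind == 'not':
            return self.cond(e[1], false, true)
        raise ValueError(kind)

    def stmt(self, s, next_):
        """Code for statement s; next_ is the inherited label S.next."""
        kind = s[0]
        if kind == 'assign':                    # ('assign', x, y, op, z)
            _, x, y, op, z = s
            t = self.newtemp()
            return [f"{t} := {y} {op} {z}", f"{x} := {t}"]
        if kind == 'if':                        # ('if', E, S1)
            true = self.newlabel()
            return (self.cond(s[1], true, next_) + [f"{true}:"]
                    + self.stmt(s[2], next_))
        if kind == 'ifelse':                    # ('ifelse', E, S1, S2)
            true, false = self.newlabel(), self.newlabel()
            return (self.cond(s[1], true, false) + [f"{true}:"]
                    + self.stmt(s[2], next_)
                    + [f"goto {next_}", f"{false}:"]
                    + self.stmt(s[3], next_))
        if kind == 'while':                     # ('while', E, S1)
            begin, true = self.newlabel(), self.newlabel()
            return ([f"{begin}:"] + self.cond(s[1], true, next_)
                    + [f"{true}:"] + self.stmt(s[2], begin)
                    + [f"goto {begin}"])
        raise ValueError(kind)

program = ('while', ('rel', 'a', '<', 'b'),
           ('ifelse', ('rel', 'c', '<', 'd'),
            ('assign', 'x', 'y', '+', 'z'),
            ('assign', 'x', 'y', '-', 'z')))
code = Gen().stmt(program, 'Lnext')
print("\n".join(code))
```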
Case Statements
• Switch <expression>
  begin
      case value : statement
      case value : statement
      ……..
      case value : statement
      default : statement
  end
Translation of a case stmt

        code to evaluate E into t
        goto test
L1:     code for S1
        goto next
        …
Ln-1:   code for Sn-1
        goto next
Ln:     code for Sn
        goto next
test:   if t = V1 goto L1
        …
        if t = Vn-1 goto Ln-1
        goto Ln
next:
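This layout, with the statement bodies first and all tests collected at the end, can be generated by a function like the one below. It is a sketch under simplifying assumptions: the code for E and for each statement is passed in as ready-made lists of strings, the labels test and next are fixed names, and translate_case is a hypothetical helper rather than anything defined in the slides.

```python
def translate_case(expr_code, t, cases, default_code):
    """cases: list of (value, body) pairs; bodies are lists of statements.
    The default statement gets the last label, Ln."""
    code = list(expr_code) + ["goto test"]
    tests = []
    for i, (value, body) in enumerate(cases, start=1):
        code += [f"L{i}:"] + list(body) + ["goto next"]
        tests.append(f"if {t} = {value} goto L{i}")
    n = len(cases) + 1
    code += [f"L{n}:"] + list(default_code) + ["goto next"]
    # All tests are emitted together after the bodies.
    code += ["test:"] + tests + [f"goto L{n}", "next:"]
    return code

example = translate_case(
    ["t := x"], "t",
    [(1, ["a := 1"]), (2, ["a := 2"])],
    ["a := 0"],
)
print("\n".join(example))
```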
Backpatching
• The easiest way to implement a syntax-directed definition is to use 2 passes
  - First, construct the syntax tree
  - Then walk through the syntax tree in depth-first order, computing translations
• In a single pass, we may not know the labels to which control must flow at the time a jump is generated
  - This affects boolean expressions and flow-of-control statements
  - Leave the targets of jumps temporarily unspecified
  - Add each such statement to a list of goto statements whose labels will be filled in later
  - This filling in of labels is called backpatching

How is backpatching implemented in 1 pass?
Lists of Labels
• Imagine that we are generating quadruples into a quadruple array.
  - Labels are indices into this array
• To manipulate lists of labels we use 3 functions:
  - makelist(i)
    - Creates a new list containing only i, an index into the array of quadruples
    - Returns a pointer to the new list
  - merge(p1, p2)
    - Concatenates two lists of labels
    - Returns a pointer to the new list
  - backpatch(p, i): inserts i as the target label for each statement on the list pointed to by p
Boolean Expressions and Markers

E → E1 or M E2
  | E1 and M E2
  | not E1
  | (E1)
  | id1 relop id2
  | true
  | false

M → ε
The New Marker, M
• Translation scheme suitable for producing quadruples during a bottom-up pass
  - The new marker has an associated semantic action which picks up, at appropriate times, the index of the next quadruple to be generated
  - M.quad := nextquad
• Nonterminal E will have two new synthesized attributes:
  - E.truelist contains a list of statements that jump when the expression is true
  - E.falselist contains a list of statements that jump when the expression is false
Example: E → E1 and M E2
• If E1 is false:
  - Then E is also false
  - So statements on E1.falselist become part of E.falselist
• If E1 is true:
  - Still need to test E2
  - Target for statements on E1.truelist must be the beginning of code generated for E2
  - Target is obtained using the marker M
New Syntax-Directed Definition (1)

Production            Semantic Rules
E → E1 or M E2        backpatch(E1.falselist, M.quad);
                      E.truelist := merge(E1.truelist, E2.truelist);
                      E.falselist := E2.falselist
E → E1 and M E2       backpatch(E1.truelist, M.quad);
                      E.truelist := E2.truelist;
                      E.falselist := merge(E1.falselist, E2.falselist)
E → not E1            E.truelist := E1.falselist;
                      E.falselist := E1.truelist
E → (E1)              E.truelist := E1.truelist;
                      E.falselist := E1.falselist
New Syntax-Directed Definition (2)

Production            Semantic Rules
E → id1 relop id2     E.truelist := makelist(nextquad);
                      E.falselist := makelist(nextquad+1);
                      emit('if' id1.place relop.op id2.place 'goto _');
                      emit('goto _')
E → true              E.truelist := makelist(nextquad);
                      emit('goto _')
E → false             E.falselist := makelist(nextquad);
                      emit('goto _')
M → ε                 M.quad := nextquad
Example Revisited (1)
• Reconsider: a<b or c<d and e<f
• First, a<b will be reduced, generating:
    100: if a < b goto _
    101: goto _
• Next, the marker M in E → E1 or M E2 will be reduced, and M.quad will be set to 102
• Next, c<d will be reduced, generating:
    102: if c < d goto _
    103: goto _
Example Revisited (2)
• Next, the marker M in E → E1 and M E2 will be reduced, and M.quad will be set to 104
• Next, e<f will be reduced, generating:
    104: if e < f goto _
    105: goto _
• Next, we reduce by E → E1 and M E2
  - Semantic action calls backpatch({102}, 104)
  - E1.truelist contains only 102
  - Line 102 now reads: if c < d goto 104
Example Revisited (3)
• Next, we reduce by E → E1 or M E2
  - Semantic action calls backpatch({101}, 102)
  - E1.falselist contains only 101
  - Line 101 now reads: goto 102
• Statements generated so far:
    100: if a < b goto _
    101: goto 102
    102: if c < d goto 104
    103: goto _
    104: if e < f goto _
    105: goto _
• Remaining goto instructions will have their addresses filled in later
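The walkthrough can be replayed with a small Python sketch. The Backpatcher class, the (truelist, falselist) pair used for E, and the method names relop, or_ and and_ are assumptions for illustration; the calls are made by hand in the order in which a bottom-up parser would perform the reductions.

```python
def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

class Backpatcher:
    def __init__(self, start=100):
        self.start = start
        self.quads = []          # quadruples kept as text, '_' = unfilled target

    @property
    def nextquad(self):
        return self.start + len(self.quads)

    def emit(self, text):
        self.quads.append(text)

    def backpatch(self, lst, target):
        """Fill in 'target' as the jump target of every quad on lst."""
        for i in lst:
            q = self.quads[i - self.start]
            assert q.endswith('_'), "target already filled in"
            self.quads[i - self.start] = q[:-1] + str(target)

    # Each E is represented as a pair (truelist, falselist).
    def relop(self, a, op, b):                  # E -> id1 relop id2
        e = (makelist(self.nextquad), makelist(self.nextquad + 1))
        self.emit(f"if {a} {op} {b} goto _")
        self.emit("goto _")
        return e

    def or_(self, e1, m_quad, e2):              # E -> E1 or M E2
        self.backpatch(e1[1], m_quad)
        return merge(e1[0], e2[0]), e2[1]

    def and_(self, e1, m_quad, e2):             # E -> E1 and M E2
        self.backpatch(e1[0], m_quad)
        return e2[0], merge(e1[1], e2[1])

# a < b or c < d and e < f, in bottom-up reduction order
bp = Backpatcher()
e1 = bp.relop('a', '<', 'b')
m1 = bp.nextquad                 # marker M of the 'or' production: 102
e2 = bp.relop('c', '<', 'd')
m2 = bp.nextquad                 # marker M of the 'and' production: 104
e3 = bp.relop('e', '<', 'f')
e23 = bp.and_(e2, m2, e3)
truelist, falselist = bp.or_(e1, m1, e23)

for i, q in enumerate(bp.quads, start=bp.start):
    print(f"{i}: {q}")
print("truelist:", truelist, "falselist:", falselist)
```

The jumps still ending in _ are exactly those on the final truelist (100, 104) and falselist (103, 105); they are filled in once the enclosing statement knows its targets.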
Annotated Parse Tree
Procedure Calls
• Grammar
    S -> call id ( Elist )
    Elist -> Elist , E
    Elist -> E

• Semantic Actions
    1. S -> call id ( Elist )    { for each item p in queue do
                                       gen(‘param’ p);
                                   gen(‘call’ id.place) }
    2. Elist -> Elist , E        { append E.place to the end of queue }
    3. Elist -> E                { initialize queue to contain only E.place }

e.g.

P (x1, x2, x3, ……………. xn)

param x1
param x2
………….
………….
param xn

call P
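The queue-based actions can be sketched in Python as follows. The function translate_call is a hypothetical helper, and the argument places are assumed to have been computed already (the code that evaluates each argument would be emitted before the param statements).

```python
from collections import deque

def translate_call(proc, arg_places):
    """Emit one param statement per argument, then the call."""
    queue = deque()
    for place in arg_places:
        # Elist -> E initializes the queue; Elist -> Elist , E appends to it.
        queue.append(place)
    code = [f"param {p}" for p in queue]    # S -> call id ( Elist )
    code.append(f"call {proc}")
    return code

call_code = translate_call('P', ['x1', 'x2', 'x3'])
print("\n".join(call_code))
# param x1
# param x2
# param x3
# call P
```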