Intermediate Code Generation
Syntax Directed Translation (SDT)
• A compiler implementation method in which translation of the
source language is driven entirely by the parser
• A string is translated into a sequence of actions by
attaching one action to each rule of the grammar
– Thus, parsing a string of the grammar produces a sequence
of rule applications, and each application triggers its action
• SDT provides a simple way to attach semantics to any
such syntax
SDT in Action
• Parser produces an abstract-syntax tree
• Syntax-directed translation maps a sequence of tokens to some other form,
based on the syntax
• A syntax-directed translation is defined by augmenting the CFG:
– a translation rule is defined for each production
– a translation rule defines the translation of the left-hand-side nonterminal as a function of:
• constants
• the right-hand-side non-terminals' translations
• the right-hand-side tokens' values
• To translate an input string:
– Build parse tree
– Use translation rules to compute translation of each nonterminal in the tree, bottom up
• The translation of the string is the translation of the parse tree's root nonterminal.
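The bottom-up procedure above can be sketched in a few lines. This is only an illustration under assumptions: the toy grammar (E → E + T | T, T → id), the Node class and the RULES table are hypothetical and not part of the slides. Each production gets one translation rule, and here the translation happens to be postfix text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str                                  # nonterminal name or token kind
    children: list = field(default_factory=list)
    value: str = ""                              # token text (leaves only)

# One translation rule per production, keyed by (lhs, rhs symbols).
# Hypothetical toy grammar: E -> E + T | T,  T -> id
RULES = {
    ("E", ("E", "+", "T")): lambda e, _plus, t: f"{e} {t} +",
    ("E", ("T",)): lambda t: t,
    ("T", ("id",)): lambda name: name,
}

def translate(node):
    if not node.children:                        # token: translation is its value
        return node.value
    parts = [translate(c) for c in node.children]    # children first (bottom up)
    rhs = tuple(c.symbol for c in node.children)
    return RULES[(node.symbol, rhs)](*parts)

# Parse tree for "a + b"
tree = Node("E", [
    Node("E", [Node("T", [Node("id", value="a")])]),
    Node("+", value="+"),
    Node("T", [Node("id", value="b")]),
])
print(translate(tree))
```

The translation of the whole string is whatever translate() returns for the root, matching the last bullet above.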
Translation Rule in Action
[Figure: reduction by E → E(1) + E(2). The existing instances E(1) and E(2) each carry the synthesized attributes .Place and .Code. After the reduction, a new instance of E is ‘constructed’, and its .Place and .Code are computed from those of E(1) and E(2) and the operator ‘+’.]
Intermediate Code
• An ‘architecture-independent’ coding scheme
1. Postfix Notation
• (a – b) * (c + d) + (a – b)
➔ a b – c d + * a b – +
2. Syntax Tree
3. 3-Address Code
• statement of the form x = y op z
• x, y, z are addresses (memory locations)
• a statement might contain fewer than three references
3-Address Code Examples
• a * – (b + c)
• for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
Implementation of Three Address Code
a = b * – c + b * – c
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
• Quadruple
– Structure with 4 fields
• op, arg1, arg2 and result
– op denotes the operator
– arg1 and arg2 denote the two operands
– result stores the result of the expression
– Advantage
• Easy to rearrange code for optimization
• Can quickly access temporary variables
– using the symbol table
– Disadvantage
• Contains a lot of temporaries
• Temporary variable creation increases time
and space complexity
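As a rough sketch, the six statements above can be stored as quadruples like this. The Quad type and the show() helper are illustrative names, not something the slides define.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: str

# a = b * -c + b * -c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]

def show(q):
    """Render a quadruple back as a three-address statement."""
    if q.op == "=":
        return f"{q.result} = {q.arg1}"
    if q.arg2 is None:                       # unary operator
        return f"{q.result} = {q.op} {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"

for q in quads:
    print(show(q))
```

Because every result has an explicit name, the entries can be reordered without touching the others, at the cost of one temporary per operation.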
Implementation of Three Address Code
a = b * – c + b * – c
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
• Triples
– Do not use an extra temporary variable to
represent a single operation
– Instead, when a reference to another triple’s value
is needed, a pointer to that triple is used
– So, a triple consists of only three fields, namely op, arg1
and arg2
• Disadvantages
– Temporaries are implicit, and code is difficult to rearrange
– Difficult to optimize, as optimization involves
moving intermediate code
• When a triple is moved, any other triple referring to
it must also be updated
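A minimal sketch of the same statement as triples follows. Here an integer operand stands for "the value of the triple at that index"; that encoding, the Triple type and evaluate() are assumptions of this sketch.

```python
from typing import NamedTuple, Union

Operand = Union[str, int, None]      # a variable name, or the index of another triple

class Triple(NamedTuple):
    op: str
    arg1: Operand
    arg2: Operand

# a = b * -c + b * -c
triples = [
    Triple("uminus", "c", None),     # (0)
    Triple("*", "b", 0),             # (1) b * (0)
    Triple("uminus", "c", None),     # (2)
    Triple("*", "b", 2),             # (3) b * (2)
    Triple("+", 1, 3),               # (4) (1) + (3)
    Triple("=", "a", 4),             # (5) a = (4)
]

def evaluate(triples, env):
    results = []                     # results[i] is the value of triple i

    def get(x):
        return results[x] if isinstance(x, int) else env[x]

    for t in triples:
        if t.op == "uminus":
            results.append(-get(t.arg1))
        elif t.op == "*":
            results.append(get(t.arg1) * get(t.arg2))
        elif t.op == "+":
            results.append(get(t.arg1) + get(t.arg2))
        elif t.op == "=":
            env[t.arg1] = get(t.arg2)
            results.append(env[t.arg1])
    return env

print(evaluate(triples, {"b": 2, "c": 3})["a"])
```

Swapping triples (1) and (2) would change what the indices 1 and 2 mean, so triples (3) and (4) would have to be rewritten too. That is the rearrangement problem listed under the disadvantages.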
Implementation of Three Address Code
a = b * – c + b * – c
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
• Indirect Triples
– Use a list of pointers to the triples; the triples themselves
are created and stored separately
– Similar in utility to the quadruple representation, but can
require less space
– Temporaries are implicit, and code is easier to
rearrange
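The idea can be sketched as a triple table plus a separate statement list of pointers into it. The names table, order and run() are hypothetical, and integer operands again refer to entries of the table.

```python
# Triples stored once; an int operand refers to the table entry with that index.
table = [
    ("uminus", "c", None),   # 0
    ("*", "b", 0),           # 1
    ("uminus", "c", None),   # 2
    ("*", "b", 2),           # 3
    ("+", 1, 3),             # 4
    ("=", "a", 4),           # 5
]

# Statement list: execution order, as pointers into the table.
order = [0, 1, 2, 3, 4, 5]

def run(table, order, env):
    value = {}                                   # table index -> computed value

    def get(x):
        return value[x] if isinstance(x, int) else env[x]

    for idx in order:
        op, a1, a2 = table[idx]
        if op == "uminus":
            value[idx] = -get(a1)
        elif op == "*":
            value[idx] = get(a1) * get(a2)
        elif op == "+":
            value[idx] = get(a1) + get(a2)
        elif op == "=":
            env[a1] = value[idx] = get(a2)
    return env

# Rearranging code only permutes the statement list; no triple is rewritten.
reordered = [2, 3, 0, 1, 4, 5]
print(run(table, order, {"b": 2, "c": 3})["a"])
print(run(table, reordered, {"b": 2, "c": 3})["a"])
```

Both orders compute the same value for a, which is why this form is easier to rearrange than plain triples.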
[Figure: the syntax-directed intermediate code generator and the structures it uses.
• Symbol table (columns Name and Address; entries a, b, …): lookup for .place
• Temporaries (t1), (t2), …: newtemp() creates a temporary and returns its address
• Generated code: emit(code) appends a statement, e.g. t1 = a + b]
Semantic Actions with Production Rules
• S → id = E;
{ emit(“id.place = E.place”); }
• E → E(1) + E(2)
{ t = newtemp();
E.place = t;
emit(“E.place = E(1).place + E(2).place”); }
• E → E(1) * E(2)
{ t = newtemp();
E.place = t;
emit(“E.place = E(1).place * E(2).place”); }
• E → (E(1))
{ E.place = E(1).place; }
• E → id
{ E.place = id.place; }
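These actions can be mimicked over an already-built expression tree, as in the sketch below. It assumes a nested-tuple tree (op, left, right) with identifier names as leaves; gen_expr() and gen_assign() are hypothetical names, while newtemp() and emit() follow the helpers named in the rules.

```python
import itertools

code = []                                   # generated three-address statements
_counter = itertools.count(1)

def newtemp():
    """Create a fresh temporary name: t1, t2, ..."""
    return f"t{next(_counter)}"

def emit(stmt):
    code.append(stmt)

def gen_expr(node):
    """Return the place holding the expression's value, emitting code on the way."""
    if isinstance(node, str):               # E -> id : place is the identifier itself
        return node
    op, left, right = node                  # E -> E1 op E2 (parentheses vanish in the tree)
    p1 = gen_expr(left)
    p2 = gen_expr(right)
    t = newtemp()
    emit(f"{t} = {p1} {op} {p2}")
    return t

def gen_assign(target, expr):               # S -> id = E ;
    emit(f"{target} = {gen_expr(expr)}")

# x = (a + b) * (c + d)
gen_assign("x", ("*", ("+", "a", "b"), ("+", "c", "d")))
print("\n".join(code))
```

Operands are translated before the operator's own statement is emitted, so the code comes out in the order a bottom-up parser would perform the reductions.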
Syntax-directed Intermediate Code Generation
x = (a + b) * (c + d)
[Figure: parse tree for S → x = E with nodes numbered 1 to 10, annotated with the code emitted at each reduction: emit(“t1 = a + b”) and emit(“t2 = c + d”) at the two E → E + E nodes, emit(“t3 = t1 * t2”) at the E → E * E node, and emit(“x = t3”) at S.]
100: t1 = a + b
101: t2 = c + d
102: t3 = t1 * t2
103: x = t3
x = ((a + b) * (c + d)) * (e + f + g)
Translation of Boolean Expressions
• A boolean expression evaluates to TRUE or FALSE
• May use “true” or “false” as constants
• But numeric expressions connected by a relational operator
(<, <=, ==, …) are also used
• So, if ‘E’ is used to include boolean expressions also, we get
rules like:
– E → E && E | E || E | (E) | ~E | E <relop> E
• Here, (E) is parenthesized boolean expression
• In E → E <relop> E, the right-hand side E-s are arithmetic expressions
Translation of Boolean Expressions
• We can of course map boolean values to numeric values (say 0 for
false and 1 for true) and proceed
• But in a > b && b > c, this means we evaluate b > c even if
a > b is false
• Instead, we can “short-circuit” such computations
– Control-flow translation
• Here, with E (when known to be a boolean expression), we
associate two new “attributes”:
– E.true
– E.false
Some More 3-Address Expressions
1. if <addr-1> <relop> <addr-2> goto <quad-address>
2. goto <quad-address>
• Note: During translation by a semantic rule:
– The ‘target’ <quad-address> may be left ‘blank’
and must be ‘patched’ later, for example:
• 1234 if T1 == d goto _______
Some More Helper Functions
• nextquad()
– Returns the “address” where the next emit() will add code
• makelist(<quad-address>)
– Returns a newly created list of quad-addresses with the given parameter as
the only member of the list
• merge(<quad-address-list>, <quad-address-list>)
– Merges the two lists and returns the merged list
• backpatch(<quad-address-list>, <quad-address>)
– ‘Patch’ the blank entries in ALL the quadruples in <quad-address-list>
with the actual target quadruple address <quad-address>
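A minimal sketch of these helpers follows, assuming quadruples live in a Python list and that a quad-address is simply an index into it (starting at 0 here rather than at an address such as 100). The emit() signature is an assumption of this sketch.

```python
quads = []                                   # each entry: [op, arg1, arg2, target]

def nextquad():
    """Address where the next emit() will place its quadruple."""
    return len(quads)

def emit(op, arg1=None, arg2=None, target=None):
    quads.append([op, arg1, arg2, target])   # target None means 'blank'

def makelist(addr):
    return [addr]

def merge(list1, list2):
    return list1 + list2

def backpatch(addrs, target):
    """Fill the blank target of every quadruple listed in addrs."""
    for addr in addrs:
        quads[addr][3] = target

emit("if==", "T1", "d")                      # 0: if T1 == d goto ___
emit("goto")                                 # 1: goto ___
pending = merge(makelist(0), makelist(1))
backpatch(pending, nextquad())               # both blanks now point at quad 2
print(quads)
```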
Short-circuit Translation of Boolean Expressions
• ‘E’ represents a boolean
expression
• A block of code is needed to
evaluate its notional value
– A list of quadruple addresses where
the expression is definitely true
• E.true
– A list of quadruple addresses where
the expression is definitely false
• E.false
[Figure: code block for boolean expression E. E.true holds the unpatched quadruple addresses where E is known to be true; E.false holds the unpatched quadruple addresses where E is known to be false.]
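E → E1 && M E2
[Figure: the code block for boolean expression E1 is followed by the code block for boolean expression E2. The E1.true exits are patched to the start of E2’s code, the E1.false and E2.false exits are merged into E.false, and E2.true becomes E.true.]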
• E → E1 && M E2
{ backpatch(E1.true, M.quad);
E.false = merge(E1.false, E2.false);
E.true = E2.true; }
• E → E1 || M E2
{ backpatch(E1.false, M.quad);
E.true = merge(E1.true, E2.true);
E.false = E2.false; }
• E → ~E1
{ E.true = E1.false;
E.false = E1.true; }
• E → (E1)
{ E.true = E1.true;
E.false = E1.false; }
• E → E1 <relop> E2
{ E.true = makelist(nextquad());
E.false = makelist(nextquad() + 1);
emit(“if (E1.place <relop> E2.place) goto ___”);
emit(“goto ___”);
}
• M → ϵ
{ M.quad = nextquad(); }
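The rules above can be tried out with a small recursive sketch. It assumes the expression is already a nested tuple whose relational leaves hold operand places, so arithmetic sub-expressions are taken to be computed into temporaries beforehand. Calling nextquad() right after translating E1 plays the role of the marker M. gen_bool() and the quad layout are hypothetical.

```python
quads = []                                   # each entry: [text, target]; None = blank

def nextquad():
    return len(quads)

def emit(text):
    quads.append([text, None])

def backpatch(addrs, target):
    for addr in addrs:
        quads[addr][1] = target

def gen_bool(node):
    """Return (truelist, falselist) of unpatched quad addresses for node."""
    op = node[0]
    if op == "&&":
        t1, f1 = gen_bool(node[1])
        backpatch(t1, nextquad())            # marker M: E2's code starts here
        t2, f2 = gen_bool(node[2])
        return t2, f1 + f2
    if op == "||":
        t1, f1 = gen_bool(node[1])
        backpatch(f1, nextquad())            # marker M: E2's code starts here
        t2, f2 = gen_bool(node[2])
        return t1 + t2, f2
    if op == "~":
        t1, f1 = gen_bool(node[1])
        return f1, t1
    # relational leaf: (relop, left_place, right_place)
    truelist, falselist = [nextquad()], [nextquad() + 1]
    emit(f"if {node[1]} {op} {node[2]} goto")
    emit("goto")
    return truelist, falselist

# a < b || c > d && p == q
expr = ("||", ("<", "a", "b"), ("&&", (">", "c", "d"), ("==", "p", "q")))
truelist, falselist = gen_bool(expr)
for addr, (text, target) in enumerate(quads):
    print(addr, text, "___" if target is None else target)
```

The quads left in truelist and falselist still have blank targets; whoever uses the expression, such as an if or a while, patches them later.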
Example
a + b < c || e > f + g && p == q
E => E || M E
=> E || M E && M E
=> E || M E && M E == E
=> E || M E && M E == q
=> E || M E && M p == q
=> E || M E && ϵ p == q
=> E || M E > E && p == q
=> E || M E > E + E && p == q
=> E || M E > E + g && p == q
=> E || M E > f + g && p == q
=> E || M e > f + g && p == q
=> E || ϵ e > f + g && p == q
=> E < E || e > f + g && p == q
=> E < c || e > f + g && p == q
=> E + E < c || e > f + g && p == q
=> E + b < c || e > f + g && p == q
=> a + b < c || e > f + g && p == q
Non-terminal for “Statement”
Suppose S is the non-terminal for a “statement”
• S may be simple, for example an assignment
• It may have complexity as in if, if-else,
while, etc.
– Portion(s) of them can be another statement
• There may be compound statements (as inside a ‘{’
and ‘}’ in C)
• At several points in a statement, there may be
branch-outs to the next statement
– Many of these branch-outs may be ‘unpatched’ till now
– We will have an attribute, S.next, to maintain a list of
unpatched branch-outs
[Figure: code block for a statement S. S.next holds the unpatched quadruple addresses where S will branch out to the next statement.]
If-Else Statement Example
• if (<condition-expression>)
<statement-if>
else
<statement-else>
• S → if (E) S1 else S2
[Figure: code layout, top to bottom: code block for boolean expression E (exits E.true and E.false), code block for statement S1 (exit S1.next), an unconditional goto, code block for statement S2 (exit S2.next). The remaining exits form S.next.]
While Statement Example
• while (<condition-expression>)
<statement>
• S → while (E) S1
• There is an implied goto
at the end of S to the top of E
[Figure: code layout, top to bottom: code block for boolean expression E (exits E.true and E.false), code block for statement S1 (exit S1.next), an unconditional goto back to the top of E.]
A Simple C-Like Language
• The following constructs only
– S: Statement, including assignment statement
– L: Statement list, i.e., ‘compound’ statement
– E: Arithmetic or boolean expression
• Rules (in addition to whatever given earlier)
– S → if (E) S
| if (E) S else S
| while (E) S
| { L }
– L → L S
| S
Semantic Rules
S → if (E) M S1
S → if (E) M1 S1 N else M2 S2
S → while (M1 E) M2 S1
S → { L }
L → L1 M S
L → S
Semantic Rules
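The actions for these productions are not reproduced in the text, so the following is only a sketch of the usual backpatching scheme, under assumptions: statements are nested tuples, conditions are single relational tests (left, relop, right), and gen_cond() and gen_stmt() are hypothetical names. Comments show where the markers M, M1, M2 and N of the productions would act.

```python
quads = []                                    # each entry: [text, target]; None = blank

def nextquad():
    return len(quads)

def emit(text, target=None):
    quads.append([text, target])

def backpatch(addrs, target):
    for addr in addrs:
        quads[addr][1] = target

def gen_cond(cond):
    """Relational test (left, relop, right) -> (truelist, falselist)."""
    left, relop, right = cond
    truelist, falselist = [nextquad()], [nextquad() + 1]
    emit(f"if {left} {relop} {right} goto")
    emit("goto")
    return truelist, falselist

def gen_stmt(s):
    """Emit code for statement s; return its list of unpatched exits (S.next)."""
    kind = s[0]
    if kind == "assign":                      # ("assign", "x = y + z")
        emit(s[1])
        return []
    if kind == "if":                          # ("if", cond, then_stmt, else_stmt or None)
        truelist, falselist = gen_cond(s[1])
        backpatch(truelist, nextquad())       # M1: start of the then-part
        next1 = gen_stmt(s[2])
        if s[3] is None:
            return falselist + next1
        jump_over = [nextquad()]              # N: goto that skips the else-part
        emit("goto")
        backpatch(falselist, nextquad())      # M2: start of the else-part
        return next1 + jump_over + gen_stmt(s[3])
    if kind == "while":                       # ("while", cond, body)
        top = nextquad()                      # M1: start of the condition
        truelist, falselist = gen_cond(s[1])
        backpatch(truelist, nextquad())       # M2: start of the body
        backpatch(gen_stmt(s[2]), top)
        emit("goto", top)                     # implied jump back to the test
        return falselist
    if kind == "block":                       # ("block", [s1, s2, ...])
        pending = []
        for child in s[1]:
            backpatch(pending, nextquad())    # M: earlier exits land on this statement
            pending = gen_stmt(child)
        return pending
    raise ValueError(f"unknown statement kind: {kind}")

program = ("while", ("i", "<=", "10"), ("block", [
    ("if", ("a", ">", "b"), ("assign", "x = 1"), ("assign", "x = 2")),
    ("assign", "i = i + 1"),
]))
exits = gen_stmt(program)
for addr, (text, target) in enumerate(quads):
    print(addr, text, "" if target is None else target)
```

The returned list holds the exits that still need a target: for the while loop this is the false exit of its condition, to be patched once the following statement's address is known.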
Example Code to Translate
while (a + b < c || e > f + g && p == q)
{
if (e > f)
{
m = n + p;
n = p + q;
}
else e = f + g;
p = q + r * s;
}
[Figure: the same code with annotations. The while header is marked ‘While and cond’, the if-else statement is Stmt-1, the assignment p = q + r * s; is Stmt-2, and the ‘Target of While’ is marked.]
Condition for “while”
“If-else” Statement inside “while”
After “If-else” Statement inside “while”