Chapter 6 Intermediate Code Generation
Outline
Variants of Syntax Trees
Three-address code Types and declarations Translation of expressions Type checking Control flow Backpatching
Introduction
Intermediate code is the interface between front end
and back end in a compiler Ideally the details of source language are confined to the front end and the details of target machines to the back end (a m*n model) In this chapter we study intermediate representations, static type checking and intermediate code generation
Parser Static Checker Intermediate Code Generator Code Generator
Front end
Back end
Variants of syntax trees
It is sometimes beneficial to crate a DAG instead of
tree for Expressions. This way we can easily show the common subexpressions and then use that knowledge during code generation Example: a+a*(b-c)+(b-c)*d
+ + *
*
a b c
SDD for creating DAGs
Production 1) E -> E1+T 2) E -> E1-T 3) E -> T 4) T -> (E) 5) T -> id 6) T -> num Example:
1) 2) 3) 4) 5) 6) 7) p1=Leaf(id, entry-a) P2=Leaf(id, entry-a)=p1 p3=Leaf(id, entry-b) p4=Leaf(id, entry-c) p5=Node(-,p3,p4) p6=Node(*,p1,p5) p7=Node(+,p1,p6)
Semantic Rules
E.node= new Node(+, E1.node,T.node) E.node= new Node(-, E1.node,T.node) E.node = T.node T.node = E.node T.node = new Leaf(id, id.entry) T.node = new Leaf(num, num.val)
8) 9) 10) 11) 12) 13) p8=Leaf(id,entry-b)=p3 p9=Leaf(id,entry-c)=p4 p10=Node(-,p3,p4)=p5 p11=Leaf(id,entry-d) p12=Node(*,p5,p11) p13=Node(+,p7,p12)
Value-number method for constructing DAGs
=
+ i 10
id num + 3
To entry for i
10
1 1
2 3
Algorithm Search the array for a node M with label op, left child l and right child r If there is such a node, return the value number M If not create in the array a new node N with label op, left child l, and right child r and return its value We may use a hash table
Three address code
In a three address code there is at most one operator at
the right side of an instruction Example:
+ + * a b c *
t1 = b c t2 = a * t1 t3 = a + t2 t4 = t1 * d t5 = t3 + t4
Forms of three address instructions
x = y op z x = op y x=y goto L if x goto L and ifFalse x goto L if x relop y goto L Procedure calls using:
param x call p,n y = call p,n
x = y[i] and x[i] = y x = &y and x = *y and *x =y
Example
do i = i+1; while (a[i] < v);
L: t1 = i + 1 i = t1 t2 = i * 8 t3 = a[t2] if t3 < v goto L Symbolic labels 100: 101: 102: 103: 104: t1 = i + 1 i = t1 t2 = i * 8 t3 = a[t2] if t3 < v goto 100
Position numbers
Data structures for three address codes
Quadruples Has four fields: op, arg1, arg2 and result Triples Temporaries are not used and instead references to instructions are made Indirect triples In addition to triples we use a list of pointers to triples
Example
b * minus c + b * minus c
Three address code
t1 = minus c t2 = b * t1 t3 = minus c t4 = b * t3 t5 = t2 + t4 a = t5
Quadruples
op arg1 arg2 result minus c t1 * b t1 t2 minus c t3 b t3 t4 * + t2 t4 t5 = t5 a 0 1 2 3 4 5
Triples
op arg1 arg2 minus c * b (0) minus c b (2) * + (1) (3) = a (4)
Indirect Triples
op 35 (0) 36 (1) 37 (2) 38 (3) 39 (4) 40 (5) 0 1 2 3 4 5 op arg1 arg2 minus c * b (0) minus c b (2) * + (1) (3) = a (4)
Type Expressions
Example: int[2][3] array(2,array(3,integer))
A basic type is a type expression A type name is a type expression A type expression can be formed by applying the array type constructor to a number and a type expression. A record is a data structure with named field A type expression can be formed by using the type constructor g for function types If s and t are type expressions, then their Cartesian product s*t is a type expression Type expressions may contain variables whose values are type expressions
Type Equivalence
They are the same basic type. They are formed by applying the same constructor to
structurally equivalent types. One is a type name that denotes the other.
Declarations
Storage Layout for Local Names
Computing types and their widths
Storage Layout for Local Names
Syntax-directed translation of array types
Sequences of Declarations
Actions at the end:
Fields in Records and Classes
Translation of Expressions and Statements
We discussed how to find the types and offset of
variables We have therefore necessary preparations to discuss about translation to intermediate code We also discuss the type checking
Three-address code for expressions
Incremental Translation
Addressing Array Elements
Layouts for a two-dimensional array:
Semantic actions for array reference
Translation of Array References
Nonterminal L has three synthesized attributes: L.addr L.array L.type
Conversions between primitive types in Java
Introducing type conversions into expression evaluation
Abstract syntax tree for the function definition
fun length(x) = if null(x) then 0 else length(tl(x)+1)
This is a polymorphic function in ML language
Inferring a type for the function length
Algorithm for Unification
Unification algorithm
boolean unify (Node m, Node n) { s = find(m); t = find(n); if ( s = t ) return true; else if ( nodes s and t represent the same basic type ) return true; else if (s is an op-node with children s1 and s2 and t is an op-node with children t1 and t2) { union(s , t) ; return unify(s1, t1) and unify(s2, t2); } else if s or t represents a variable { union(s, t) ; return true; } else return false; }
Control Flow
boolean expressions are often used to: Alter the flow of control. Compute logical values.
Short-Circuit Code
Flow-of-Control Statements
Syntax-directed definition
Generating three-address code for booleans
translation of a simple if-statement
Backpatching
Previous codes for Boolean expressions insert symbolic labels for
jumps It therefore needs a separate pass to set them to appropriate addresses We can use a technique named backpatching to avoid this We assume we save instructions into an array and labels will be indices in the array For nonterminal B we use two attributes B.truelist and B.falselist together with following functions:
makelist(i): create a new list containing only I, an index into the array
of instructions Merge(p1,p2): concatenates the lists pointed by p1 and p2 and returns a pointer to the concatenated list Backpatch(p,i): inserts i as the target label for each of the instruction on the list pointed to by p
Backpatching for Boolean Expressions
Backpatching for Boolean Expressions
Annotated parse tree for x < 100 || x > 200 && x ! = y
Flow-of-Control Statements
Translation of a switch-statement
Readings
Chapter 6 of the book