Compiler Code Generation Guide

The document discusses code generation in compiler design. It covers topics like register allocation, instruction selection, and a simple code generation algorithm. The algorithm considers each three-address instruction, decides which registers to use for operands, loads operands from memory if needed, generates the operation instruction, and stores the result back to memory if necessary. It uses register and address descriptors to track which values are in which registers and memory locations.

Uploaded by

bekalu alemayehu

Compiler Design

Instructor: Mohammed O.
Email: momoumer90@[Link]
Samara University
Chapter Ten
This Chapter Covers:
- Code Generation
- Register Allocation
- DAG Representation
Code Generation
The primary objective of the code generator is to convert
syntax trees to instructions.
The final phase in our compiler model is the code generator.
It takes as input an intermediate representation of the
source programme and produces as output an equivalent
target programme.
The requirements traditionally imposed on a code generator
are severe.
The output of a code generator must be correct and of
high quality, and the code generator itself should run efficiently.
Code Generation (Cont.)

Position of Code Generator

Issues in the Design of a Code Generator


While the details are dependent on the target language
and the operating system, issues such as memory
management, instruction selection, register allocation, and
evaluation order are inherent in almost all code generation
problems.
Input to the Code Generator
The input to the code generator consists of the intermediate
representation of the source programme and information
in the symbol table.
The information in the symbol table will be used to
determine the run time addresses of the data objects
denoted by the names in the intermediate representation.
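The intermediate language can take several forms:
- linear representations such as postfix notation,
- three-address representations such as quadruples,
- virtual machine representations such as stack machine code,
- graphical representations such as syntax trees and DAGs.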
Target Programmes
The output of the code generator is the target programme.
The output may take on a variety of forms:
- absolute machine language,
- relocatable machine language,
- assembly language.
Producing an absolute machine language programme has the
advantage that it can be placed in a fixed location in memory and
immediately executed.
A small programme can be compiled and executed quickly.
A set of relocatable object modules can be linked together and loaded
for execution by a linking loader.
Producing an assembly language programme as output makes the
process of code generation easier.
Instruction Selection
The nature of the instruction set of the target machine
determines the difficulty of instruction selection.
If the target machine does not support each data type in a
uniform manner, then each exception to the general rule
requires special handling.
Instruction speeds and machine idioms are other important
factors.
If we do not care about the efficiency of the target
programme, instruction selection is straightforward.
For each type of three-address statement we can design a
code skeleton that outlines the target code to be generated
for that construct.
Instruction Selection (Cont.)
For example, every three-address statement of the form
x := y + z, where x, y, and z are statically allocated, can be
translated into the code sequence
MOV y, R0 /* load y into register R0 */
ADD z, R0 /* add z to R0 */
MOV R0, x /* store R0 into x */
But, this kind of statement-by-statement code generation
often produces poor code.
For example, the sequence of statements:
a := b + c
d := a + e
Instruction Selection (Cont.)
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth instruction is redundant, since it reloads a
value that is already in R0, and so is the third if a is not
subsequently used.
The quality of the generated code is determined by its
speed and size.
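The redundancy can be made concrete with a small sketch. It assumes statements are given as (x, y, z) tuples meaning x := y + z; the helper names translate_naive and drop_redundant_loads are hypothetical and are not part of the slides. The second function is only a toy clean-up pass that removes a load of a value just stored from R0.

```python
def translate_naive(statements):
    """Translate each (x, y, z), meaning x := y + z, with a fixed code skeleton."""
    code = []
    for x, y, z in statements:
        code.append(f"MOV {y}, R0")   # load y into register R0
        code.append(f"ADD {z}, R0")   # add z to R0
        code.append(f"MOV R0, {x}")   # store R0 into x
    return code


def drop_redundant_loads(code):
    """Hypothetical clean-up: drop 'MOV v, R0' directly after 'MOV R0, v'."""
    out = []
    for instr in code:
        if out and out[-1].startswith("MOV R0, "):
            stored = out[-1][len("MOV R0, "):]
            if instr == f"MOV {stored}, R0":
                continue              # R0 already holds this value
        out.append(instr)
    return out


naive = translate_naive([("a", "b", "c"), ("d", "a", "e")])
improved = drop_redundant_loads(naive)
print("\n".join(improved))
```

The naive translation emits six instructions; the clean-up pass removes only the reload of a. Removing the store to a as well would need the extra knowledge that a is not used later.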
Instruction Selection (Cont.)
A target machine with a rich instruction set may provide
several ways of implementing a given operation.
For example, if the target machine has an "increment"
instruction (INC), then the three-address statement
a := a + 1 may be implemented more efficiently by the single
instruction INC a than by the general three-instruction sequence:
General sequence | With INC
MOV a, R0        | INC a
ADD #1, R0       |
MOV R0, a        |
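As a minimal sketch of how an instruction selector might prefer such an idiom, the function below checks for the pattern a := a + 1 before falling back to the general skeleton. The function name select_add and the use of "#1" for the constant operand are assumptions made for illustration.

```python
def select_add(x, y, z):
    """Return target code for x := y + z, using INC for the idiom a := a + 1."""
    if x == y and z == "#1":
        return [f"INC {x}"]           # single-instruction machine idiom
    # General skeleton for any other addition.
    return [f"MOV {y}, R0", f"ADD {z}, R0", f"MOV R0, {x}"]


print(select_add("a", "a", "#1"))
print(select_add("x", "y", "z"))
```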
Register Allocation
Instructions involving register operands are usually shorter
and faster than those involving operands in memory.
Efficient utilisation of registers is important in generating
good code.
The use of registers is often subdivided into two
subproblems:
1. During register allocation, we select the set of variables
that will reside in registers at a point in the programme.
2. During a subsequent register assignment phase, we pick
the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is
difficult, even with single-register values; mathematically,
the problem is NP-complete.
Register Allocation (Cont.)
The problem is further complicated because the
hardware and/or the operating system of the target
machine may require that certain register usage
conventions be observed.
Assignment: Run-Time Storage Management
Static Allocation and Stack Allocation
Basic blocks
A Simple Code Generator
Here let us consider an algorithm that generates code for a
single basic block.
It considers each three-address instruction and keeps track
of what values are in what registers so it can avoid
generating unnecessary loads and stores.
One of the primary issues during code generation is
deciding how to use registers to best advantage.
There are four principal uses of registers:
- In most machine architectures, some or all of the operands
of an operation must be in registers in order to perform the
operation.
A Simple Code Generator (Cont.)
- Registers make good temporaries: places to hold the
result of a subexpression while a larger expression is being
evaluated, or more generally, a place to hold a variable that
is used only within a single basic block.
- Registers are used to hold (global) values that are
computed in one basic block and used in other blocks, for
example, a loop index that is incremented going around the
loop and is used several times within the loop.
- Registers are often used to help with run-time storage
management, for example, to manage the run-time stack,
including the maintenance of stack pointers and possibly
the top elements of the stack itself.
A Simple Code Generator (Cont.)
These are competing needs, since the number of registers
available is limited.
While considering the code generation algorithm, we
assume that the basic block has already been transformed
into a preferred sequence of three-address instructions, by
transformations such as combining common
subexpressions.
We also assume that for each operator, there is exactly one
machine instruction that takes the necessary operands in
registers and performs that operation, leaving the result in
a register.
A Simple Code Generator (Cont.)
The machine instructions are of the form
LD reg, mem
ST mem, reg
OP reg, reg, reg
Register and Address Descriptors
The code-generation algorithm considers each
three-address instruction in turn and decides what load
instructions are necessary to get the needed operands into
registers.
After generating the loads, it generates the operation itself.
Then, if there is a need to store the result into a memory
location, it also generates that store.
In order to make the needed decisions, we require a data
structure that tells us what program variables currently
have their value in a register, and which register or
registers, if so.
Reg and Address Descriptors (Cont.)
We also need to know whether the memory location for a
given variable currently has the proper value for that
variable, since a new value for the variable may have been
computed in a register and not yet stored.
The desired data structure has the following descriptors:
1. For each available register, a register descriptor keeps track
of the variable names whose current value is in that register.
Since we shall use only those registers that are available for
local use within a basic block, we assume that initially, all
register descriptors are empty.
2. For each program variable, an address descriptor keeps
track of the location or locations where the current value of
that variable can be found.
Reg and Address Descriptors (Cont.)
The location might be a register, a memory address,
a stack location, or some set of more than one of
these.
The information can be stored in the symbol-table
entry for that variable name.
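One possible in-memory shape for the two descriptors is sketched below. The class name Descriptors and its helper methods are hypothetical; the sketch also assumes that register names and variable names never collide, and that every program variable starts out with its value in its own memory location.

```python
class Descriptors:
    """Register and address descriptors for one basic block (illustrative only)."""

    def __init__(self, registers, variables):
        # Register descriptor: register -> set of variables whose current
        # value is in that register. Initially every register is empty.
        self.reg = {r: set() for r in registers}
        # Address descriptor: variable -> set of locations holding its
        # current value. Assumption: each variable starts in its own memory.
        self.addr = {v: {v} for v in variables}

    def registers_holding(self, var):
        """Registers that currently hold the value of var."""
        return sorted(loc for loc in self.addr.get(var, ()) if loc in self.reg)

    def memory_is_current(self, var):
        """True if var's own memory location holds its current value."""
        return var in self.addr.get(var, ())


d = Descriptors(["R1", "R2"], ["a", "b"])
d.reg["R1"] = {"a"}        # as if LD R1, a had been issued
d.addr["a"].add("R1")
print(d.registers_holding("a"), d.memory_is_current("a"))
```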
The Code-Generation Algorithm
An essential part of the algorithm is a function getReg(I),
which selects registers for each memory location associated
with the three-address instruction I.
Function getReg has access to the register and address
descriptors for all the variables of the basic block, and may
also have access to certain information such as the variables
that are live on exit from the block.
In a three-address instruction such as x = y + z, we shall
treat + as a generic operator and ADD as the equivalent
machine instruction.
For a three-address instruction such as x = y + z, do the
following:
Code-Generation Algorithm (Cont.)
1. Use getReg(x = y + z) to select registers for x, y,
and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (from the register descriptor of
Ry), then issue an instruction LD Ry,y', where y' is
one of the memory locations for y (from the address
descriptor of y).
3. Similarly, if z is not in Rz, issue an instruction LD
Rz,z', where z' is a location for z.
4. Issue the instruction ADD Rx,Ry,Rz.
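The four steps can be sketched as follows. The get_reg below is a deliberately simplified stand-in for getReg, not the slides' function: it assumes a free register is always available and never spills or reuses a register. The sketch also assumes each operand that is not in a register can be loaded from its own memory location.

```python
def get_reg(instr, reg_desc):
    """Hypothetical, simplified getReg: assumes a free register always exists."""
    x, y, z = instr
    chosen = {}
    for var in (y, z):
        if var in chosen:
            continue
        holding = [r for r, held in reg_desc.items() if var in held]
        free = [r for r, held in reg_desc.items()
                if not held and r not in chosen.values()]
        # Prefer a register that already holds the operand.
        chosen[var] = holding[0] if holding else free[0]
    free = [r for r, held in reg_desc.items()
            if not held and r not in chosen.values()]
    return free[0], chosen[y], chosen[z]


def gen_add(instr, reg_desc, addr_desc):
    """Generate code for x = y + z and update both descriptors."""
    x, y, z = instr
    rx, ry, rz = get_reg(instr, reg_desc)          # step 1
    code = []
    for var, r in ((y, ry), (z, rz)):              # steps 2 and 3
        if var not in reg_desc[r]:
            code.append(f"LD {r}, {var}")          # load from var's memory
            reg_desc[r] = {var}
            addr_desc[var].add(r)
    code.append(f"ADD {rx}, {ry}, {rz}")           # step 4
    reg_desc[rx] = {x}
    addr_desc[x] = {rx}
    return code


reg_desc = {"R1": set(), "R2": set(), "R3": set()}
addr_desc = {"x": {"x"}, "y": {"y"}, "z": {"z"}}
code = gen_add(("x", "y", "z"), reg_desc, addr_desc)
print("\n".join(code))
```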
Machine Instructions for Copy Stat.
There is an important special case: a three-address copy
statement of the form: x = y.
We assume that getReg will always choose the same
register for both x and y.
If y is not already in that register Ry, then generate the
machine instruction LD Ry, y.
If y was already in Ry, we do nothing.
It is only necessary that we adjust the register descriptor
for Ry so that it includes x as one of the values found there.
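A minimal sketch of the copy case follows. The function name gen_copy is hypothetical, the register choice assumes a free register exists when y must be loaded, and the step that removes x from registers holding its old value is a bookkeeping assumption added for consistency.

```python
def gen_copy(x, y, reg_desc, addr_desc):
    """Generate code for the copy x = y, using one register for both x and y."""
    holding = [r for r, held in reg_desc.items() if y in held]
    code = []
    if holding:
        ry = holding[0]                       # y already in a register: no code
    else:
        # Assumption: some register is free.
        ry = next(r for r, held in reg_desc.items() if not held)
        code.append(f"LD {ry}, {y}")
        reg_desc[ry] = {y}
        addr_desc[y].add(ry)
    for held in reg_desc.values():
        held.discard(x)                       # old copies of x are now stale
    reg_desc[ry].add(x)                       # Ry now holds x as well as y
    addr_desc[x] = {ry}                       # x's only location is Ry
    return code


reg_desc = {"R1": {"u"}, "R2": set()}
addr_desc = {"a": {"a"}, "d": {"d"}, "u": {"R1"}}
first = gen_copy("a", "d", reg_desc, addr_desc)
second = gen_copy("a", "d", reg_desc, addr_desc)
print(first, second)
```

The second call generates no code, because d is already in a register and only the descriptors need adjusting.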
Ending the Basic Block
Variables used by the block may wind up with their only
location being a register.
If the variable is a temporary used only within the block,
we can forget about the value of the temporary and assume
its register is empty.
If the variable is live on exit from the block, or if we don't
know which variables are live on exit, then we need to
assume that the value of the variable is needed later.
In that case, for each variable x whose address descriptor does
not include its own memory location, we generate the instruction
ST x, R, where R is a register in which x's value exists at
the end of the block.
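The end-of-block rule might be sketched like this. The function name end_block and the sample descriptor state are assumptions for illustration; the state describes a block in which a and d currently live only in registers.

```python
def end_block(live_on_exit, reg_desc, addr_desc):
    """Store every live variable whose memory location is out of date."""
    code = []
    for x in sorted(live_on_exit):
        locations = addr_desc[x]
        if x in locations:
            continue                          # memory already holds the value
        # Pick a register in which x's value currently exists.
        r = sorted(loc for loc in locations if loc in reg_desc)[0]
        code.append(f"ST {x}, {r}")
        locations.add(x)                      # memory is now up to date
    return code


reg_desc = {"R1": {"d"}, "R2": {"a"}, "R3": {"v"}}
addr_desc = {"a": {"R2"}, "b": {"b"}, "c": {"c"}, "d": {"R1"}, "v": {"R3"}}
stores = end_block({"a", "b", "c", "d"}, reg_desc, addr_desc)
print("\n".join(stores))
```

The temporary v is not in the live set, so no store is generated for it.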
Managing Reg and Addr Descriptors
As the code-generation algorithm issues machine
instructions, it needs to update the register and address
descriptors.
The rules are as follows:
1. For the instruction LD R, x
(a) Change the register descriptor for register R so it holds
only x.
(b) Change the address descriptor for x by adding register
R as an additional location.
2. For the instruction ST x, R, change the address
descriptor for x to include its own memory location.
Managing Reg and Addr Des (Cont.)
3. For an operation such as ADD Rx,Ry,Rz (x = y + z)
(a) Change the register descriptor for Rx so that it holds only x.
(b) Change the address descriptor for x so that its only location
is Rx; the memory location for x is no longer in its address descriptor.
(c) Remove Rx from the address descriptor of any variable
other than x.
4. When we process a copy statement x = y, after generating the
load for y into register Ry, if needed, and after managing
descriptors as for all load statements (per rule 1):
(a) Add x to the register descriptor for Ry.
(b) Change the address descriptor for x so that its only location
is Ry.
Managing Reg and Addr Des (Cont.)
Example: Let us translate the basic block consisting of the
three-address statements
t=a-b
u=a-c
v=t+u
a=d
d=v+u
Here we assume that t, u, and v are temporaries, local to the
block, while a, b, c, and d are variables that are live on exit from
the block.
Assume that there are as many registers as we need.
However, when a register's value is no longer needed (for example,
it holds only a temporary whose uses have all passed), we reuse
that register.
Managing Reg and Addr Des (Cont.)
A summary of all the machine-code instructions
generated is shown in the table.
Instructions generated and the changes in the
register and address descriptors.
Managing Reg and Addr Des (Cont.)
                  | Register        | Address
                  | descriptors     | descriptors
Instructions      | R1   R2   R3   | a     b   c     d     t   u   v
------------------+----------------+--------------------------------
t = a - b         |                |
  LD R1, a        |                |
  LD R2, b        |                |
  SUB R2, R1, R2  | a    t         | a,R1  b   c     d     R2
u = a - c         |                |
  LD R3, c        |                |
  SUB R1, R1, R3  | u    t    c    | a     b   c,R3  d     R2  R1
v = t + u         |                |
  ADD R3, R2, R1  | u    t    v    | a     b   c     d     R2  R1  R3
a = d             |                |
  LD R2, d        | u    a,d  v    | R2    b   c     d,R2      R1  R3
d = v + u         |                |
  ADD R1, R3, R1  | d    a    v    | R2    b   c     R1            R3
Exit              |                |
  ST a, R2        |                |
  ST d, R1        | d    a    v    | a,R2  b   c     d,R1          R3
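The descriptor columns of the table can be checked by replaying the listed instructions through the update rules. The sketch below is one possible encoding: the function names and the step tuples are hypothetical, and two clean-up steps (dropping a register from the address descriptors of the values it used to hold, and dropping stale copies of a redefined variable from other registers) are not spelled out in the rules but are implied by the table.

```python
def on_load(reg, x, regs, addr):          # rule 1: LD reg, x
    for v in regs[reg]:
        addr[v].discard(reg)              # implied: old contents are gone
    regs[reg] = {x}
    addr[x].add(reg)

def on_store(x, addr):                    # rule 2: ST x, R
    addr[x].add(x)

def on_op(rx, x, regs, addr):             # rule 3: OP rx, ry, rz computing x
    for locs in addr.values():
        locs.discard(rx)                  # 3(c)
    for held in regs.values():
        held.discard(x)                   # implied: old copies of x are stale
    regs[rx] = {x}                        # 3(a)
    addr[x] = {rx}                        # 3(b)

def on_copy(x, ry, regs, addr):           # rule 4: x = y with y in ry
    for held in regs.values():
        held.discard(x)
    regs[ry].add(x)                       # 4(a)
    addr[x] = {ry}                        # 4(b)

regs = {r: set() for r in ("R1", "R2", "R3")}
addr = {v: {v} for v in "abcd"}           # a, b, c, d start in memory
addr.update({t: set() for t in "tuv"})    # temporaries have no location yet

steps = [
    ("LD", "R1", "a"), ("LD", "R2", "b"), ("OP", "R2", "t"),  # t = a - b
    ("LD", "R3", "c"), ("OP", "R1", "u"),                      # u = a - c
    ("OP", "R3", "v"),                                         # v = t + u
    ("LD", "R2", "d"), ("COPY", "a", "R2"),                    # a = d
    ("OP", "R1", "d"),                                         # d = v + u
    ("ST", "a", "R2"), ("ST", "d", "R1"),                      # exit
]
for kind, p, q in steps:
    if kind == "LD":
        on_load(p, q, regs, addr)
    elif kind == "OP":
        on_op(p, q, regs, addr)
    elif kind == "COPY":
        on_copy(p, q, regs, addr)
    else:
        on_store(p, addr)

print(regs)
print(addr)
```

After the last step the descriptors agree with the final row of the table: R1 holds d, R2 holds a, R3 holds v, and both a and d are also in their own memory locations.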
Managing Reg and Addr Des (Cont.)
For the first three-address instruction, t = a-b we need to
issue three instructions, since nothing is in a register
initially.
Thus, we see a and b loaded into registers R1 and R2, and
the value t produced in register R2.
Notice that we can use R2 for t because the value b
previously in R2 is not needed within the block.
The second instruction, u = a - c, does not require a load of
a, since it is already in register R1.
Further, we can reuse R1 for the result, u, since the value of
a, previously in that register, is no longer needed within the
block, and its value is in its own memory location if a is
needed outside the block.
Managing Reg and Addr Des (Cont.)
Note that we change the address descriptor for a to indicate
that it is no longer in R1, but is in the memory location called
a.
The third instruction, v = t + u, requires only the addition.
Further, we can use R3 for the result, v, since the value of c
in that register is no longer needed within the block, and c
has its value in its own memory location.
The copy instruction, a = d, requires a load of d, since it is in
memory.
We show register R2's descriptor holding both a and d.
The addition of a to the register descriptor is the result of
our processing the copy statement, and is not the result of
any machine instruction.
Managing Reg and Addr Des (Cont.)
The fifth instruction, d = v + u, uses two values that are in
registers.
Since u is a temporary whose value is no longer needed, we
have chosen to reuse its register R1 for the new value of d.
Notice that d and a are now in only R1 and R2 respectively,
and are not in their own memory locations.
Therefore, we need to store the live-on-exit variables a and
d into their memory locations.
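As a sanity check on the generated code, the sketch below runs the machine instructions from the example on a tiny abstract machine and compares the live-on-exit variables with a direct evaluation of the three-address statements. The interpreter, the tuple encoding of instructions, and the starting values are assumptions made for illustration; OP dst, src1, src2 is taken to mean dst = src1 OP src2.

```python
def run_three_address(mem):
    """Evaluate the example block directly on a copy of memory."""
    m = dict(mem)
    m["t"] = m["a"] - m["b"]
    m["u"] = m["a"] - m["c"]
    m["v"] = m["t"] + m["u"]
    m["a"] = m["d"]
    m["d"] = m["v"] + m["u"]
    return m


def run_machine(code, mem):
    """Interpret LD/ST/ADD/SUB instructions on a copy of memory."""
    m, regs = dict(mem), {}
    for op, *args in code:
        if op == "LD":
            regs[args[0]] = m[args[1]]                 # LD reg, mem
        elif op == "ST":
            m[args[0]] = regs[args[1]]                 # ST mem, reg
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "SUB":
            regs[args[0]] = regs[args[1]] - regs[args[2]]
    return m


code = [
    ("LD", "R1", "a"), ("LD", "R2", "b"), ("SUB", "R2", "R1", "R2"),
    ("LD", "R3", "c"), ("SUB", "R1", "R1", "R3"),
    ("ADD", "R3", "R2", "R1"),
    ("LD", "R2", "d"),
    ("ADD", "R1", "R3", "R1"),
    ("ST", "a", "R2"), ("ST", "d", "R1"),
]
start = {"a": 10, "b": 3, "c": 4, "d": 7}
expected = run_three_address(start)
actual = run_machine(code, start)
print({v: actual[v] for v in "abcd"})   # {'a': 7, 'b': 3, 'c': 4, 'd': 19}
```

Only a, b, c, and d are compared, since the temporaries t, u, and v are never stored to memory by the generated code.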
