Compiler Code Generation Guide

The document discusses code generation in compiler design. It covers topics like register allocation, instruction selection, and a simple code generation algorithm. The algorithm considers each three-address instruction, decides which registers to use for operands, loads operands from memory if needed, generates the operation instruction, and stores the result back to memory if necessary. It uses register and address descriptors to track which values are in which registers and memory locations.

Uploaded by

bekalu alemayehu


Compiler Design

Instructor: Mohammed O.
Email: momoumer90@[Link]
Samara University
Chapter Ten
This Chapter Covers:
- Code Generation
- Register Allocation
- DAG Representation
Code Generation
The primary objective of the code generator is to convert
syntax trees to instructions.
The final phase in our compiler model is the code generator.
It takes as input an intermediate representation of the
source programme and produces as output an equivalent
target programme.
The requirements traditionally imposed on a code generator
are severe.
The output of a code generator must be correct and of
high quality, and the code generator itself should run efficiently.
Code Generation (Cont.)

Position of Code Generator

Issues in the Design of a Code Generator

While the details are dependent on the target language
and the operating system, issues such as memory
management, instruction selection, register allocation, and
evaluation order are inherent in almost all code generation
problems.
Input to the Code Generator
The input to the code generator consists of the intermediate
representation of the source programme and information
in the symbol table.
The information in the symbol table will be used to
determine the run time addresses of the data objects
denoted by the names in the intermediate representation.
Intermediate language can be:
- linear representations such as postfix notation,
- three-address representations such as quadruples,
- virtual machine representations such as stack machine code,
- graphical representations such as syntax trees and DAGs.
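As an illustration of the quadruple form, the following Python sketch stores each three-address statement as one record. The class name Quad and the field names op, arg1, arg2 and result are assumptions chosen for this example; they are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    """One three-address statement stored as a quadruple (illustrative only)."""
    op: str      # operator, for example "+"
    arg1: str    # first operand
    arg2: str    # second operand
    result: str  # name that receives the value

    def __str__(self) -> str:
        return f"{self.result} := {self.arg1} {self.op} {self.arg2}"


# The two statements a := b + c and d := a + e as a list of quadruples.
block = [Quad("+", "b", "c", "a"), Quad("+", "a", "e", "d")]
for quad in block:
    print(quad)
```

A code generator can then walk such a list one record at a time, which is the view taken in the rest of this chapter.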
Target Programmes
The output of the code generator is the target programme.
The output may take on a variety of forms:
absolute machine language,
relocatable machine language,
assembly language
Producing an absolute machine language programme has the
advantage that it can be placed in a fixed location in memory and
immediately executed.
A small programme can be compiled and executed quickly.
A set of relocatable object modules can be linked together and loaded
for execution by a linking loader.
Producing an assembly language programme as output makes the
process of code generation easier.
Instruction Selection
The nature of the instruction set of the target machine
determines the difficulty of instruction selection.
If the target machine does not support each data type in a
uniform manner, then each exception to the general rule
requires special handling.
Instruction speeds and machine idioms are other important
factors.
If we do not care about the efficiency of the target
programme, instruction selection is straightforward.
For each type of three-address statement we can design a
code skeleton that outlines the target code to be generated
for that construct.
Instruction Selection (Cont.)
For example, every three-address statement of the form
x := y + z, where x, y, and z are statically allocated, can be
translated into the code sequence
MOV y, R0 /* load y into register R0 */
ADD z, R0 /* add z to R0 */
MOV R0, x /* store R0 into x */
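A code skeleton of this kind can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name translate_add and the default register R0 are made up for the example, and operands are assumed to be statically allocated so that their names can be used directly as memory operands.

```python
def translate_add(x: str, y: str, z: str, reg: str = "R0") -> list[str]:
    """Expand the three-address statement x := y + z from a fixed skeleton."""
    return [
        f"MOV {y}, {reg}",  # load y into the register
        f"ADD {z}, {reg}",  # add z to the register
        f"MOV {reg}, {x}",  # store the register into x
    ]


for line in translate_add("x", "y", "z"):
    print(line)
```

Because the skeleton is filled in without looking at neighbouring statements, every statement pays for its own load and store.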
But, this kind of statement-by-statement code generation
often produces poor code.
For example, the sequence of statements
a := b + c
d := a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth instruction is redundant, and so is the third
if a is not subsequently used.
The quality of the generated code is determined by its
speed and size.
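One way to remove the redundant reload mechanically is a small peephole pass over the generated instructions. The sketch below is an assumption-laden illustration, not part of the slides: the helper names parse and drop_redundant_loads are hypothetical, registers are assumed to be named R followed by digits, and the code is assumed to be a single basic block with no jump targets between the two instructions. It does not remove the store to a, because that would require knowing whether a is used later.

```python
import re

REGISTER = re.compile(r"R\d+$")  # assumed register naming: R0, R1, ...


def parse(instr: str) -> tuple[str, str, str]:
    """Split 'OP src, dst' into (op, src, dst)."""
    op, rest = instr.split(None, 1)
    src, dst = (part.strip() for part in rest.split(","))
    return op, src, dst


def drop_redundant_loads(code: list[str]) -> list[str]:
    """Drop 'MOV a, R' when it directly follows 'MOV R, a'."""
    out: list[str] = []
    for instr in code:
        if out:
            op1, src1, dst1 = parse(out[-1])
            op2, src2, dst2 = parse(instr)
            store_then_reload = (
                op1 == op2 == "MOV"
                and REGISTER.match(src1) is not None  # previous one stored a register
                and src1 == dst2                      # reload targets the same register
                and dst1 == src2                      # and reads the same location
            )
            if store_then_reload:
                continue  # the register already holds the value
        out.append(instr)
    return out


naive = ["MOV b, R0", "ADD c, R0", "MOV R0, a",
         "MOV a, R0", "ADD e, R0", "MOV R0, d"]
improved = drop_redundant_loads(naive)
print(improved)
```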
Instruction Selection (Cont.)
A target machine with a rich instruction set may provide
several ways of implementing a given operation.
For example, if the target machine has an “increment”
instruction (INC), then the three-address statement
a := a + 1 may be implemented more efficiently by the single
instruction
INC a
rather than by the more obvious sequence
MOV a, R0
ADD #1, R0
MOV R0, a
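The choice between the two implementations can be sketched as a tiny instruction selector. The function name translate_add_const and the has_inc flag are hypothetical names introduced for this example, and the fixed register R0 is an assumption.

```python
def translate_add_const(a: str, k: int, has_inc: bool) -> list[str]:
    """Select code for a := a + k on a machine that may offer INC."""
    if k == 1 and has_inc:
        return [f"INC {a}"]  # machine idiom: one instruction
    return [
        f"MOV {a}, R0",      # general case: load, add, store
        f"ADD #{k}, R0",
        f"MOV R0, {a}",
    ]


print(translate_add_const("a", 1, has_inc=True))
print(translate_add_const("a", 1, has_inc=False))
```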
Register Allocation
Instructions involving register operands are usually shorter
and faster than those involving operands in memory.
Efficient utilisation of registers is important in generating
good code.
The use of registers is often subdivided into two
subproblems:
1. During register allocation, we select the set of variables
that will reside in registers at a point in the programme.
2. During a subsequent register assignment phase, we pick
the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is
difficult, even with single register values.
Register Allocation (Cont.)
The problem is further complicated because the
hardware and/or the operating system of the target
machine may require that certain register usage
conventions be observed.
Assignment: Run-Time Storage Management
Static Allocation and Stack Allocation
Basic blocks
A Simple Code Generator
Here let us consider an algorithm that generates code for a
single basic block.
It considers each three-address instruction and keeps track
of what values are in what registers so it can avoid
generating unnecessary loads and stores.
One of the primary issues during code generation is
deciding how to use registers to best advantage.
There are four principal uses of registers:
- In most machine architectures, some or all of the operands
of an operation must be in registers in order to perform the
operation.
- Registers make good temporaries: places to hold the
result of a subexpression while a larger expression is being
evaluated, or more generally, a place to hold a variable that
is used only within a single basic block.
- Registers are used to hold (global) values that are
computed in one basic block and used in other blocks, for
example, a loop index that is incremented going around the
loop and is used several times within the loop.
- Registers are often used to help with run-time storage
management, for example, to manage the run-time stack,
including the maintenance of stack pointers and possibly
the top elements of the stack itself.
A Simple Code Generator (Cont.)
These are competing needs, since the number of registers
available is limited.
While considering the code generation algorithm, we
assume that the basic block has already been transformed
into a preferred sequence of three-address instructions, by
transformations such as combining common
subexpressions.
We also assume that for each operator, there is exactly one
machine instruction that takes the necessary operands in
registers and performs that operation, leaving the result in
a register.
A Simple Code Generator (Cont.)
The machine instructions are of the form
LD reg, mem
ST mem, reg
OP reg, reg, reg
Register and Address Descriptors
The code-generation algorithm considers each three-
address instruction in turn and decides what load
instructions are necessary to get the needed operands into
registers.
After generating the loads, it generates the operation itself.
Then, if there is a need to store the result into a memory
location, it also generates that store.
In order to make the needed decisions, we require a data
structure that tells us what program variables currently
have their value in a register, and which register or
registers, if so.
Reg and Address Descriptors (Cont.)
We also need to know whether the memory location for a
given variable currently has the proper value for that
variable, since a new value for the variable may have been
computed in a register and not yet stored.
The desired data structure has the following descriptors:
1. For each available register, a register descriptor keeps track
of the variable names whose current value is in that register.
Since we shall use only those registers that are available for
local use within a basic block, we assume that initially, all
register descriptors are empty.
2. For each program variable, an address descriptor keeps
track of the location or locations where the current value of
that variable can be found.
Reg and Address Descriptors (Cont.)
The location might be a register, a memory address,
a stack location, or some set of more than one of
these.
The information can be stored in the symbol-table
entry for that variable name.
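The two descriptors can be pictured as a pair of dictionaries. The sketch below is a minimal illustration, assuming that a variable's own name stands for its memory location; the class name Descriptors and its method names are hypothetical.

```python
class Descriptors:
    """Register and address descriptors for one basic block (illustrative)."""

    def __init__(self, registers, program_variables):
        # Register descriptor: register -> variables whose current value it holds.
        # All registers start empty.
        self.reg = {r: set() for r in registers}
        # Address descriptor: variable -> locations holding its current value.
        # Program variables start with their value in their own memory location.
        self.addr = {v: {v} for v in program_variables}

    def registers_holding(self, var):
        """Return the registers that currently hold var."""
        return sorted(r for r, held in self.reg.items() if var in held)

    def memory_is_current(self, var):
        """True if var's own memory location has its current value."""
        return var in self.addr.get(var, set())


desc = Descriptors(["R1", "R2", "R3"], ["a", "b", "c", "d"])

# Record the effect of loading a into R1.
desc.reg["R1"] = {"a"}
desc.addr["a"].add("R1")

print(desc.registers_holding("a"))
print(desc.memory_is_current("a"))
```

Sets are used for both descriptors because a register may hold several variables after a copy, and a variable may live in several places at once.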
The Code-Generation Algorithm
An essential part of the algorithm is a function getReg(I),
which selects registers for each memory location associated
with the three-address instruction I.
Function getReg has access to the register and address
descriptors for all the variables of the basic block, and may also
have access to certain information such as the variables
that are live on exit from the block.
In a three-address instruction such as x = y + z, we shall
treat + as a generic operator and ADD as the equivalent
machine instruction.
For a three-address instruction such as x = y + z, do the
following:
Code-Generation Algorithm (Cont.)
1. Use getReg(x = y + z) to select registers for x, y,
and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (from the register descriptor of
Ry), then issue an instruction LD Ry,y', where y' is
one of the memory locations for y (from the address
descriptor of y).
3. Similarly, if z is not in Rz, issue an instruction LD
Rz,z', where z' is a location for z.
4. Issue the instruction ADD Rx,Ry,Rz.
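Steps 2 to 4 can be sketched in Python as below. The sketch assumes that getReg has already run and that its choice is passed in as the triple regs; the function name gen_binary is hypothetical, each variable's own name is used as its memory location, and updating the descriptors afterwards is deliberately left out.

```python
def gen_binary(opcode, x, y, z, regs, reg_desc):
    """Emit code for x = y <op> z given registers (Rx, Ry, Rz).

    reg_desc maps each register to the set of variables it currently holds.
    """
    rx, ry, rz = regs
    # Work on a local copy so the second test sees the effect of the first load.
    held = {r: set(vs) for r, vs in reg_desc.items()}
    code = []
    if y not in held[ry]:            # step 2: load y only if Ry lacks it
        code.append(f"LD {ry}, {y}")
        held[ry] = {y}
    if z not in held[rz]:            # step 3: same test for z
        code.append(f"LD {rz}, {z}")
        held[rz] = {z}
    code.append(f"{opcode} {rx}, {ry}, {rz}")  # step 4: the operation itself
    return code


empty = {"R1": set(), "R2": set(), "R3": set()}
first = gen_binary("SUB", "t", "a", "b", ("R2", "R1", "R2"), empty)

# Later, with a already in R1, only the other operand needs a load.
later = {"R1": {"a"}, "R2": {"t"}, "R3": set()}
second = gen_binary("SUB", "u", "a", "c", ("R1", "R1", "R3"), later)

print(first)
print(second)
```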
Machine Instructions for Copy Stat.
There is an important special case: a three-address copy
statement of the form: x = y.
We assume that getReg will always choose the same
register for both x and y.
If y is not already in that register Ry, then generate the
machine instruction LD Ry, y.
If y was already in Ry, we do nothing.
It is only necessary that we adjust the register descriptor
for Ry so that it includes x as one of the values found there.
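The copy case can be sketched as follows. The function name gen_copy is hypothetical and the register Ry is assumed to have been chosen already. Two pieces of bookkeeping in the sketch are assumptions added to keep the descriptors consistent: variables evicted by the load lose Ry as a location, and any older register copies of x are dropped.

```python
def gen_copy(x, y, ry, reg_desc, addr_desc):
    """Emit code for the copy statement x = y, using register ry for both."""
    if x == y:
        return []                         # nothing to do for x = x
    code = []
    if y not in reg_desc[ry]:
        for evicted in reg_desc[ry]:      # assumed: ry no longer holds these
            addr_desc[evicted].discard(ry)
        code.append(f"LD {ry}, {y}")
        reg_desc[ry] = {y}
        addr_desc.setdefault(y, set()).add(ry)
    for held in reg_desc.values():        # assumed: old copies of x are stale
        held.discard(x)
    reg_desc[ry].add(x)                   # ry now holds x as well as y
    addr_desc[x] = {ry}                   # x's only location is ry
    return code


reg_desc = {"R1": {"u"}, "R2": {"t"}, "R3": {"v"}}
addr_desc = {"a": {"a"}, "d": {"d"}, "t": {"R2"}, "u": {"R1"}, "v": {"R3"}}

emitted = gen_copy("a", "d", "R2", reg_desc, addr_desc)
print(emitted)
print(sorted(reg_desc["R2"]), sorted(addr_desc["a"]))
```

Note that no machine instruction is needed to record x in the descriptor; only the load of y, when required, produces code.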
Ending the Basic Block
Variables used by the block may wind up with their only
location being a register.
If the variable is a temporary used only within the block,
we can forget about the value of the temporary and assume
its register is empty.
If the variable is live on exit from the block, or if we don't
know which variables are live on exit, then we need to
assume that the value of the variable is needed later.
In that case, for each variable x whose address descriptor does
not say that its value is in the memory location for x, we
generate the instruction ST x, R, where R is a register in which
x's value exists at the end of the block.
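The end-of-block step can be sketched as a loop over the live variables. The function name end_of_block_stores is hypothetical. The sketch skips variables whose address descriptor already lists their own memory location, since a store would be redundant for them, and it assumes every other location in a descriptor is a register.

```python
def end_of_block_stores(live_on_exit, addr_desc):
    """Emit ST instructions for live variables whose memory copy is stale."""
    code = []
    for x in sorted(live_on_exit):
        locations = addr_desc.get(x, set())
        if x in locations:
            continue                      # memory already has the current value
        registers = sorted(loc for loc in locations if loc != x)
        if registers:
            code.append(f"ST {x}, {registers[0]}")
            locations.add(x)              # memory is now current as well
    return code


# Descriptor state in which a and d live only in registers.
addr_desc = {"a": {"R2"}, "b": {"b"}, "c": {"c"}, "d": {"R1"}, "v": {"R3"}}
stores = end_of_block_stores({"a", "b", "c", "d"}, addr_desc)
print(stores)
```

The temporary v is not stored because it is not in the live-on-exit set.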
Managing Reg and Addr Descriptors
As the code-generation algorithm issues machine
instructions, it needs to update the register and address
descriptors.
The rules are as follows:
1. For the instruction LD R, x
(a) Change the register descriptor for register R so it holds
only x.
(b) Change the address descriptor for x by adding register
R as an additional location.
2. For the instruction ST x, R, change the address
descriptor for x to include its own memory location.
Managing Reg and Addr Des (Cont.)
3. For an operation such as ADD Rx,Ry,Rz (x = y + z)
(a) Change the register descriptor for Rx so that it holds only x.
(b) Change the address descriptor for x so that its only location
is Rx (no more at memory).
(c) Remove Rx from the address descriptor of any variable
other than x.
4. When we process a copy statement x = y, after generating the
load for y into register Ry, if needed, and after managing
descriptors as for all load statements (per rule 1):
(a) Add x to the register descriptor for Ry.
(b) Change the address descriptor for x so that its only location
is Ry.
Managing Reg and Addr Des (Cont.)
Example: Let us translate the basic block consisting of the
three-address statements
t=a-b
u=a-c
v=t+u
a=d
d=v+u
Here we assume that t, u, and v are temporaries, local to the
block, while a, b, c, and d are variables that are live on exit from
the block.
Assume that there are as many registers as we need.
But when a register's value is no longer needed (it holds only a
temporary), then we reuse its register.
Managing Reg and Addr Des (Cont.)
A summary of all the machine-code instructions
generated is shown in the table.
Instructions generated and the changes in the
register and address descriptors.
Managing Reg and Addr Des (Cont.)
                      Register descriptors    Address descriptors
Instructions          R1    R2    R3          a      b    c      d      t     u     v
t = a - b
  LD R1, a
  LD R2, b
  SUB R2, R1, R2      a     t     -           a,R1   b    c      d      R2    -     -
u = a - c
  LD R3, c
  SUB R1, R1, R3      u     t     c           a      b    c,R3   d      R2    R1    -
v = t + u
  ADD R3, R2, R1      u     t     v           a      b    c      d      R2    R1    R3
a = d
  LD R2, d            u     a,d   v           R2     b    c      d,R2   -     R1    R3
d = v + u
  ADD R1, R3, R1      d     a     v           R2     b    c      R1     -     -     R3
Exit
  ST a, R2
  ST d, R1            d     a     v           a,R2   b    c      d,R1   -     -     R3
(A dash marks an empty descriptor.)
Managing Reg and Addr Des (Cont.)
For the first three-address instruction, t = a-b we need to
issue three instructions, since nothing is in a register
initially.
Thus, we see a and b loaded into registers R1 and R2, and
the value t produced in register R2.
Notice that we can use R2 for t because the value b
previously in R2 is not needed within the block.
The second instruction, u = a - c , does not require a load of
a, since it is already in register R1.
Further, we can reuse R1 for the result, u, since the value of
a, previously in that register, is no longer needed within the
block, and its value is in its own memory location if a is
needed outside the block.
Managing Reg and Addr Des (Cont.)
Note that we change the address descriptor for a to indicate
that it is no longer in R1, but is in the memory location called
a.
The third instruction, v = t + u, requires only the addition.
Further, we can use R3 for the result, v, since the value of c
in that register is no longer needed within the block, and c
has its value in its own memory location.
The copy instruction, a = d, requires a load of d, since it is in
memory.
We show register R2's descriptor holding both a and d.
The addition of a to the register descriptor is the result of
our processing the copy statement, and is not the result of
any machine instruction.
Managing Reg and Addr Des (Cont.)
The fifth instruction, d = v + u, uses two values that are in
registers.
Since u is a temporary whose value is no longer needed, we
have chosen to reuse its register R1 for the new value of d.
Notice that d and a are now in only R1 and R2 respectively,
and are not in their own memory location.
Therefore, we need to store the live-on-exit variables a and
d into their memory locations.
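The descriptor changes in this example can be replayed with a short Python sketch. It does not choose registers itself: the steps list below simply records the instructions and register choices from the example. The function name replay and the step encoding are hypothetical. Two pieces of bookkeeping are assumptions that go beyond the four rules as stated, added so the replay agrees with the descriptors described above: a register that is overwritten is removed from the address descriptors of the variables it held, and a redefined variable is removed from any other register descriptor.

```python
def replay(steps, registers, program_variables):
    """Apply the descriptor-update rules to a recorded list of steps."""
    reg = {r: set() for r in registers}
    addr = {v: {v} for v in program_variables}   # variables start in memory

    def evict(r, keep):
        # Assumed bookkeeping: values overwritten in r are no longer found there.
        for v in reg[r] - {keep}:
            addr[v].discard(r)

    def forget_register_copies(x):
        # Assumed bookkeeping: older register copies of x are stale.
        for held in reg.values():
            held.discard(x)

    for step in steps:
        kind = step[0]
        if kind == "LD":                 # rule 1: LD R, x
            _, r, x = step
            evict(r, x)
            reg[r] = {x}
            addr.setdefault(x, set()).add(r)
        elif kind == "ST":               # rule 2: ST x, R
            _, x, _r = step
            addr[x].add(x)
        elif kind == "OP":               # rule 3: result x computed into Rx
            _, x, rx = step
            evict(rx, x)                 # rule 3(c)
            forget_register_copies(x)
            reg[rx] = {x}                # rule 3(a)
            addr[x] = {rx}               # rule 3(b)
        elif kind == "COPY":             # rule 4: x = y, y already in Ry
            _, x, ry = step
            forget_register_copies(x)
            reg[ry].add(x)               # rule 4(a)
            addr[x] = {ry}               # rule 4(b)
    return reg, addr


steps = [
    ("LD", "R1", "a"), ("LD", "R2", "b"), ("OP", "t", "R2"),  # t = a - b
    ("LD", "R3", "c"), ("OP", "u", "R1"),                     # u = a - c
    ("OP", "v", "R3"),                                        # v = t + u
    ("LD", "R2", "d"), ("COPY", "a", "R2"),                   # a = d
    ("OP", "d", "R1"),                                        # d = v + u
    ("ST", "a", "R2"), ("ST", "d", "R1"),                     # exit
]
reg, addr = replay(steps, ["R1", "R2", "R3"], ["a", "b", "c", "d"])
print(reg)
print(addr)
```

For an OP step only the result variable and its register are recorded, because the operand registers do not affect the descriptor updates.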
