0% found this document useful (0 votes)
27 views

Code Generation

Uploaded by

lyeabsra
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views

Code Generation

Uploaded by

lyeabsra
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

Code Generation

By Birku L.
 Introduction
 Issues in the Design of a Code Generator
 Instruction Selection
 Register Allocation and Assignment
 Instruction ordering
 A Simple Target Machine Model
 A Simple Code Generator : Code Generation for Trees
 Addresses in the Target Code
 Basic Blocks and Flow Graphs
 Optimization of Basic Blocks
 Liveness Analysis
 Requirements
 Preserve semantic meaning of source program
 Make effective use of available resources of target machine
 Code generator itself must run efficiently
 Challenges
 Problem of generating optimal target program is undecidable
 Many subproblems encountered in code generation are
computationally intractable
Issues in designing code generators

 Input to code generation

 Target program(output)

 Memory management

 Instruction selection

 Register allocation

 Evaluation order
 Inputs – are IRs or simply intermediate code in the
forms of:-
 Graphical presentation (syntax trees, DAGs,…)
 Linear presentation (postfix, …)
 three-address presentations (quadruples, triples, …)
 Virtual machine presentations (bytecode, stack-machine, …)
 Outputs - depending on the requirements
 Absolute machine code (executable code by the loader)
 Relocatable machine code (object files for linker)
 Assembly language (facilitates debugging)
 Byte code forms for interpreters (e.g. JVM)
 Three primary tasks
 Instruction selection
 choose appropriate target-machine instructions to implement
the IR statements
 Register allocation and assignment
 decide what values to keep in which registers
 Instruction ordering
 decide in what order to schedule the execution of instructions
 Design of all code generators involve the above three tasks
 Details of code generation are dependent on the specifics of
IR, target language, and run-time system
Instruction selection
 The complexity of mapping IR program into code-sequence for target
machine depends on:
 Level of IR (high-level or low-level)
 If IR is high-level - each IR statement is translated into a sequence of
machine instructions. Poor code produced – requires optimization
 If IR is low level - low-level details of the underlying machine can be
used to generate more efficient code sequences
 Nature of instruction set (data type support)
 E.g. the uniformity and completeness of the instruction set are
important factors
 Desired quality of generated code (speed and size)
 If efficiency of the target program were not an issue, instruction
selection is straightforward
Example
 x= y+z this is three code address representation
Mov y,Ro
ADD z,Ro
Mov R0,x
 a=a+1
Mov a,R0
ADD#1,R0 INC a
Mov Ro,a
 a=b+c Mov b,R0 ADD c,R0 Mov R0,a
 d=a+c Mov a,R0 ADD c, R0 Mov R0,d
From the above two moving instruction are similar so it will take the
first one only.
Mov b,R0 Mov c,R0 ADD c,R0 Mov R0,d
This one is more efficient relative to the first one
Instruction selection …
 Given IR it can be implemented by many different code sequences,
with significant cost differences between the different
implementations.
 Example: lets take statement a = a+1
 It can be implemented in either of the following ways
 INC a // if the target machine has an "increment" instruction (INC)
OR
 LOAD RO, a // RO = a
 ADD RO, RO, #1 // RO = RO + 1
 STORE a, RO // a = RO
 Which code is efficient?
 Of course, We need to know instruction costs in order to design good
code sequences
Register allocation
 We apply it when we want to perform more number of
operation and we have limited numbers of register.
 So how can we allocate with limited number of register more
number of operation?
It performed by two ways
 Register allocation:- it specify which register contains which
variable let registers R0, R1 R0 contain which variable and
R1 contain which variable
 Register assignment:- it is the opposite of register allocation
which means which variable contain which register like a
contains R0 and b contains R1
Register Allocation
 A key problem in code generation is deciding what values to hold
in what registers
 Registers are the fastest computational unit, but we usually do not
have enough of them to hold all values
 The use of registers is often subdivided into two sub problems:
 Register allocation, during which we select the set of variables that
will reside in registers at each point in the program.
 Register assignment, during which we pick the specific register that
a variable will reside in.
 Finding an optimal assignment of registers to variables is difficult,
even with single-register machines.
 The problem is NP-complete
 Have to follow register usage of hardware & Operating Systems
Register Allocation…
 What variables can the allocator try to put in registers?
 Temporary variables: easy to allocate
 defined & used exactly once, during expression evaluation
implies allocator can free up register when done
 usually not too many in use at one time implies less likely to run
out of registers
 Local variables: hard, but doable
 need to determine last use of variable in order to free register
 can easily run out of registers implies need to make decision
about which variables get register allocation
 Global variables
 really hard, but doable as a research project
Cont…
Example
t=a+b
t=t*c Mov a,R0
ADD b,R0
t=t/d Mov R0,t
Mul c,R0
Mov a,R0 Div d,R0
ADD b,R0 Mov R0,t

Mul c,R0 Select the efficient one
Div d,Ro
Mov Ro,t
Evaluation Order
 Is selecting the order in which computations are performed
 Affects the efficiency of the target code - some computation orders
require fewer registers to hold intermediate results than others.
 Picking a best order is NP-complete
 Example
 Implementing code generator requires through understanding
of the target machine architecture and its instruction set
 Our (hypothetical) machine:
 Byte-addressable (word = 4 bytes)
 Has n general purpose registers R0, R1, …, Rn-1
 Assume all operands are integers
 Two-address(load , store) instructions of the form
op source, destination
 Where Op is operation code and Source, destination are data
fields
 Computation operations: OP dst, srcl, src2
 Where OP is a operator like ADD or SUB, and dst, srcl , and
src2 are locations, not necessarily distinct.
 Unconditional jumps: BR L where L is label
 Conditional jumps of the form Bcond r, L
 where r is a register, L is a label
 A variety of addressing modes
 variable name
 a(r) means contents(a + contents(r))
 *a(r) means: contents(contents(a + contents(r)))
 immediate: #constant (e.g. LD R1, #100)
 Cost
 cost of an instruction = 1 + cost of operands
 cost of register operand = 0
 cost involving memory and constants = 1
 cost of a program = sum of instruction costs
…
Code Generation for Trees
 Generate an assembly code, for the expression
(A-B)+((C+D)+(E*F))
 The expression corresponds to he following
AST and assembly code is
 We used only two register
LD R1, C
 There is an algorithm that generates code ADD R1, D
with the least number of registers. It is LD R0, E
called the Sethi-Ullman algorithm. MULT R0, F
 It consists of two phases: numbering and
ADD R1, R0
LOAD R0, A
code generation.
SUB R0, B
ADD R0, R1
Simple code generator
 It Generate target code for the sequence of three address
statements.
 It used getReg function to assign registers to a variables
 For each operator in the statement there is corresponding
target code operators.
 The two data structure of Simple code generator
 Register and address descriptors
 Register descriptor
 It used to keeps the track of which variable is stored in a
register . Initially all registers are empty.
 Address descriptor
 It used to keeps track of location where variable is stored.
Location may be register a stack location or memory address.
Code generation algorithm
 For the given three address code x=y op z
 Invoke the function getReg to determine the location L where
result of y op z store. Where l may be register/memory
 Consult the address descriptor for y to determine y’, the
current location of y. If y is not already in L, generate MOV
y’,L.
 Generate the instruction op z’,L update address descriptor of
x to indicate that x is in L. if L is register update its descriptor
to indicate that it contains the value of x
 If y and z have no next use and not live on exit update the
descriptor to remove y & z.
Example
 d= (a-b)+(a-c)+(a-c)
 The three address code representation is
t1=a-b t2=a-c t3=t1+t2 d=t3+t2
Only two registers are used
Statement code Register descriptor Address descriptor
Generated
t1=a-b Mov a,R0 Register are empty t1 in R0
Sub b,R0 Ro contain t1
t2=a-c Mov a,R1 Ro contain t1 t1 in R0
Sub c,R R1 contain t2 t2 in R1

t3=t1+t2 ADD R1,R0 Ro contain t3 t2 in R1


R1 contain t2 t3 in R0

d=t3+t2 ADD R1,R0 Ro contain d d in R0


Mov R0,d d in R0 and memory
 4 distinct Regions of Memory
 Code space – Instructions to be executed
 Best if read-only
 Static (or Global)
 Variables that retain their value over the lifetime of the
program
 Stack
 Variables that is only as long as the block within which they are
defined (local)
 Heap
 Variables that are defined by calls to the system storage
allocator (malloc, new)
 Code and static data
sizes determined by the
Code
compiler

Static Data
 Stack and heap sizes vary
at run-time
Stack
 Stack grows downward
 Heap grows upward

Heap  Some machines have


stack/heap switched
 Standard (simple) approach
 Globals/statics – memory
 Locals
 Composite types (structs, arrays, etc.) – memory
 Scalars
 Most frequently used – register
 Rest – memory

 All memory approach


 Put all variables into memory
 Compiler register allocation relocates some memory variables to
registers later
Thank you
Your final will [3-7]

You might also like