Introduction to Compiler Construction

Uploaded by dabaci3568
Copyright © All Rights Reserved
Compiler Construction
1 – Introduction to Compiling.
Compilers.
• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• During the translation process, the compiler reports to its user the presence of errors in the source program.
[Figure: Source Program → Compiler → Target Program, with Error Messages as an additional output of the compiler]
Compilers.
• The source language can be any high-level programming language, ranging from traditional programming languages such as Fortran, C, and Java to specialized languages written for a specific area of computer application, such as LISP for AI.
• The target language may be another programming language (such as assembly language) or the machine language of a computer, depending on the compiler.
[Figure: High-level source code → Compiler → Low-level machine code]
Compilation Process.
• A compiler takes the whole program at once and either reports the errors it finds in the program or creates an object program.
• The time at which a source program is converted into an object program is called compile time.
• The object program is executed at run time.
Goals
• Understand the structure of a compiler.
• Understand how its components operate.
• Understand the tools involved.
• Gain a better understanding of modular programming.
Why Study Compilers?
• A better understanding of programming language concepts.
• A practical application of theory.
• How code is processed by a machine and how it can be optimized.
• An understanding of memory management techniques.
Interpreter.
• An interpreter is also used to translate high-level language programs.
• It differs from a compiler in that:
  - It translates a program one instruction at a time and produces the results before taking the next instruction.
  - It can identify only one error at a time, since execution typically stops at the first error it encounters.
  - It does not produce an object program.
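To make the contrast concrete, here is a minimal Python sketch of an interpreter for a hypothetical mini-language with three statement forms (`name = operand`, `name = operand + operand`, `print operand`). The language and the function names `interpret` and `value` are illustrative assumptions. Each statement is executed as soon as it is read, its result appears before the next statement is examined, execution stops at the first error, and no object program is ever produced.

```python
def value(token, env):
    """Evaluate an operand: an integer literal or the name of a defined variable."""
    if token.isdigit():
        return int(token)
    if token not in env:
        raise NameError(f"undefined variable {token!r}")
    return env[token]


def interpret(source):
    """Translate and execute one statement at a time; no object program is produced."""
    env, output = {}, []
    for lineno, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue
        try:
            if len(words) == 2 and words[0] == "print":
                output.append(value(words[1], env))        # result appears immediately
            elif len(words) == 3 and words[1] == "=":
                env[words[0]] = value(words[2], env)
            elif len(words) == 5 and words[1] == "=" and words[3] == "+":
                env[words[0]] = value(words[2], env) + value(words[4], env)
            else:
                raise SyntaxError("unrecognized statement")
        except (NameError, SyntaxError) as err:
            output.append(f"error on line {lineno}: {err}")
            break                                          # stop at the first error
    return output


print(interpret("x = 5\ny = x + 2\nprint y\nprint z\nprint x"))
# [7, "error on line 4: undefined variable 'z'"]
```

The value 7 is produced before the error on line 4 is even seen, and the final `print x` is never reached.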
Assembler.
• An assembler is a translator (software) that converts a program written in assembly language into machine language.
• Assembly language is called a low-level language because there is a one-to-one correspondence between assembly language statements and machine language statements.
Assembler Cont..
• Assemblers are further divided into two types:
  - One-pass assemblers and
  - Two-pass assemblers.
• A one-pass assembler assigns memory addresses to the variables and translates the source code into machine code simultaneously, in a single pass.
Assembler Cont..
• A two-pass assembler reads the source code twice.
• In the first pass, it reads all the variables and assigns them memory addresses.
• In the second pass, it reads the source code again and translates it into object code.
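The two passes can be sketched in Python for a toy accumulator machine. The mnemonics (`LOAD`, `ADD`, `STORE`, `HALT`, `DATA`), their numeric opcodes, and the one-word-per-statement layout are hypothetical assumptions, not a real instruction set. Pass one only builds the symbol table; pass two uses it, which is why the program below can refer to `a`, `b`, and `c` before they are defined.

```python
# Hypothetical opcode table for a toy accumulator machine (an assumption, not a real ISA).
OPCODES = {"LOAD": 1, "ADD": 2, "STORE": 3, "HALT": 4}


def split_label(line):
    """Return (label or None, rest of the statement)."""
    if ":" in line:
        label, rest = line.split(":", 1)
        return label.strip(), rest.strip()
    return None, line.strip()


def pass_one(lines):
    """First pass: give every statement an address and record each label/variable."""
    symbols = {}
    for address, line in enumerate(lines):          # one word per statement
        label, _ = split_label(line)
        if label is not None:
            symbols[label] = address
    return symbols


def pass_two(lines, symbols):
    """Second pass: translate each statement into an (opcode, operand) word."""
    words = []
    for line in lines:
        _, stmt = split_label(line)
        mnemonic, *args = stmt.split()
        if mnemonic == "DATA":                      # reserve a word holding a constant
            words.append((0, int(args[0])))
        else:                                       # operand resolved via the symbol table
            words.append((OPCODES[mnemonic], symbols[args[0]] if args else 0))
    return words


def assemble(lines):
    return pass_two(lines, pass_one(lines))


program = [
    "LOAD a",       # forward references: a, b, c are defined further down
    "ADD b",
    "STORE c",
    "HALT",
    "a: DATA 5",
    "b: DATA 7",
    "c: DATA 0",
]
print(pass_one(program))   # {'a': 4, 'b': 5, 'c': 6}
print(assemble(program))   # [(1, 4), (2, 5), (3, 6), (4, 0), (0, 5), (0, 7), (0, 0)]
```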
Preprocessor
[Figure: Preprocessor]
Linkers
• A linker combines the object code (machine code that has not yet been linked) produced by compiling many source programs with standard library functions.
• It resolves references in each object file to external (a) variables and (b) functions/modules that are declared in other files.
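A minimal Python sketch of these two linker tasks, under simplifying assumptions: each hypothetical `ObjectFile` lists the symbols it defines and the positions of its placeholders for external symbols, files are placed one after another starting at address 0, and relocation of a file's own internal addresses is omitted. Here `lib.o` stands in for a library.

```python
from dataclasses import dataclass


@dataclass
class ObjectFile:
    name: str
    code: list[int]              # machine words; external references hold a placeholder 0
    defines: dict[str, int]      # symbol -> offset of its definition within this file
    references: dict[int, str]   # offset of a placeholder -> external symbol it refers to


def link(objects):
    """Combine object files into one image and resolve references between them."""
    # Step 1: place the files one after another and build a global symbol table.
    symtab, base = {}, 0
    for obj in objects:
        for sym, offset in obj.defines.items():
            if sym in symtab:
                raise ValueError(f"symbol {sym!r} defined more than once")
            symtab[sym] = base + offset
        base += len(obj.code)
    # Step 2: concatenate the code, patching each reference with the resolved address.
    image = []
    for obj in objects:
        words = list(obj.code)
        for offset, sym in obj.references.items():
            if sym not in symtab:
                raise ValueError(f"undefined reference to {sym!r} in {obj.name}")
            words[offset] = symtab[sym]
        image.extend(words)
    return image, symtab


main_o = ObjectFile("main.o", [10, 0, 11, 0], {"main": 0}, {1: "counter", 3: "print"})
lib_o = ObjectFile("lib.o", [20, 21, 99], {"print": 0, "counter": 2}, {})
image, symtab = link([main_o, lib_o])
print(symtab)  # {'main': 0, 'print': 4, 'counter': 6}
print(image)   # [10, 6, 11, 4, 20, 21, 99]
```

Linking `main.o` alone would raise an "undefined reference" error, since `counter` and `print` are declared only in the other file.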
Loader
• A loader is the part of an operating system that is responsible for loading programs and libraries.
• It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution.
Loader Cont..
• Loading a program involves reading the contents of the executable file into memory to prepare the executable for running.
• Once loading is complete, the operating system starts the program by passing control to the loaded program code.
• All operating systems that support program loading have loaders.
The Analysis and Synthesis Model of Compilation.
• There are two parts to compilation:
  - Analysis.
  - Synthesis.
• The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
• The synthesis part constructs the desired target program from the intermediate representation.
[Figure: Source Program → Analysis Part → Intermediate Representation → Synthesis Part → Target Program]
The Context of a Compiler.
• In addition to the compiler, several other programs may be required to create an executable target program.
• A source program may be divided into modules stored in separate files. The task of collecting the source program is the responsibility of another program, called a preprocessor.
• The target program created by the compiler may require further processing before it can be run.
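The module-collecting job of the preprocessor can be sketched in Python as recursive file inclusion. The `#include "file"` directive syntax follows C, but the in-memory `files` dictionary (standing in for the file system) and the `preprocess` function are assumptions for illustration. A real C preprocessor also performs macro expansion and conditional compilation, which this sketch leaves out.

```python
import re

# Matches a directive such as:  #include "defs.h"
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def preprocess(name, files, active=None):
    """Collect a program spread over several files by splicing in each included module.

    `files` maps a file name to its text; it stands in for the file system here.
    """
    active = set() if active is None else active
    if name in active:
        raise ValueError(f"recursive include of {name!r}")
    active.add(name)
    lines = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            lines.extend(preprocess(match.group(1), files, active).splitlines())
        else:
            lines.append(line)
    active.remove(name)
    return "\n".join(lines)


files = {
    "main.c": '#include "defs.h"\nint main(void) { return area(2); }',
    "defs.h": '#include "util.h"\nint area(int r);',
    "util.h": "int square(int x);",
}
print(preprocess("main.c", files))
```

The result is a single source text (the `util.h` line, then the `defs.h` line, then `main`), which is what the compiler proper then receives.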
The Context of a Compiler.
• The compiler creates assembly code, which is translated by an assembler into machine code.
• The linker links this machine code together with some library routines into the code that actually runs on the machine.
[Figure: The Context of a Compiler]
The Phases of a Compiler.
• A compiler operates in phases, each of which transforms the source program from one representation into another.
• In practice, some of the phases may be grouped together.
• A compiler consists of six phases:
  Analysis portion:
  1. Lexical Analysis.
  2. Syntax Analysis.
  3. Semantic Analysis.
  4. Intermediate Code Generation.
  Synthesis portion:
  5. Code Optimization.
  6. Code Generation.
• Two other activities, symbol-table management and error handling, interact with all six phases and are also informally considered phases.
Lexical Analysis.
• It is also called linear analysis or scanning.
• It reads the stream of characters making up the source program from left to right and groups them into tokens (sequences of characters having a collective meaning).
• For example, the characters in the assignment statement
position = initial + rate * 60
would be grouped into the following tokens.
Lexical Analysis.
• Tokens:
1. The identifier position.
2. The assignment symbol =.
3. The identifier initial.
4. The plus sign +.
5. The identifier rate.
6. The multiplication sign *.
7. The number 60.
• The blanks separating the characters of these tokens would normally be eliminated during lexical analysis.
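A scanner for this statement can be sketched with Python's `re` module. The token names (`ID`, `NUMBER`, `ASSIGN`, and so on) and their patterns are assumptions chosen for the example. The scanner reads left to right, discards blanks, and reports any character that cannot start a token as a lexical error.

```python
import re

# Token classes for the example statement; names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),       # blanks between tokens are discarded
    ("ERROR",  r"."),         # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(text):
    """Scan `text` left to right and group its characters into (kind, lexeme) tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at position {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


for token in tokenize("position = initial + rate * 60"):
    print(token)
```

It prints the seven tokens listed above, in order, as pairs such as `('ID', 'position')` and `('NUMBER', '60')`.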
Syntax Analysis.
• It is also called parsing.
• It involves grouping the tokens of the source program into grammatical phrases.
• The grammatical phrases of the source program are represented by a parse tree/syntax tree.
Syntax Analysis.
• The hierarchical structure of a program is expressed by recursive rules.
• For example, the rules for the definition of an expression are:
1. Any identifier is an expression.
2. Any number is an expression.
3. If expression1 and expression2 are expressions, then so are
   a. expression1 * expression2
   b. expression1 + expression2
   c. ( expression1 )
• Thus, by rule (1), initial and rate are expressions.
• By rule (2), 60 is an expression.
• By rule (3), we can first infer that rate * 60 is an expression and finally that initial + rate * 60 is an expression.
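Rules (1) to (3) map directly onto a recursive-descent parser, sketched here in Python. The rules as stated do not fix the relative precedence of * and +, so this sketch assumes the usual convention that * binds tighter, giving one method per precedence level. The tuple tree format and the class and method names are illustrative assumptions.

```python
import re

IDENT_OR_NUMBER = re.compile(r"\d+|[A-Za-z_]\w*")


def tokenize(text):
    """Minimal scanner for this sketch: identifiers, numbers, and = + * ( ).
    Characters outside these patterns, such as blanks, are skipped."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[=+*()]", text)


class Parser:
    """Recursive-descent parser for rules (1)-(3); * is assumed to bind tighter than +."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def assignment(self):                 # identifier = expression
        target = self.advance()
        if not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"cannot assign to {target!r}")
        if self.advance() != "=":
            raise SyntaxError("expected '='")
        tree = ("=", target, self.expression())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expression(self):                 # rule 3b: expression1 + expression2
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def term(self):                       # rule 3a: expression1 * expression2
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):                     # rules 1 and 2, and rule 3c: ( expression1 )
        token = self.advance()
        if token == "(":
            node = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if not IDENT_OR_NUMBER.fullmatch(token):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


tree = Parser(tokenize("position = initial + rate * 60")).assignment()
print(tree)  # ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

The nesting of the printed tree mirrors the derivation above: rate * 60 is grouped first, then added to initial.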
Semantic Analysis.
• The function of the semantic analyzer is to determine the meaning of the source program.
• It checks the source program for semantic errors.
• It uses the parse tree/syntax tree produced by the syntax analysis phase to identify the operators and operands of the statements.
• Semantic analysis performs type checking: the compiler checks that each operator has operands that are permitted by the source language specification.
Semantic Analysis.
• For example, many programming language definitions require a compiler to report an error every time a real number is used to index an array.
• However, many language specifications permit some operand mismatches.
• For example, when a binary arithmetic operator is applied to an integer and a real, the compiler may need to convert the integer to a real.
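A type checker that reports such errors and inserts the needed conversions can be sketched in Python over a tuple syntax tree. The assumptions: identifiers are declared in a `types` dictionary (here `position`, `initial`, and `rate` are assumed to be `real`), integer literals are `int`, the conversion operator is named `inttoreal`, and assigning a real value to an int variable is treated as an error.

```python
import re


def widen(node, from_type, to_type):
    """Insert an explicit int-to-real conversion where the types require one."""
    if (from_type, to_type) == ("int", "real"):
        return ("inttoreal", node)
    return node


def check(node, types):
    """Type-check a tuple syntax tree; return (tree with conversions inserted, type)."""
    if isinstance(node, str):                       # leaf: number literal or identifier
        if re.fullmatch(r"\d+", node):
            return node, "int"
        if re.fullmatch(r"\d+\.\d+", node):
            return node, "real"
        if node not in types:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, types[node]
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if op == "=":
        if (left_type, right_type) == ("int", "real"):
            raise TypeError(f"cannot assign a real value to int variable {left!r}")
        return ("=", left, widen(right, right_type, left_type)), left_type
    # Binary arithmetic: if either operand is real, the other is converted to real.
    result = "real" if "real" in (left_type, right_type) else "int"
    return (op, widen(left, left_type, result), widen(right, right_type, result)), result


declared = {"position": "real", "initial": "real", "rate": "real"}   # assumed declarations
tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
print(check(tree, declared))
```

Only the integer 60 needs converting, so the checked tree contains `('*', 'rate', ('inttoreal', '60'))` and everything else is unchanged.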
Intermediate Code Generation.
• After semantic analysis, some compilers generate an explicit intermediate representation of the source program (for example, bytecode for the Java platform or CIL for the .NET platform).
• An intermediate representation is a program for an abstract machine.
• An intermediate representation should have two important properties:
  - It should be easy to produce.
  - It should be easy to translate into the target program.
Intermediate Code Cont..
• Intermediate code can be either language specific (e.g., bytecode for Java) or language independent (e.g., three-address code).
Intermediate Code Generation.
• An intermediate representation can take a variety of forms; one of them is “three-address code”.
• Three-address code is like assembly language: it consists of a sequence of instructions, each of which has at most three operands.
• Each three-address instruction has at most one operator in addition to the assignment.
• Because of this, the compiler has to decide the order in which operations are to be done, and the instructions must appear in that order.
Intermediate Code Generation.
• For example, in position = initial + rate * 60 the multiplication must be done before the addition, so its instruction comes first.
• The compiler must generate a temporary variable to hold the value computed by each instruction.
• Some three-address instructions have fewer than three operands.
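The following Python sketch generates three-address code from a tuple syntax tree that already contains the `inttoreal` conversion added during semantic analysis. The tree format, the `:=` notation, and the `tempN` names are assumptions. It illustrates the three points above: operands are translated before their operator, so the multiplication is emitted first; every operator gets a fresh temporary; and the final copy instruction has only two operands.

```python
class ThreeAddressGenerator:
    """Emit three-address code from a tuple syntax tree, one new temporary per operator."""

    def __init__(self):
        self.code = []
        self.temps = 0

    def new_temp(self):
        self.temps += 1
        return f"temp{self.temps}"

    def gen(self, node):
        """Emit code for `node`; return the name (variable, constant, or temp) holding its value."""
        if isinstance(node, str):                 # identifiers and constants need no code
            return node
        if node[0] == "=":                        # copy instruction: only two operands
            _, target, value = node
            self.code.append(f"{target} := {self.gen(value)}")
            return target
        if len(node) == 2:                        # unary operator such as inttoreal
            op, arg = node
            operand = self.gen(arg)
            temp = self.new_temp()
            self.code.append(f"{temp} := {op}({operand})")
            return temp
        op, left, right = node                    # binary operator: at most three addresses
        left_name = self.gen(left)                # operands are generated first, so the
        right_name = self.gen(right)              # multiplication is emitted before the addition
        temp = self.new_temp()
        self.code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp


generator = ThreeAddressGenerator()
generator.gen(("=", "position", ("+", "initial", ("*", "rate", ("inttoreal", "60")))))
print("\n".join(generator.code))
```

It prints:
temp1 := inttoreal(60)
temp2 := rate * temp1
temp3 := initial + temp2
position := temp3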
Optimization
What is Optimization?
• Optimization is a program transformation technique that tries to improve the code by making it consume fewer resources (e.g., CPU time and memory) and run faster.
• A code-optimizing process must follow the three rules given below:
Optimization Cont..
• The output code must not, in any way, change the meaning of the program.
• Optimization should increase the speed of the program and, if possible, make the program demand fewer resources.
• Optimization should itself be fast and should not delay the overall compiling process.
Code Optimization.
• Its main objective is to produce a more efficient object/target program.
• In the compilers that do the most optimization, called “optimizing compilers”, a significant fraction of the compilation time is spent on this phase.
• There is therefore a trade-off: the more code optimization a compiler performs, the longer compilation generally takes.
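As an illustration, the Python sketch below applies three common local optimizations (constant folding, copy propagation, and elimination of a final copy) to the three-address code for position = initial + rate * 60. The tuple instruction format is an assumption, and the third step relies on each temporary being used exactly once, which holds for code generated with one temporary per operator but not in general.

```python
def optimize(code):
    """Apply three simple local optimizations to three-address code.

    Each instruction is (dest, op, args): op is "copy", "inttoreal", "+", or "*",
    and args is a tuple of operand names or constants. Temporaries are assumed to be
    named temp1, temp2, ... and each is assumed to be used exactly once.
    """
    # 1. Constant folding: converting an integer constant can be done at compile time.
    folded = []
    for dest, op, args in code:
        if op == "inttoreal" and args[0].isdigit():
            op, args = "copy", (args[0] + ".0",)
        folded.append((dest, op, args))

    # 2. Copy propagation: a temporary that merely holds a copy is replaced by its value.
    known, propagated = {}, []
    for dest, op, args in folded:
        args = tuple(known.get(a, a) for a in args)
        if op == "copy" and dest.startswith("temp"):
            known[dest] = args[0]
            continue
        propagated.append((dest, op, args))

    # 3. Remove a final "x := tempN" by letting the instruction that computed tempN write x.
    result = []
    for dest, op, args in propagated:
        if op == "copy" and result and result[-1][0] == args[0] and args[0].startswith("temp"):
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
        else:
            result.append((dest, op, args))
    return result


tac = [
    ("temp1", "inttoreal", ("60",)),
    ("temp2", "*", ("rate", "temp1")),
    ("temp3", "+", ("initial", "temp2")),
    ("position", "copy", ("temp3",)),
]
for dest, op, args in optimize(tac):
    print(dest, ":=", f" {op} ".join(args))
```

The four instructions shrink to two, `temp2 := rate * 60.0` and `position := initial + temp2`. The meaning is unchanged, as the first rule above requires, and the conversion of 60 no longer costs anything at run time.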
Code Generation.
• The final phase of the compiler is the generation of the target program, which normally consists of machine code or assembly code.
• Memory locations are selected for each of the variables used by the program. Then, each intermediate instruction is translated into a sequence of machine instructions that performs the same task.
Code Generation Cont..
• A code generator is expected to understand the target machine’s runtime environment and its instruction set. The code generator should take the following things into consideration when generating code:
• Target language: the code generator has to be aware of the nature of the target language into which the code is to be transformed. The target machine can have either a CISC or a RISC processor architecture.
Code Generation Cont..
• Register allocation: a program has a number of values to be maintained during execution. The target machine’s architecture may not allow all of these values to be kept in CPU registers. The code generator decides which values to keep in registers, and also which registers to use for them.
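A deliberately naive Python sketch of both concerns, for a hypothetical two-address machine whose instructions `MOVF`, `MULF`, and `ADDF` (floating-point move, multiply, add) are assumptions rather than a real instruction set. It translates optimized three-address code instruction by instruction, keeps temporaries in registers, loads and stores named variables from memory, and gives up rather than spilling when it runs out of registers.

```python
def generate(code, num_registers=2):
    """Naive code generator for a hypothetical two-address machine.

    Each instruction is (dest, op, (a, b)) with op "+" or "*". Temporaries (temp1, ...)
    are kept in registers; named variables live in memory. "OP src, R" means R := R op src.
    """
    mnemonic = {"+": "ADDF", "*": "MULF"}            # assumed floating-point instructions
    free = [f"R{i}" for i in range(num_registers)]   # registers not holding a live value
    home = {}                                        # temporary -> register holding it
    asm = []

    def operand(x):
        if x in home:
            return home[x]                           # value already in a register
        if x[0].isdigit():
            return "#" + x                           # constant: immediate operand
        return x                                     # variable: memory operand

    for dest, op, (a, b) in code:
        if a in home:                                # reuse the register that holds a
            reg = home.pop(a)
        else:
            if not free:
                raise RuntimeError("out of registers (spilling is not implemented)")
            reg = free.pop(0)
            asm.append(f"MOVF {operand(a)}, {reg}")
        asm.append(f"{mnemonic[op]} {operand(b)}, {reg}")
        if b in home:                                # b's temporary is dead after this use
            free.append(home.pop(b))
        if dest.startswith("temp"):
            home[dest] = reg                         # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")        # store a named variable to memory
            free.append(reg)
    return asm


optimized = [("temp2", "*", ("rate", "60.0")), ("position", "+", ("initial", "temp2"))]
print("\n".join(generate(optimized)))
```

For the optimized code of position = initial + rate * 60 it emits:
MOVF rate, R0
MULF #60.0, R0
MOVF initial, R1
ADDF R0, R1
MOVF R1, position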
Errors
A program may have the following kinds of errors at various stages:
• Lexical: the name of some identifier typed incorrectly.
• Syntactical: a missing semicolon or unbalanced parentheses.
• Semantical: an incompatible value assignment.
• Logical: unreachable code or an infinite loop.
Symbol Table Management.
• A compiler records the identifiers used in the source program and collects information about various attributes of each identifier.
• These attributes may provide information about:
  - The storage allocated.
  - Its type.
  - Its scope (where in the program it is valid).
  - In the case of a procedure:
    - Its name.
    - The number and types of its arguments.
    - The method of passing arguments (by value or by reference).
    - The type returned.
Symbol Table Management.
• A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
• The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
• The lexical analyzer enters the identifiers detected in the source program into the symbol table, but it cannot determine the other relevant attributes of the identifiers.
• The other phases enter information about identifiers into the symbol table and then use this information in various ways.
Symbol Table Cont..
For variables, typical attributes include:
• Its type.
• How much memory it occupies.
• Its scope.
For procedures and functions, typical attributes are:
• The type of value returned (if any),
• The number and type of each argument (if any), and
• The method of passing each argument.
Symbol Table Cont..
Purpose of Symbol Table:
• To provide quick and uniform access to identifier attributes throughout the compilation process.
• Information is usually put into the symbol table during the first two phases of the compiler, i.e., lexical and syntax analysis.
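One common way to organize a symbol table is a stack of hash tables, one per open scope; the Python sketch below assumes that design. The `Symbol` and `SymbolTable` names and fields are illustrative, mirroring the attributes listed above. Insertion checks for redeclaration in the current scope, and lookup searches from the innermost scope outward, so an inner declaration hides an outer one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    name: str
    kind: str                          # "variable" or "procedure"
    type: str                          # variable type, or a procedure's return type
    size: Optional[int] = None         # storage in bytes, for variables
    params: list = field(default_factory=list)   # (type, "value" | "reference") pairs
    scope_level: int = 0               # filled in by SymbolTable.insert


class SymbolTable:
    """One dictionary per open scope, searched from innermost to outermost."""

    def __init__(self):
        self.scopes = [{}]             # scope 0 holds global names

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, symbol):
        current = self.scopes[-1]
        if symbol.name in current:
            raise KeyError(f"{symbol.name!r} is already declared in this scope")
        symbol.scope_level = len(self.scopes) - 1
        current[symbol.name] = symbol
        return symbol

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                    # undeclared identifier


table = SymbolTable()
table.insert(Symbol("rate", "variable", "real", size=8))
table.insert(Symbol("area", "procedure", "real", params=[("real", "value")]))
table.enter_scope()                    # e.g. entering the body of a procedure
table.insert(Symbol("rate", "variable", "int", size=4))    # hides the global rate
inner = table.lookup("rate")
print(inner.type, inner.scope_level)   # int 1
table.exit_scope()
print(table.lookup("rate").type)       # real
```

Hashing keeps both insertion and lookup fast on average, which matches the requirement above that records be found, stored, and retrieved quickly.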
[Figure: Translation of a statement]
Front End and Back End.
• The phases are collected into a front end and a back end.
• This is similar to the division into analysis and synthesis parts.
• The front end consists of those phases that depend primarily on the source language and not on the target machine language.
• It contains lexical analysis, syntax analysis, creation of the symbol table, semantic analysis, and the generation of intermediate code.
• The front end also includes the error handling that goes along with each of these phases.
Front End and Back End.
• The back end includes those phases of the compiler that depend on the target machine language.
• It does not depend on the source language, only on the intermediate language.
• Code generation is part of the back end.
[Figure: source code → Front End → IR → Back End → machine code, with errors reported as an additional output]
The End.
