Compiler Design Notes

The document summarizes an introduction to compilers design lecture. It discusses how compilers translate high-level programming languages into machine language. It defines a compiler as a program that translates source code into machine language. It also describes interpreters, how they differ from compilers in directly executing syntax trees, and examples of languages used with interpreters. Finally, it outlines the common phases of a compiler: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation.

Uploaded by

iritikdev

Lecture (1): Introduction to Compilers Design / 3rd Class / Computer science Department / Education

College for Women / University of Kufa 2021-2022. Lecturer: (Dr. Hasan Thabit Rashid)

1. Introduction to Compilers Design:

In order to reduce the complexity of designing and building computers, nearly all of these
are made to execute relatively simple commands (but do so very quickly). A program for a
computer must be built by combining these very simple commands into a program in what is called
machine language. Since this is a tedious and error-prone process most programming is, instead,
done using a high-level programming language. This language can be very different from the
machine language that the computer can execute, so some means of bridging the gap is required.
This is where the compiler comes in.

Figure 1: Pre-processors, Compilers, Assemblers, and Linkers

A compiler is a program that translates source code written in a high-level programming
language (which is suitable for human programmers) into the low-level machine language (that is
required by computers). During this process, the compiler will also attempt to spot and report
obvious programmer mistakes.

[Diagram: Programming Language (Source) -> Compiler -> Machine Language (.Obj File)]

Figure 2: A compiler is a program that translates source code written in a source language into
semantically equivalent code written in a target language.


Figure 3: The Compilation & Execution process

An assembler is a program that translates assembly language code into machine code. A linker is
a program that links and merges various object files together to make an executable file. A
loader is a part of the operating system that is responsible for loading executable files into
memory and executing them.

The main reasons for using a high-level language for programming are:

High-level programming language                     Machine language
1- The notation is closer to the way humans         1- The notation is closer to the way the
   think about problems.                               computer works.
2- The compiler can detect some obvious             2- Programming mistakes are hard to find.
   programming mistakes.
3- Programs tend to be shorter.                     3- Programs tend to be longer.
4- The same program can be compiled to, or run      4- A program is tied to one machine language.
   on, many different machine languages.

2. Interpreter program:

An interpreter is another way of implementing a programming language. An interpreter is a
program that appears to execute a source program as if it were machine language. Interpretation
shares many aspects with compiling: lexing, parsing and type-checking are done in an interpreter
just as in a compiler. But instead of generating code from the syntax tree, the interpreter
processes the syntax tree directly to evaluate expressions, execute statements, and so on.
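
To make the idea of processing the syntax tree directly more concrete, the following is a minimal sketch of a tree-walking interpreter. The tuple-based node shapes and the names `evaluate`, `execute`, and `run` are assumptions made for illustration; they are not taken from the lecture.

```python
# Hypothetical tuple-based syntax tree (node shapes are assumptions):
#   ("num", 3)  ("var", "x")  ("binop", "+", left, right)
#   ("assign", "x", expr)     ("while", cond, [statements])

def evaluate(node, env):
    """Evaluate an expression node by walking the tree directly."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "binop":
        _, op, left, right = node
        a, b = evaluate(left, env), evaluate(right, env)
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        if op == "<":
            return a < b
    raise ValueError(f"unknown expression node: {node!r}")


def execute(stmt, env):
    """Execute a statement node; no machine code is generated."""
    kind = stmt[0]
    if kind == "assign":
        env[stmt[1]] = evaluate(stmt[2], env)
    elif kind == "while":
        # The loop body is re-walked on every iteration.
        while evaluate(stmt[1], env):
            for inner in stmt[2]:
                execute(inner, env)
    else:
        raise ValueError(f"unknown statement node: {stmt!r}")


def run(program):
    """Run a list of statements and return the final variable bindings."""
    env = {}
    for stmt in program:
        execute(stmt, env)
    return env


# i = 0; total = 0; while i < 4: total = total + i; i = i + 1
PROGRAM = [
    ("assign", "i", ("num", 0)),
    ("assign", "total", ("num", 0)),
    ("while", ("binop", "<", ("var", "i"), ("num", 4)), [
        ("assign", "total", ("binop", "+", ("var", "total"), ("var", "i"))),
        ("assign", "i", ("binop", "+", ("var", "i"), ("num", 1))),
    ]),
]
RESULT = run(PROGRAM)
```

In this sketch the body of the `while` node is walked again on each iteration, which is one reason a tree-walking interpreter tends to be slower than compiled code.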

Figure 4: An interpreter is a program that performs the operations implied by the source
program.


An interpreter may need to process the same piece of the syntax tree (for example, the body
of a loop) many times; hence, interpretation is typically slower than executing a compiled
program. But writing an interpreter is often simpler than writing a compiler, and an interpreter
is easier to move to a different machine, so for applications where speed is not of the essence,
interpreters are often used. Languages such as BASIC, SNOBOL, and LISP can be translated using
interpreters. Java also uses an interpreter. The process of interpretation can be carried out in
the following phases:

1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
Advantages:

- Modifications to the user program can easily be made and take effect as execution proceeds.
- The type of the object that a variable denotes may change dynamically.
- Debugging a program and finding errors is simpler for an interpreted program.
- The interpreter for the language makes it machine independent.

Disadvantages:

- The execution of the program is slower.
- Memory consumption is higher.

3. Hybrid work:

Compilation and interpretation may be combined to implement a programming language. The
compiler may produce intermediate-level code which is then interpreted rather than compiled to
machine code. Each choice is a compromise between speed and space.

Example: Java language processors combine compilation and interpretation, as shown in Fig 5. A
Java source program may first be compiled into an intermediate form called bytecodes. The
bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that bytecodes
compiled on one machine can be interpreted on another machine, perhaps across a network.

To process inputs faster, some Java compilers, called just-in-time compilers, translate the
bytecodes into machine language immediately before they run the intermediate program to process
the input.
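
The combination of a compilation step with an interpreted intermediate form can be sketched as follows. This is only an illustration under stated assumptions: the instruction names (`PUSH`, `LOAD`, `ADD`, `MUL`) and the functions `compile_expr` and `run_bytecode` are hypothetical and do not correspond to real Java bytecodes or to any real virtual machine.

```python
def compile_expr(node, code=None):
    """Compile a tuple expression tree into a list of stack-machine instructions."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    elif kind == "binop":
        _, op, left, right = node
        compile_expr(left, code)    # operands first ...
        compile_expr(right, code)
        code.append(({"+": "ADD", "*": "MUL"}[op], None))  # ... then the operator
    else:
        raise ValueError(f"unknown node: {node!r}")
    return code


def run_bytecode(code, env):
    """Interpret the intermediate code on a small stack-based virtual machine."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(env[arg])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# b + c * 2, compiled once and then interpreted with concrete values.
TREE = ("binop", "+", ("var", "b"), ("binop", "*", ("var", "c"), ("num", 2)))
BYTECODE = compile_expr(TREE)
VALUE = run_bytecode(BYTECODE, {"b": 1, "c": 3})
```

The list in `BYTECODE` plays the role of the machine-independent intermediate form: it could be produced on one machine and interpreted on another that has its own implementation of `run_bytecode`.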

Figure 5: A Hybrid Work


4. The phases of a compiler:


In some compilers, the ordering of phases may differ slightly, some phases may be
combined or split into several phases or some extra phases may be inserted between those
mentioned below. It is common to let each phase be handled by a separate module. Some of these
modules are written by hand, while others may be generated from specifications. Often, some of
the modules can be shared between several compilers.

[Diagram labels: Analysis (Front-End), Error Handler, Synthesis (Back-End)]

Figure 6: The phases of a compiler

1- Lexical analysis phase (Scanning phase):

It is the first phase of the compiler, which reads and analyses the program text. It takes the
source program as input and divides it into tokens as output. It reads the characters one by one
from left to right and forms the tokens. A token represents a logically cohesive sequence of
characters, such as a keyword, operator, identifier (variable name), special symbol, or number.
Example: in a + b = 20, the tokens are a, +, b, =, and 20. The group of characters forming a
token is called the lexeme. The lexical analyser not only generates a token but also enters the
lexeme into the symbol table if it is not already there.
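
A scanner of this kind can be sketched with Python's standard `re` module. The token names (`NUMBER`, `IDENT`, `OP`), the function `tokenize`, and the use of a plain dictionary as the symbol table are assumptions chosen for this illustration.

```python
import re

# Token classes as (name, regular expression); the names are hypothetical.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text, symbol_table):
    """Scan text left to right and return a list of (token kind, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at position {match.start()}")
        if kind == "IDENT" and lexeme not in symbol_table:
            symbol_table[lexeme] = {}  # enter the lexeme if it is not already there
        tokens.append((kind, lexeme))
    return tokens


SYMBOLS = {}
TOKENS = tokenize("a + b = 20", SYMBOLS)
```

For the example input, `TOKENS` holds five pairs, one per token, and `SYMBOLS` contains entries for `a` and `b`.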

2- Syntax analysis phase (Parsing phase):

It is the second phase of the compiler. This phase takes the list of tokens produced by the
lexical analysis and arranges them in a tree structure (called the syntax tree) as output that
reflects the structure of the program. A syntax tree is a tree in which interior nodes are
operators and leaf (exterior) nodes are operands. Example: for a=b+c*2, the syntax tree is:

      =
     / \
    a   +
       / \
      b   *
         / \
        c   2
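
One common way to build such a tree is a recursive-descent parser. The sketch below is an assumption-laden illustration, not the lecture's method: it handles only assignments with `+` and `*`, the function name `parse_assignment` is hypothetical, and the tree is represented as nested tuples `(operator, left, right)` with strings as leaves.

```python
import re


def parse_assignment(text):
    """Parse 'name = expr' where expr uses + and *, with * binding tighter."""
    # Simplified tokenization; characters outside these classes are silently skipped.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok is None or tok in {"=", "+", "*"}:
            raise SyntaxError(f"expected an operand, found {tok!r}")
        advance()
        return tok  # operands become leaf nodes

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if not target.isidentifier():
        raise SyntaxError(f"cannot assign to {target!r}")
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


TREE = parse_assignment("a=b+c*2")
```

Because `term` is called from `expr`, the multiplication ends up deeper in the tree than the addition, which is how operator precedence is reflected in the structure.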


3- Semantic Analysis phase (Type checking phase):

It is the third phase of the compiler. This phase analyses the syntax tree produced by the parser
and determines whether the program violates certain consistency requirements (that is, whether a
syntactically correct program is also meaningful), e.g., if a variable is used but not declared,
or if it is used in a context that does not make sense given the type of the variable, such as
trying to use a Boolean value as a function pointer. It also inserts type conversions where they
are needed, for example converting an integer operand to a real value.
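
The two checks mentioned above (use of an undeclared variable, and conversion of integer operands to real) can be sketched on a tuple-based syntax tree. The function `annotate`, the type names `"int"` and `"real"`, and the `int2real` marker node are assumptions for this illustration.

```python
def annotate(node, symbols, errors):
    """Return (annotated tree, type); symbols maps declared names to 'int' or 'real'."""
    if isinstance(node, str):                      # leaf: literal or identifier
        if node.isdigit():
            return node, "int"
        if node not in symbols:
            errors.append(f"undeclared variable '{node}'")
            return node, "error"
        return node, symbols[node]

    op, left, right = node
    left, left_type = annotate(left, symbols, errors)
    right, right_type = annotate(right, symbols, errors)
    if "error" in (left_type, right_type):
        return (op, left, right), "error"          # do not pile up follow-on errors

    if op == "=":
        if left_type == "real" and right_type == "int":
            right = ("int2real", right)            # widen the assigned value
        elif left_type != right_type:
            errors.append(f"cannot assign {right_type} to {left_type} variable '{left}'")
            return (op, left, right), "error"
        return (op, left, right), left_type

    if left_type != right_type:                    # mixed arithmetic: widen the int side
        if left_type == "int":
            left = ("int2real", left)
        else:
            right = ("int2real", right)
        return (op, left, right), "real"
    return (op, left, right), left_type


DECLARED = {"a": "real", "b": "int", "c": "real"}
ERRORS = []
TYPED_TREE, TREE_TYPE = annotate(("=", "a", ("+", "b", ("*", "c", "2"))), DECLARED, ERRORS)

BAD_ERRORS = []
annotate(("=", "x", "1"), {}, BAD_ERRORS)
```

In the first call no errors are recorded and the integer operands `b` and `2` are wrapped in `int2real` nodes; in the second call the undeclared variable `x` is reported.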

4- Intermediate code generation phase:

It is the fourth phase of the compiler. The program is translated to a simple machine-independent
intermediate language. This phase takes the output of semantic analysis and produces intermediate
code such as three-address code. Three-address code consists of a sequence of instructions, each
of which has at most three operands, for example t1 = t2 + t3.
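
As a rough sketch of how three-address code might be produced from a syntax tree, the following flattens a tuple-based tree for an assignment into a list of instructions. The function name `gen_tac` and the temporary names `t1`, `t2`, ... are assumptions for illustration.

```python
import itertools


def gen_tac(tree):
    """Flatten ('=', target, expr) into a list of three-address instructions."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporaries t1, t2, ...

    def visit(node):
        if isinstance(node, str):
            return node                 # an operand is used as it is
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        temp = next(temps)
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    result = visit(expr)
    code.append(f"{target} = {result}")
    return code


TAC = gen_tac(("=", "a", ("+", "b", ("*", "c", "2"))))
# TAC is ["t1 = c * 2", "t2 = b + t1", "a = t2"]
```

Each generated instruction names at most three operands: one destination and up to two sources.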

5- Code Optimization phase (Register allocation):

It is the fifth phase of the compiler. It takes the intermediate code as input and produces
optimized intermediate code as output. This phase removes redundant code and attempts to improve
the intermediate code so that faster-running machine code will result. In the register allocation
step, the symbolic variable names used in the intermediate code are translated to numbers, each of
which corresponds to a register in the target machine code. Optimization must not change the
result of the program. To improve the generated code, optimization involves:

- Detection and removal of dead code (unreachable code).
- Calculation of constants in expressions and terms (constant folding).
- Collapsing of repeated expressions into a temporary variable (common subexpression elimination).
- Loop unrolling.
- Moving code outside the loop (loop-invariant code motion).
- Removal of unwanted temporary variables.
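
Two of the items in this list, calculation of constants and removal of unwanted temporaries, can be sketched on straight-line three-address code. The instruction format `(dest, op, arg1, arg2)`, the convention that temporaries are named `t` followed by digits, and the function `optimize` are all assumptions for this illustration; loops and branches are not handled.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_temp(name):
    """Assumed naming convention: compiler temporaries look like t1, t2, ..."""
    return re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    """code: list of (dest, op, arg1, arg2); op is None for a copy 'dest = arg1'."""
    constants = {}                      # names currently known to hold a constant
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)         # propagate known constants into operands
        b = constants.get(b, b)
        if op is not None and isinstance(a, int) and isinstance(b, int):
            a, op, b = OPS[op](a, b), None, None   # compute the value at compile time
        if op is None and isinstance(a, int):
            constants[dest] = a
        else:
            constants.pop(dest, None)
        folded.append((dest, op, a, b))

    used = set()                        # names read by later instructions
    kept = []
    for dest, op, a, b in reversed(folded):
        if is_temp(dest) and dest not in used:
            continue                    # temporary is never read: drop it
        used.discard(dest)
        used.update(x for x in (a, b) if isinstance(x, str))
        kept.append((dest, op, a, b))
    kept.reverse()
    return kept


# t1 = 4 * 2 ; t2 = b + t1 ; a = t2
BEFORE = [("t1", "*", 4, 2), ("t2", "+", "b", "t1"), ("a", None, "t2", None)]
AFTER = optimize(BEFORE)
```

After the pass, `4 * 2` has been replaced by `8` inside the addition and the now unused temporary `t1` has been removed, while the value finally stored in `a` is unchanged.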

6- Code Generation phase (Machine code generation):

It is the final phase of the compiler. The intermediate language is translated to assembly
language (a textual representation of machine code) for a specific machine architecture. It takes
its input from the code optimization phase and produces the target code or object code as a
result. Intermediate instructions are translated into a sequence of machine instructions that
perform the same task. It involves allocation of registers and memory, generation of correct
references, generation of correct data types, and generation of missing code.
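
The translation of intermediate instructions into machine instructions can be sketched for an imaginary machine. Everything about the target here is an assumption: the mnemonics `LOAD`, `ADD`, `SUB`, `MUL`, and `STORE`, the single register `r1`, and the `#` prefix for constants are invented for illustration and do not describe a real architecture.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical instruction names


def operand(value):
    """Constants are written as immediates (#8); names refer to memory locations."""
    return f"#{value}" if isinstance(value, int) else value


def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into assembly-like text."""
    asm = []
    for dest, op, a, b in code:
        asm.append(f"LOAD r1, {operand(a)}")
        if op is not None:
            asm.append(f"{MNEMONIC[op]} r1, {operand(b)}")
        asm.append(f"STORE r1, {dest}")
    return asm


# t2 = b + 8 ; a = t2
ASSEMBLY = generate([("t2", "+", "b", 8), ("a", None, "t2", None)])
```

This naive scheme keeps every temporary in memory and reuses one register, so it emits a redundant store and reload of `t2`. A real code generator would allocate registers more carefully, and a later cleanup pass could remove such pairs.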


7- Assembly and linking phase:

The assembly-language code is translated into binary representation and addresses of variables,
functions, etc., are determined.

The first three phases are collectively called the frontend of the compiler and the last three
phases are collectively called the backend. The middle part is only the intermediate code
generation, but this often includes various optimisations and transformations on the intermediate
code. Assembly and linking are typically done by programs supplied by the machine or operating
system vendor, and are hence not part of the compiler itself.

Each phase, through checking and transformation, establishes stronger invariants on the things it
passes on to the next, so that writing each subsequent phase is easier than if it had to take all
the preceding phases into account. For example, the type checker can assume the absence of syntax
errors and the code generator can assume the absence of type errors.

5. The Analysis-Synthesis Model of Compilation: There are two parts,

A) Analysis (Front end): determines the operations implied by the source program which
are recorded in a tree structure (machine independent / Language dependent).
B) Synthesis (Back end): takes the tree structure and translates the operations therein into
the target program (machine dependent / Language independent).

6. Software development tools: Tools are available to implement one or more compiler phases:
scanner generators (LEX, FLEX), parser generators (YACC, BISON), syntax-directed translation
engines, automatic code generators, and data-flow engines. Other tools that use the
Analysis-Synthesis Model include:
1. Editors (syntax highlighting)
2. Pretty printers (e.g. Doxygen)
3. Static checkers (e.g. Lint and Splint)
4. Interpreters
5. Text formatters (e.g. TeX and LaTeX)
6. Silicon compilers (e.g. VHDL)
7. Query interpreters/compilers (Databases)

7. Symbol-Table Management and Error Handler:


A) Symbol table: a data structure containing a record for each variable name (identifier), with
fields for the attributes of the name. The data structure should be designed to allow the
compiler to find the record for each identifier quickly and to store or retrieve data from
that record quickly. The symbol table is used to store all the information about the
identifiers used in the program. Whenever an identifier is detected in any of the phases, it
is stored in the symbol table.
B) Error handler: it is invoked when a flaw (error) in the source program is detected. One of
the most important functions of a compiler is the detection and reporting of errors in the
source program. The error message should allow the programmer to determine exactly where
the errors have occurred. Errors may occur in any phase of a compiler. If a compiler phase
discovers an error, it must report the error to the error handler, which issues an
appropriate diagnostic message. Both the table-management and error-handling routines
interact with all phases of the compiler. After detecting an error, a phase must handle the
error so that compilation can proceed.
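
The two components described above can be sketched as small classes. The names `SymbolRecord`, `SymbolTable`, and `ErrorHandler`, the chosen record fields, and the message format are assumptions for illustration. A dictionary is used so that lookups by name are fast.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRecord:
    """One record per identifier, with fields for its attributes."""
    name: str
    type: str = "unknown"
    line: int = 0           # line where the identifier was first seen


class SymbolTable:
    def __init__(self):
        self._records = {}  # hash table: quick store and retrieve by name

    def insert(self, name, line):
        """Add a record if the name is new; return the record either way."""
        return self._records.setdefault(name, SymbolRecord(name, line=line))

    def lookup(self, name):
        """Return the record for name, or None if it was never entered."""
        return self._records.get(name)


@dataclass
class ErrorHandler:
    diagnostics: list = field(default_factory=list)

    def report(self, phase, line, message):
        """Record a diagnostic and return, so the calling phase can continue."""
        self.diagnostics.append(f"line {line}: {phase} error: {message}")


table = SymbolTable()
errors = ErrorHandler()

table.insert("a", line=1).type = "real"   # a later phase fills in the type
again = table.insert("a", line=5)         # already present: same record is returned

if table.lookup("b") is None:
    errors.report("semantic", 2, "variable 'b' is used but not declared")
```

Because `report` only records the message, the phase that found the error decides how to recover and compilation can proceed to find further errors.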


Examples of errors:
- In lexical analysis, errors occur in the separation of tokens.
- In syntax analysis, errors occur during construction of the syntax tree.
- In semantic analysis, errors occur when the compiler detects constructs that have the right
syntactic structure but no meaning, and during type conversion.
- In code optimization, errors occur when the result is affected by the optimization.
- In code generation, an error is reported when code is missing, etc.

Figure 7: Compiling of an assignment statement.


Phase                            Output                            Sample

Programmer                       Source string                     A=B+C;
(source code producer)

Scanner                          Token string                      ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
(performs lexical analysis)                                        and symbol table with names

Parser                           Parse tree or abstract              ;
(performs syntax analysis        syntax tree                         |
based on the grammar of the                                          =
programming language)                                               / \
                                                                   A   +
                                                                      / \
                                                                     B   C

Semantic analyzer                Annotated parse tree or
(type checking, etc)             abstract syntax tree

Intermediate code generator      Three-address code, quads,        int2fp B t1
                                 or RTL                            + t1 C t2
                                                                   := t2 A

Optimizer                        Three-address code, quads,        int2fp B t1
                                 or RTL                            + t1 #2.3 A

Code generator                   Assembly code                     MOVF #2.3,r1
                                                                   ADDF2 r1,r2
                                                                   MOVF r2,A

Peephole optimizer               Assembly code                     ADDF2 #2.3,r2
                                                                   MOVF r2,A

Figure 8: Output description of compiler phases.

Exercises:

1- How many phases does analysis consist of?
Answer: Analysis consists of three phases: linear analysis, hierarchical analysis, and
semantic analysis.
2- What happens in linear analysis?
Answer: This is the phase in which the stream of characters making up the source program
is read from left to right and grouped into tokens, which are sequences of characters having
a collective meaning.
3- What happens in hierarchical analysis?
Answer: This is the phase in which characters or tokens are grouped hierarchically into
nested collections with collective meaning.
4- What happens in semantic analysis?
Answer: This is the phase in which certain checks are performed to ensure that the
components of a program fit together meaningfully.