Compiler Design Notes
College for Women / University of Kufa 2021-2022. Lecturer: (Dr. Hasan Thabit Rashid)
In order to reduce the complexity of designing and building computers, nearly all of these
are made to execute relatively simple commands (but do so very quickly). A program for a
computer must be built by combining these very simple commands into a program in what is called
machine language. Since this is a tedious and error-prone process, most programming is instead
done using a high-level programming language. This language can be very different from the
machine language that the computer can execute, so some means of bridging the gap is required.
This is where the compiler comes in.
[Figure: Programming Language (Source)]
Lecture (1): Introduction to Compilers Design / 3rd Class / Computer science Department / Education
College for Women / University of Kufa 2021-2022. Lecturer: (Dr. Hasan Thabit Rashid)
2. Interpreter program:
Figure 4: An interpreter is a program that performs the operations implied by the source
program.
An interpreter may need to process the same piece of the syntax tree (for example, the body
of a loop) many times; hence, interpretation is typically slower than executing a compiled
program. However, writing an interpreter is often simpler than writing a compiler, and an
interpreter is easier to move to a different machine, so interpreters are often used for
applications where speed is not of the essence. Languages such as BASIC, SNOBOL, and LISP can
be translated using interpreters. Java also uses an interpreter. The process of interpretation
can be carried out in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
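The four phases above can be sketched as a very small interpreter for assignment statements. This is only an illustrative sketch: the function names (tokenize, parse, check, execute, run) and the tiny language it accepts are assumptions made for this example, not part of the lecture, and error handling is kept to a minimum.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """1. Lexical analysis: split the source text into tokens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()=]", text)

def parse(tokens):
    """2. Syntax analysis: build a nested-tuple syntax tree for 'name = expr'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            advance()  # skip ")" (not validated in this sketch)
            return node
        return ("num", int(tok)) if tok.isdigit() else ("var", tok)

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    advance()  # skip "=" (not validated in this sketch)
    return ("=", target, expr())

def check(tree, env):
    """3. Semantic analysis: a variable must have a value before it is read."""
    kind = tree[0]
    if kind == "var" and tree[1] not in env:
        raise NameError(f"undefined variable {tree[1]!r}")
    if kind == "=":
        check(tree[2], env)
    elif kind in OPS:
        check(tree[1], env)
        check(tree[2], env)

def execute(tree, env):
    """4. Direct execution: walk the tree and compute the result immediately."""
    kind = tree[0]
    if kind == "num":
        return tree[1]
    if kind == "var":
        return env[tree[1]]
    if kind == "=":
        env[tree[1]] = execute(tree[2], env)
        return env[tree[1]]
    return OPS[kind](execute(tree[1], env), execute(tree[2], env))

def run(line, env):
    tree = parse(tokenize(line))
    check(tree, env)
    return execute(tree, env)

env = {}
run("b = 4", env)
print(run("a = b + 3 * 2", env))  # 10
```

Note that no target code is produced at any point: the tree is executed directly, which is what distinguishes this from a compiler.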
Advantages:
- Modifications to the user program can be easily made and implemented as execution proceeds.
- The type of object that a variable denotes may change dynamically.
- Debugging a program and finding errors is simplified for a program that is
interpreted.
- The interpreter for the language makes it machine independent.
Disadvantages:
3. Hybrid work:
Example: Java language processors combine compilation and interpretation, as shown in Fig 5. A
Java source program may first be compiled into an intermediate form called bytecodes. The
bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that bytecodes
compiled on one machine can be interpreted on another machine, perhaps across a network.
In order to achieve faster processing of inputs to outputs, some Java compilers, called just-
in-time compilers, translate the bytecodes into machine language immediately before they run the
intermediate program to process the input.
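The same compile-then-interpret arrangement can be observed from inside Python, whose standard implementation (CPython) also compiles source text to bytecode and then runs it on a virtual machine. The sketch below is an analogy to the Java case described above, not Java itself; the expression and the filename "<demo>" are arbitrary choices for illustration.

```python
import dis

source = "x * 2 + 1"

# Step 1: compile the source text into a code object that holds bytecode.
code = compile(source, "<demo>", "eval")

# Step 2: the virtual machine interprets the bytecode.
result = eval(code, {"x": 20})
print(result)  # 41

# Optional: list the bytecode instructions (the exact opcodes vary
# between Python versions, so no output is shown here).
dis.dis(code)
```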
[Figure: compiler structure with Analysis (Front-End), Synthesis (Back-End), and Error Handler]
Lexical analysis is the first phase of the compiler: reading and analysing the program text.
It takes the source program as input and divides it into tokens as output. It reads the
characters one by one, from left to right, and forms the tokens. A token represents a logically
cohesive sequence of characters, such as a keyword, operator, variable name or identifier,
special symbol, or number. Example: in a + b = 20, the items a, b, +, =, and 20 are all separate
tokens. The group of characters forming a token is called the lexeme. The lexical analyser not
only generates a token but also enters the lexeme into the symbol table if it is not already there.
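A minimal scanner sketch for the example a + b = 20 follows. The token category names (NUMBER, IDENTIFIER, OPERATOR) and the choice to store each identifier's first-seen index in the symbol table are assumptions for illustration; real scanners, including those generated by LEX or FLEX, record far more information.

```python
import re

# Token classes, tried in order. SKIP and MISMATCH are helper classes.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def scan(text, symbol_table):
    """Read the text left to right and return a list of (kind, lexeme) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        # Enter the lexeme into the symbol table if it is not already there.
        if kind == "IDENTIFIER" and lexeme not in symbol_table:
            symbol_table[lexeme] = len(symbol_table)
        tokens.append((kind, lexeme))
    return tokens

table = {}
for token in scan("a + b = 20", table):
    print(token)
print(table)  # {'a': 0, 'b': 1}
```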
Syntax analysis (parsing) is the second phase of the compiler. It takes the list of tokens
produced by lexical analysis and arranges them in a tree structure (called the syntax tree) that
reflects the structure of the program. In a syntax tree, interior nodes are operators and leaf
nodes are operands. Example: for a=b+c*2, the syntax tree is:

      =
     / \
    a   +
       / \
      b   *
         / \
        c   2
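One way to build such a tree is a small recursive-descent parser, sketched below for assignments that use only + and *. The function name parse_assignment and the nested-tuple representation (operator, left, right) are assumptions for this example; the grammar is deliberately tiny.

```python
import re

def parse_assignment(text):
    """Return the syntax tree of 'name = expr' as nested tuples."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[=+*]", text)

    def factor():
        # An operand (identifier or number) becomes a leaf of the tree.
        return tokens.pop(0)

    def term():
        # term -> factor ('*' factor)*   ('*' binds tighter than '+')
        node = factor()
        while tokens and tokens[0] == "*":
            tokens.pop(0)
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term ('+' term)*
        node = term()
        while tokens and tokens[0] == "+":
            tokens.pop(0)
            node = ("+", node, term())
        return node

    target = tokens.pop(0)
    if not tokens or tokens.pop(0) != "=":
        raise SyntaxError("expected '=' after the target name")
    tree = ("=", target, expr())
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return tree

print(parse_assignment("a=b+c*2"))
# ('=', 'a', ('+', 'b', ('*', 'c', '2')))
```

The operators appear as the first element of each tuple (interior nodes) and the operands as plain strings (leaves), matching the description above.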
Semantic analysis is the third phase of the compiler. It analyses the syntax tree obtained from
the parser and determines whether the program violates certain consistency requirements (that is,
whether a syntactically correct program is also meaningful), e.g., if a variable is used but not
declared, or if it is used in a context that does not make sense given the type of the variable,
such as trying to use a Boolean value as a function pointer. It also performs type conversion
where needed, for example converting integer values to real values.
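The checks described above can be sketched on a small expression tree. Everything here is an assumption made for illustration: the tree shape (operator, left, right), the type names "int", "real" and "bool", the declared dictionary standing in for the symbol table, and the int2real node that marks an inserted conversion.

```python
def analyse(node, declared):
    """Return (annotated_node, type), or raise on a semantic error."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "real"
    if isinstance(node, str):  # a variable is being used
        if node not in declared:
            raise NameError(f"variable {node!r} used but not declared")
        return node, declared[node]

    op, left, right = node
    left, left_type = analyse(left, declared)
    right, right_type = analyse(right, declared)

    # A use that makes no sense for the operand's type.
    if "bool" in (left_type, right_type):
        raise TypeError(f"operator {op!r} cannot be applied to a bool operand")

    # Mixed int/real operands: convert the int side to real.
    if left_type != right_type:
        if left_type == "int":
            left = ("int2real", left)
        else:
            right = ("int2real", right)
        return (op, left, right), "real"
    return (op, left, right), left_type

declared = {"b": "real", "c": "int"}
tree, result_type = analyse(("+", "b", ("*", "c", 2)), declared)
print(tree)         # ('+', 'b', ('int2real', ('*', 'c', 2)))
print(result_type)  # real
```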
Intermediate code generation is the fourth phase of the compiler. The program is translated to
a simple machine-independent intermediate language. It takes the output of semantic analysis as
input and produces intermediate code, such as three-address code, as output. Three-address code
consists of a sequence of instructions, each of which has at most three operands, for example
t1 = t2 + t3.
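As a sketch of how three-address code could be produced from a syntax tree, the function below walks a nested-tuple tree and emits one instruction per operator, introducing a fresh temporary (t1, t2, ...) for each intermediate result. The tree format and the function name gen_three_address are assumptions for this example.

```python
def gen_three_address(tree):
    """Translate ('=', name, expr) into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        # Leaves (names or constants) are used directly as operands.
        if not isinstance(node, tuple):
            return str(node)
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    result = walk(value)
    code.append(f"{target} = {result}")
    return code

for line in gen_three_address(("=", "a", ("+", "b", ("*", "c", 2)))):
    print(line)
# t1 = c * 2
# t2 = b + t1
# a = t2
```

Each emitted instruction has at most three operands: one result and up to two sources.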
Code optimization is the fifth phase of the compiler. It takes the intermediate code as input
and produces optimized intermediate code as output. This phase removes redundant code and
attempts to improve the intermediate code so that faster-running machine code will result. The
symbolic variable names used in the intermediate code are translated to numbers, each of which
corresponds to a register in the target machine code. Optimization must not change the result
of the program. To improve the generated code, the optimization involves:
- Loop unrolling.
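Loop unrolling can be illustrated by writing the transformation by hand. In the sketch below, sum_unrolled performs four additions per iteration, so the loop test and counter update run roughly a quarter as often, and a clean-up loop handles the leftover elements. The function names and the unroll factor of 4 are arbitrary choices for illustration. A compiler applies this kind of rewrite automatically; hand-unrolling Python code is not claimed to be faster, it only shows the shape of the transformation.

```python
def sum_rolled(values):
    """Original loop: one element per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total

def sum_unrolled(values):
    """Same computation, unrolled by a factor of 4."""
    total = 0
    n = len(values)
    i = 0
    # Main loop: four elements per iteration.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Clean-up loop: the remaining 0 to 3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total

data = list(range(10))
print(sum_rolled(data), sum_unrolled(data))  # 45 45
```

Both versions return the same value for integer input, which is the requirement stated above: the optimization must not affect the result of the program.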
Code generation is the final phase of the compiler. The intermediate language is translated to
assembly language (a textual representation of machine code) for a specific machine architecture.
It takes the output of the code optimization phase as input and produces the target code or
object code as the result. Intermediate instructions are translated into a sequence of machine
instructions that perform the same task. It involves allocation of registers and memory,
generation of correct references, generation of correct data types, and generation of missing
code.
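A naive code generator can be sketched as a line-by-line translation of three-address instructions. The target here is a hypothetical one-register machine: the mnemonics LOAD, STORE, ADD, SUB, MUL and DIV and the register name R1 are invented for this example and do not correspond to a real architecture. No register allocation is attempted, so every result is stored back to memory.

```python
# Hypothetical opcodes for an invented one-register target machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(three_address_code):
    """Translate instructions of the form 'x = y' or 'x = y op z'."""
    asm = []
    for instr in three_address_code:
        target, _, *rhs = instr.split()
        if len(rhs) == 1:
            # Copy instruction: x = y
            asm.append(f"LOAD R1, {rhs[0]}")
        else:
            # Binary instruction: x = y op z
            left, op, right = rhs
            asm.append(f"LOAD R1, {left}")
            asm.append(f"{OPCODES[op]} R1, {right}")
        asm.append(f"STORE {target}, R1")
    return asm

for line in generate(["t1 = c * 2", "t2 = b + t1", "a = t2"]):
    print(line)
```

Each intermediate instruction becomes a short sequence of machine instructions that performs the same task; the redundant STORE and LOAD pairs this produces are the kind of thing a later optimization pass would remove.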
A) Analysis (Front end): determines the operations implied by the source program which
are recorded in a tree structure (machine independent / Language dependent).
B) Synthesis (Back end): takes the tree structure and translates the operations therein into
the target program (machine dependent / Language independent).
6. Software development tools: tools are available to implement one or more compiler phases:
scanner generators (LEX, FLEX), parser generators (YACC, BISON), syntax-directed translation
engines, automatic code generators, and data-flow engines. Other tools that use the
Analysis-Synthesis Model:
1. Editors (syntax highlighting)
2. Pretty printers (e.g. Doxygen)
3. Static checkers (e.g. Lint and Splint)
4. Interpreters
5. Text formatters (e.g. TeX and LaTeX)
6. Silicon compilers (e.g. VHDL)
7. Query interpreters/compilers (Databases)
Examples of errors:
- In lexical analysis, errors occur in the separation of tokens.
- In syntax analysis, errors occur during construction of the syntax tree.
- In semantic analysis, errors occur when the compiler detects constructs with the right
syntactic structure but no meaning, and during type conversion.
- In code optimization, errors occur when the result is affected by the optimization.
- In code generation, errors are reported when code is missing, etc.
Phase                         Output                        Sample

Parser (performs syntax       Parse tree or abstract          ;
analysis based on the         syntax tree                     |
grammar of the programming                                    =
language)                                                    / \
                                                            A   +
                                                               / \
                                                              B   C

Intermediate code generator   Three-address code, quads,    int2fp B t1
                              or RTL                        + t1 C t2
                                                            := t2 A

Code generator                Assembly code                 MOVF #2.3,r1
                                                            ADDF2 r1,r2
                                                            MOVF r2,A

Peephole optimizer            Assembly code                 ADDF2 #2.3,r2
                                                            MOVF r2,A
Figure 8: Output description of compiler phases.
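The peephole step in Figure 8 (three assembly instructions reduced to two) can be sketched as a single pattern-matching rule over adjacent instructions. The sketch assumes that MOVF src,dst copies a value and that ADDF2 src,dst adds src into dst, which is how the figure's sample reads. It also assumes the temporary register is not used again afterwards; a real peephole optimizer would have to verify that.

```python
import re

def peephole(asm):
    """Fold 'MOVF #c,rX' followed by 'ADDF2 rX,rY' into 'ADDF2 #c,rY'."""
    out = []
    i = 0
    while i < len(asm):
        if i + 1 < len(asm):
            load = re.fullmatch(r"MOVF (#[\d.]+),(r\d+)", asm[i])
            add = re.fullmatch(r"ADDF2 (r\d+),(r\d+)", asm[i + 1])
            if (load and add
                    and load.group(2) == add.group(1)    # same temporary rX
                    and add.group(1) != add.group(2)):   # rX is not the target
                # Assumption: rX is not needed after this point.
                out.append(f"ADDF2 {load.group(1)},{add.group(2)}")
                i += 2
                continue
        out.append(asm[i])
        i += 1
    return out

print(peephole(["MOVF #2.3,r1", "ADDF2 r1,r2", "MOVF r2,A"]))
# ['ADDF2 #2.3,r2', 'MOVF r2,A']
```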
Exercises: