Introduction to Compilation
Some Terms
• Source
– The language the program was written in
• Object
– The machine language equivalent of the
program after compilation
• Compiler
– A software program that translates the
source code into object code
– An assembler is a special case of a compiler
in which the source is written in assembly
language
Compiler
• Reads and analyzes the entire program
• As a Discipline, Involves Multiple CSE Areas
– Programming Languages and Algorithms
– Software Engineering & Theory / Foundations
– Computer Architecture & Operating Systems
• But, Has a Surprisingly Simple Intent:
What’s in a Compiler?
Standard Compiler Structure
Source code (character stream)
   | Lexical analysis
Token stream
   | Parsing                          [Front end: machine-independent]
Abstract syntax tree
   | Intermediate code generation
Intermediate code
   | Optimization
Intermediate code                     [Back end: machine-dependent]
   | Code generation
Assembly code
Structure of a Compiler
• First approximation
– Front end: analysis
• Read source program and understand its
structure and meaning
– Back end: synthesis
• Generate equivalent target language program
Source -> Front End -> Back End -> Target
Implications
• Must recognize legal programs (& complain about illegal
ones)
• Must generate correct code
• Must manage storage of all variables
• Must agree with OS & linker on target format
• Need some sort of Intermediate Representation (IR)
– Front end maps source into IR
– Back end maps IR to target machine code
Source -> Front End -> Back End -> Target
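A minimal sketch of the front end / IR / back end split described above. It assumes Python's standard ast module can stand in for the front end (the AST is the IR) and that the target is a made-up stack machine with PUSH, LOAD, ADD, SUB and MUL instructions. The function names are hypothetical and not taken from any real compiler.

```python
import ast

def front_end(source):
    """Analysis: reject illegal input and build the IR (here, a Python AST)."""
    # ast.parse raises SyntaxError for an illegal expression.
    return ast.parse(source, mode="eval").body

def back_end(node):
    """Synthesis: walk the IR and emit code for a hypothetical stack machine."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp) and type(node.op) in ops:
        # Operands first, then the operator (postfix order suits a stack machine).
        return back_end(node.left) + back_end(node.right) + [ops[type(node.op)]]
    raise ValueError(f"unsupported construct: {ast.dump(node)}")

def compile_expr(source):
    """Source -> front end -> IR -> back end -> target."""
    return back_end(front_end(source))

print(compile_expr("a + 2 * b"))
# ['LOAD a', 'PUSH 2', 'LOAD b', 'MUL', 'ADD']
```

Because the two halves meet only at the IR, a different back end could be swapped in without touching the front end.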
Front End
source -> Scanner -> tokens -> Parser -> IR
• Split into two parts
– Scanner: Responsible for converting character
stream to token stream
• Also strips out white space, comments
– Parser: Reads token stream; generates IR
• Both of these can be generated automatically
– Source language specified by a formal grammar
– Tools read the grammar and generate scanner &
parser (either table-driven or hard coded)
Lex – A Lexical Analyzer Generator
• A Unix Utility from the mid-1970s
• A Compiler that:
– Takes as Source a Specification of the
Tokens/Patterns of a Language
– Generates a “C” Lexical Analyzer Program
• Pictorially: Lex specification -> Lex -> C lexical analyzer program
Tokens?
Tokens
• Token stream: Each significant lexical
chunk of the program is represented by a
token
– Operators & Punctuation: {}[]!+-=*;: …
– Keywords: if while return goto
– Identifiers: id & actual name
– Constants: kind & value; int, floating-point,
character, string, …
Scanner Example
• Input text
// this statement does very little
if (x >= y) y = 42;
• Token Stream
IF LPAREN ID(x) GEQ ID(y)
RPAREN ID(y) BECOMES INT(42) SCOLON
– Note: tokens are atomic items, not character strings
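A small hand-written scanner for the example above might look like the following sketch. It assumes Python's standard re module; the token names follow the slide, while the helper names (TOKEN_SPEC, scan, show) are made up for illustration, and only the handful of operators the example needs are covered.

```python
import re

KEYWORDS = {"if": "IF", "while": "WHILE", "return": "RETURN", "goto": "GOTO"}

# (token kind, pattern) pairs; SKIP covers white space and // comments.
TOKEN_SPEC = [
    ("SKIP",    r"\s+|//[^\n]*"),
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))

def scan(text):
    """Convert a character stream into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                                  # strip white space and comments
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append((KEYWORDS[lexeme], None))   # keyword, not an identifier
        elif kind == "ID":
            tokens.append(("ID", lexeme))             # id & actual name
        elif kind == "INT":
            tokens.append(("INT", int(lexeme)))       # kind & value
        else:
            tokens.append((kind, None))               # operators & punctuation
    return tokens

def show(tokens):
    """Format tokens the way the slide writes them."""
    return " ".join(kind if value is None else f"{kind}({value})" for kind, value in tokens)

source = "// this statement does very little\nif (x >= y) y = 42;"
print(show(scan(source)))
# IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
```

Each token is a (kind, value) pair rather than a raw string, which is the sense in which tokens are atomic items.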
Programming Steps for Compilation
• Create/Edit source
• Compile source
• Link object modules together
• Test executable
• If there are errors, start over
• Stop
Compilation process:
• Invoke compiler on source program to
generate machine language equivalent
• Compiler translates source to object
• Saves object output as disk file[s]
• Large Systems may have many source
programs
• Each has to be compiled
Link object modules together
• Take one or more object modules and combine
them to form an executable
• The LINKER does this for you
– Resolves references to other object
modules
– Handles calls to external libraries
– Creates the executable
Linking
• Libraries of subroutines
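The symbol-resolution part of linking can be sketched as below. This is a toy model under stated assumptions, not how any real linker stores its data: each object module is a hypothetical dict listing the symbols it defines and the symbols it needs, and a library is treated as just another module.

```python
def link(modules, libraries=()):
    """Toy linker: map every symbol to the module or library that defines it.

    Each module is a dict in a made-up format:
    {"name": str, "defines": set of symbols, "needs": set of symbols}
    """
    symbol_table = {}
    for module in list(modules) + list(libraries):
        for symbol in module["defines"]:
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol {symbol!r}")
            symbol_table[symbol] = module["name"]

    # Every reference made by an object module must now have a definition.
    unresolved = sorted(
        symbol
        for module in modules
        for symbol in module["needs"]
        if symbol not in symbol_table
    )
    if unresolved:
        raise ValueError(f"undefined symbols: {unresolved}")
    return symbol_table

main_o = {"name": "main.o", "defines": {"main"}, "needs": {"gcd", "printf"}}
gcd_o = {"name": "gcd.o", "defines": {"gcd"}, "needs": set()}
libc = {"name": "libc", "defines": {"printf"}, "needs": set()}

table = link([main_o, gcd_o], [libc])
print(table["gcd"], table["printf"])
# gcd.o libc
```

A real linker also relocates addresses and writes the executable file; only the reference-resolution step is modeled here.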
From Source Code to Executable Code
    program gcd(input, output);
    var i, j: integer;
    begin
      read(i, j);
      while i <> j do
        if i > j then i := i - j
        else j := j - i;
      writeln(i)
    end.
• Compilation translates this source program into executable machine code
Machine code Generation
Assemblers
Reviewing the Entire Process
Why Study Compilers? (1)
• Compiler techniques are everywhere
– Parsing (little languages, interpreters)
– Database engines
– AI: domain-specific languages
– Text processing
• TeX/LaTeX -> DVI -> PostScript -> PDF
– Hardware: VHDL; model-checking tools
– Mathematics (Mathematica, Matlab)
Why Study Compilers? (2)
• Fascinating blend of theory and
engineering
– Direct applications of theory to practice
• Parsing, scanning, static analysis
– Some very difficult problems (NP-hard or
worse)
• Resource allocation, “optimization”, etc.
• Need to come up with good-enough solutions
Why Study Compilers? (3)
• Ideas from many parts of CSE
– AI: Greedy algorithms, heuristic search
– Algorithms: graph algorithms, dynamic
programming, approximation algorithms
– Theory: Grammars, DFAs and PDAs, pattern
matching, fixed-point algorithms
– Systems: Allocation & naming, synchronization,
locality
– Architecture: pipelines & hierarchy management,
instruction set use
Software Language Levels
• Machine Language (Binary)
• Assembly Language
– Assembler converts Assembly into machine
language
• High-Level Languages (C, Perl, Shell)
– Compiled : C
– Interpreted : Perl, Shell
Programming Languages Offer …
• Abstractions
• At different levels
– From low
• Good for machines….
– To high
• Good for humans….
• Three Approaches
– Interpreted
– Compiled
– Mixed
Interpretation
• No linking
• No object code generated
• Source statements executed line
by line
Steps in interpretation
• Read a source line
• Parse line
• Do what the line says
– Allocate space for variables
– Execute arithmetic operations, etc.
• Go back to step 1
• Similar to instruction cycle
Interpreter
• Interpreter
– Execution engine
– Program execution interleaved with analysis
    running = true;
    while (running) {
        analyze next statement;
        execute that statement;
    }
– May involve repeated analysis of some
statements (loops, functions)
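The analyze/execute loop can be made concrete with a sketch like the one below. The four-statement language (assignment, subtraction, print, and "ifnz x goto N") is invented purely for illustration, and the analyses counter is only there to show that statements inside a loop are analyzed again each time they run.

```python
def interpret(lines):
    """Run a tiny, made-up language one statement at a time.

    Returns (printed values, number of statement analyses performed).
    """
    env, output = {}, []
    analyses = 0
    pc = 0  # index of the next statement to run

    def value(word):
        # A word is either a variable name or an integer literal.
        return env[word] if word in env else int(word)

    while pc < len(lines):
        words = lines[pc].split()      # analyze the next statement
        analyses += 1
        pc += 1
        # execute that statement
        if words[0] == "print" and len(words) == 2:        # print x
            output.append(value(words[1]))
        elif words[0] == "ifnz" and len(words) == 4:       # ifnz x goto N
            if value(words[1]) != 0:
                pc = int(words[3])
        elif len(words) == 3 and words[1] == "=":          # x = a
            env[words[0]] = value(words[2])
        elif len(words) == 5 and words[1] == "=" and words[3] == "-":  # x = a - b
            env[words[0]] = value(words[2]) - value(words[4])
        else:
            raise SyntaxError(f"cannot analyze statement: {lines[pc - 1]!r}")
    return output, analyses

program = [
    "n = 3",
    "print n",
    "n = n - 1",
    "ifnz n goto 1",
]
print(interpret(program))
# ([3, 2, 1], 10)
```

The program has four statements, yet ten analyses are performed, because the three statements in the loop body are re-analyzed on every iteration.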
Interpreters & Compilers
• Interpreter
– A program that reads a source program and
produces the results of executing that
program
• Compiler
– A program that translates a program from one
language (the source) to another (the target)
Common Issues
• Compilers and interpreters both must read
the input (a stream of characters) and
“understand” it; this is the analysis step
w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0 ) <nl> <tab> <tab> { n P o s + + ; } <nl> <tab> }
Compilation Advantages
• Faster Execution
• Single file to execute
• Compiler can diagnose syntax and semantic
errors better because it sees the whole
program (a line-by-line interpreter only sees
one line at a time)
• Compiler can optimize code
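As one illustration of an optimization that becomes possible once the whole program has been analyzed, here is a sketch of constant folding. It assumes Python 3.9 or later (for ast.unparse) and uses Python's own AST as the program representation; ConstantFolder and optimize are hypothetical names, and only +, - and * on numeric literals are folded.

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def is_number(node):
    """True for an int or float literal (bool is excluded on purpose)."""
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool))

class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on literals with its result, computed at compile time."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        if type(node.op) in FOLDABLE and is_number(node.left) and is_number(node.right):
            result = FOLDABLE[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=result, kind=None), node)
        return node

def optimize(source):
    """Parse an expression, fold its constants, and return the new source text."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(tree)

print(optimize("x * (2 + 3 * 4)"))
# x * 14
print(optimize("60 * 60 * 24"))
# 86400
```

The folded expression does less work at run time, which is part of why compiled code can execute faster.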
Compilation Disadvantages
• Harder to debug
• Each change to the source code takes longer:
must recompile and relink
Interpreter Advantages
• Easier to debug
• Faster development time
Interpreter disadvantages
• Slower execution times
• No optimization
• Need all of source code available
• Source code larger than executable for
large systems
Mixed
Hybrid approaches
• Well-known example: Java
– Compile Java source to bytecode, the language of
the Java Virtual Machine (.class files)
– Execution
• Interpret byte codes directly, or
• Compile some or all byte codes to native code
– (particularly for execution hot spots)
– Just-In-Time compiler (JIT)
• Variation: VS.NET
– Compilers generate MSIL (Microsoft Intermediate Language)
– All IL compiled to native code before execution
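The same compile-then-interpret pattern can be observed in CPython, which is not mentioned in the slides but works in a similar hybrid way: source is first compiled to bytecode, and a virtual machine then executes that bytecode. The sketch below uses only the built-in compile and exec functions and the standard dis module; the variable names are illustrative.

```python
import dis

source = "total = price * quantity"

# Step 1: compile the source text to a code object holding bytecode.
code = compile(source, "<example>", "exec")

# The instruction names can be inspected; the exact set varies by CPython version.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Step 2: the virtual machine executes the bytecode.
namespace = {"price": 6, "quantity": 7}
exec(code, namespace)
print(namespace["total"])
# 42
```

The code object can be executed many times without re-analyzing the source text, which avoids the repeated-analysis cost of a pure interpreter.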