Compilers Crash Course

1) Compilers are translators that convert program code from one representation to another, typically high-level source code into machine language (object code). Source code is optimized for human readability, while machine code is optimized for the hardware. 2) Compilation proceeds in steps: source code is analyzed and converted through a series of intermediate representations before assembly code is generated. 3) Correctness matters because programs built with a broken compiler are hard to debug. A compiler aims to balance source-level expressiveness, performance of the generated code, and translation efficiency.

Uploaded by

Javier Sauler
Copyright
© © All Rights Reserved

Compilers Crash Course

Prof. Michael Clarkson


CSci 6907.85 Spring 2014

Slides Acknowledgment:
Prof. Andrew Myers (Cornell)

What are Compilers?


• Translators from one representation of
program code to another
• Typically: high-level source code to
machine language (object code)
• Not always:
– Java compiler: Java to interpretable
bytecodes
– Java JIT: bytecode to executable image

Source Code
• Source code: optimized for human readability
– expressive: matches human notions of grammar
– redundant to help avoid programming errors
– computation possibly not fully determined by code

int expr(int n)
{
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}

Machine Code
• Optimized for hardware
– Redundancy, ambiguity reduced
– Information about intent and reasoning lost
– Assembly code ≈ machine code
expr:
    pushl  %ebp               55
    movl   %esp, %ebp         89 e5
    subl   $4, %esp           83 ec 04
    movl   8(%ebp), %eax      8b 45 08
    movl   %eax, %edx         89 c2
    imull  8(%ebp), %edx      0f af 55 08
    movl   8(%ebp), %eax      8b 45 08
    incl   %eax               40
    imull  %eax, %edx         0f af d0
    movl   8(%ebp), %eax      8b 45 08
    incl   %eax               40
    imull  %edx, %eax         0f af c2
    sall   $2, %eax           c1 e0 02
    movl   %eax, -4(%ebp)     89 45 fc
    movl   -4(%ebp), %eax     8b 45 fc
    leave                     c9
    ret                       c3

Example (Output assembly code)
Unoptimized Code              Optimized Code
expr:                         expr:
    pushl  %ebp                   pushl  %ebp
    movl   %esp, %ebp             movl   %esp, %ebp
    subl   $4, %esp               movl   8(%ebp), %edx
    movl   8(%ebp), %eax          movl   %edx, %eax
    movl   %eax, %edx             imull  %edx, %eax
    imull  8(%ebp), %edx          incl   %edx
    movl   8(%ebp), %eax          imull  %edx, %eax
    incl   %eax                   imull  %edx, %eax
    imull  %eax, %edx             sall   $2, %eax
    movl   8(%ebp), %eax          leave
    incl   %eax                   ret
    imull  %edx, %eax
    sall   $2, %eax
    movl   %eax, -4(%ebp)
    movl   -4(%ebp), %eax
    leave
    ret

How to translate?
• Source code and machine code mismatch
• Goals:
– source-level expressiveness for task
– best performance for concrete computation
– reasonable translation efficiency (< O(n^3))
– maintainable compiler code

How to translate correctly?
• Programming languages describe computation
precisely
• Therefore: translation can be precisely described (a
compiler can be correct)
• Correctness is very important!
– hard to debug programs with broken compiler…
– non-trivial: programming languages are expressive
– implications for development cost, security
– some compilers have been proven correct!
[X. Leroy, Formal Verification of a Realistic Compiler, CACM 52(7), 2009]

How to translate effectively?


High-level source code

Low-level machine code

Idea: translate in steps
• Compiler uses a series of different
intermediate representations (IRs) of
programs.
• Different IRs are good for different phases
of compilation

Compilation in a Nutshell 1
Source code (character stream)
    if (b == 0) a = b;
  -> Lexical analysis
Token stream
    if ( b == 0 ) a = b ;
  -> Parsing
Abstract syntax tree (AST)
    if
      ==
        b
        0
      =
        a
        b
  -> Semantic analysis
Decorated AST
    if
      == : boolean
        b : int
        0 : int
      = : int
        a : int lvalue
        b : int

Compilation in a Nutshell 2
Decorated AST
    if
      == : boolean
        b : int
        0 : int
      = : int
        a : int lvalue
        b : int
  -> Intermediate code generation
Intermediate code
    if b == 0 goto L1 else L2
    L1: a = b
    L2:
  -> Assembly code generation
Assembly code
        cmp rb, 0
        jnz L2
    L1: mov ra, rb
    L2:
  -> Register allocation, optimization
Assembly code
    cmp ecx, 0
    cmovz [ebp+8],ecx
Simplified Compiler Structure
Front end (machine-independent)
    Source code (character stream): if (b == 0) a = b;
      -> Lexical analysis
    Token stream
      -> Parsing
    Abstract syntax tree
  -> Intermediate code generation
Intermediate code
  <-> Program analysis & optimization (control flow graphs)
Back end (machine-dependent)
      -> Assembly code generation
    Assembly code
        cmp $0, %rcx
        cmovz %rcx, %rdx

Even Bigger Picture

Source code
  -> Compiler
Assembly code
  -> Assembler
Object code (machine code + symbol tables)
  -> Linker
Fully-resolved object code (machine code + symbol tables, relocation info)
  -> Loader
Executable image in memory
Where to Learn More
– Compilers: Principles, Techniques, and
Tools. Aho, Lam, Sethi and Ullman (the Dragon
Book)
(strength: parsing)
– Modern Compiler Implementation in Java.
Andrew Appel.
(strength: translation)
– Advanced Compiler Design and
Implementation. Steve Muchnick.
(strength: optimization)
