Compiler Construction Class Notes RSJ
1 Introduction
2 Lexical Analysis
A.2 Syntax
A.3 Semantics
B Do It Yourselves Examples
These notes are written only to augment the slides, lectures, labs and the course page. They must be read
together with the cited reference materials, the slides and other materials shared on the course page, and all
the materials produced and used in the lab. They cannot replace all those sources, even for a small-credit
exam of short duration. At best, they will serve as demystifying and mnemonic references to the more
elaborate materials.
Above all, nothing can replace Doing-It-Yourselves.
Do it Yourself
DiY1: Throughout, there will be such "Do It Yourselves" suggestions interspersed. They
are like exercises but have no particular tangible aim. They are observation experiments to
illuminate several neglected but important aspects of language processors that are otherwise
not highlighted in texts.
The course is a hands-on course, with lectures and labs engaging you in equal measure. Therefore,
we use the expository DiY experiments to illustrate theory, followed by exercises. Exercises are also
interspersed throughout. These are concrete tasks to be carried out by developing programs with specific
toolchains.
Exercise
Exercise1: Exercises are similar to DiY suggestions, with a different color scheme.
Chapter 1
Introduction
This course is about building programming language processors. Though the skills needed (and learnt)
are useful in many areas of applications from systems programming to natural language processing and
computational linguistics, here and now we focus on translation of useful valid source code into effective
executable code. This may be done in an interactive shell (interpreter) mode or shippable bundle (compiler)
mode. Again, we focus on the latter.
Our textbook ([T1, Chapter 1]) says:
Programming languages are notations for describing computations to people and to machines. The
world as we know it depends on programming languages, because all the software running on all the
computers was written in some programming language. But, before a program can be run, it first
must be translated into a form in which it can be executed by a computer.
The software systems that do this translation are called compilers.
This book is about how to design and implement compilers. We shall discover that a few basic
ideas can be used to construct translators for a wide variety of languages and machines. Besides
compilers, the principles and techniques for compiler design are applicable to so many other
domains that they are likely to be reused many times in the career of a computer scientist. The
study of compiler writing touches upon programming languages, machine architecture, language
theory, algorithms, and software engineering.
DiY2: Use the intermediate output options and the verbose option of gcc to get a complete
cross-section view of the whole compilation process from C source code to a binary (ELF)
executable.
In order to see more than what gcc reports on its own in the verbose mode, and to interpret the terse
technical messages, we can directly try to observe the correspondences between the input and output at each
stage.
We take a simple program that has a recursive function (computing the gcd by the basic Euclidean method
of repeated subtraction) and a main function that calls it. We also keep a global variable, formal parameters
to the recursive function, and local variables in the main function, so that the program contains all the
relevant components and shows the differences in how compilation handles them.
In the remaining part of this section, we give the program. You will find the corresponding outputs of the
important compilation stages in Appendix B.1. An exercise to answer based on these outputs follows the
program listing here.
Syntax Directed Translation
Program 1
//-------------------------------------- euclid.c ----------------------------
#include <stdio.h>
#include <stdlib.h>
int globalCount = 0;
int gcd(int a, int b) {
    globalCount++;
    if (a>0 && b>0) {
        if (a>b) return gcd(b, a-b);
        else if (b>a) return gcd(a, b-a);
        else return a;
    }
    else return a;
}
int main(int argc, char *argv[]) {
    int a = 10, b = 15;
    if (argc>1) {
        a = atoi(argv[1]);
        if (argc>2) b = atoi(argv[2]);
    }
    printf("gcd(a = %d, b = %d) = %d; ", a, b, gcd(a, b));
    printf("obtained in %d iterations.\n", globalCount);
    return 0;
}
//---------------- End of euclid.c ---------------------------------------------
Exercise
Exercise2: Using the gcc options that stop the process from source to executable at its various
stages, on various programs (including the example given above), determine the following:
1. What is the basic indivisible unit in which all executable code resides, in all stages from
C source to binary executable?
2. Which identifiers are retained intact through which stages? Why?
3. What happens to all the information associated with identifiers when they disappear after a
stage in the compilation process?
4. How are composite expressions translated? (Hint: To answer this, try expressions involving
a mix of identifiers and constants, and see the different translations.)
5. How are function calls prepared and placed?
Try not just to describe these points, but to specify them exactly and in detail, covering as many
possibilities as you can. You may use Program 1, modifying and reusing it repeatedly to
experiment.
E → E + T | T              (1.1)
T → T * F | F              (1.2)
F → ( E ) | num | [ L ]    (1.3)
L → E X                    (1.4)
X → L | ε                  (1.5)
This grammar extends the expressions to those on vectors: productions (1.3) to (1.5) produce a bracketed
list of numbers that can fit wherever factors (scalars and parenthesized expressions) fit. Note
that in the textbook grammar (4.1) there is the id token (identifiers), which is replaced by num in our extended
grammar. This change is made because we want to demonstrate the SDT scheme not by code-to-code
translation, but by carrying out expression evaluation as the result of the translation process. Thus, we
“translate” expressions into their values.
The Control Flow The previous diagram depicted, in a sense, the data flow of expression evaluation. The
execution process is sequential, operating on one node of the expression tree in each step. The
control flow of this process is akin to the following diagram from the textbook ([T1, Figure 2.3, page 41]),
except that the “three address code” in the output at the right end is replaced by the value of the expression.
We can call this scheme Syntax Directed Evaluation. Moreover, we omit intermediate code generation
(instead doing subtree, or sub-expression-tree, evaluation) and symbol table construction. We don’t have
identifiers of any kind; the first-class citizens (i.e. factors generated by F in the grammar) of our small world
are only numbers and vectors. Thus there are no symbols to be tabulated.
In this design, each grammar symbol in the body of a production has a positional “dollar” variable associated
with it. It is assigned either a yylval initialised by the lexer (for NUM tokens) or a subtree value evaluated
and passed up by the parser. For each of the LIST productions, a new node has to be created to hold the
aggregate structure that becomes the evaluation of its RHS body. Other than assignments and new node
creations, the real arithmetic evaluation operations happen in the two functions Add and Multiply. Parts of
this code are given below.
void AddAtomToList(double num, list *l) {
    if (l) {
        if (l->first) {
            AddAtomToList(num, l->first);
            AddAtomToList(num, l->rest);
        }
        else l->value += num;
    }
}
list *Multiply(list *one, list *two) {
    if (!(one || two)) return NULL;
    if (one && !two) return one;
    if (!one && two) return two;
    if (one->first) {
        if (two->first) {
            one->first = Multiply(one->first, two->first);
            one->rest = Multiply(one->rest, two->rest);
            return one;
        }
        else {
            MultiplyAtomToList(two->value, one->first);
            free(two);
            return one;
        }
    }
    else {
        MultiplyAtomToList(one->value, two);
        free(one);
        return two;
    }
}
Design Notes: In the code listed above, the recursive evaluation of binary operations is intended to go
like this:
1. If both operands are atoms (scalars), that is, their first members are both empty, their values
are operated upon directly using the corresponding native C operators.
2. If one operand is a scalar and the other a vector, then the operation is distributed over the vector,
keeping the scalar operand common.
3. If both are vectors, then their first and rest components are recursively operated upon respectively.
Some interaction with the calculator generated from this code is given below:
(1+2)*3
===> 9
[1 2]*[3 4]
===> [3 [8 ]]
[1 (2+[3 4])]*(5+6)
===> [11 [[5 [6 ]]]]
[1 2]+3
===> [4 [2 ]]
3+[1 2]
===> [4 [5 ]]
[1 2]*3
===> [3 [2 ]]
3*[1 2]
===> [3 [6 ]]
See the video (https://quantaaws.bits-goa.ac.in/mod/resource/view.php?id=20961) also. We see
that the operations involving a scalar and a list (vector) are not commutative. Can you find the bug in
the code given above?
Chapter 2
Lexical Analysis
The task is to design a game programming language for a programmer who would want to create a game of
“extetrominoes” (a set of four-block figures like tetrominoes, the original idea of having the blocks connected
only on edges extended to blocks connected on corners as well). Thus, the number of possible blocks is larger
than 7. Some examples are shown in the following figure.
1. Processes sections of a source code file (belonging to a programming language not yet specified) in
different ways. The sections are marked by special keywords. Currently the keywords are “Section1”,
“Section2”, “Section3”.
2. These sections should appear exactly once each, and in this sequence only in the source file, but if that
doesn’t happen, your scanner needs to detect and exit with an error message.
3. You will be given a list of delimiters (which may change for each section) which separate words (tokens)
in the source. The delimiters are to be ignored, not to be reproduced in the output. The scanner
recognises tokens between delimiters.
4. Token types are: numbers (the usual signed integers), the usual identifiers, the five arithmetic operators
+, -, *, /, = and the parentheses “()”.
5. Any characters other than those in items 3 and 4 above are to be treated as errors, but the scanner
should recover from them and continue scanning.
6. Apart from scanning and giving the usual token stream, the scanner should output a list of identifiers
with their occurrence positions (identifier sequence number, section-wise) for each section. The sequence
of identifiers in this list should be alphanumeric (dictionary).
7. Identifiers that occur afresh in Section2 or Section3 (without occurring in any previous section) should
be caught at a later stage. In fact, they should be treated as strings (without quotes as delimiters).
• In Section 2, square brackets [] and curly braces {} can be used to surround separable units. In
syntax and semantics later, we will introduce scope rules for identifiers within such units. In the
other two sections, these two types of brackets are neither delimiters nor to be ignored silently:
they should provoke lexical error messages (and then be ignored).
• Other than alphanumeric characters, the underscore, the decimal point, delimiters and the 7 “operators”
+-*/=(), all other characters are to be caught as errors.
• We are extending the class of numbers to signed reals (floating point) too. Whether to use the
lookahead operator to distinguish between integers and reals with one rule each for them or to
relegate the distinction to the action part for a single rule for both – that’s your choice.
• Whitespace characters are delimiters in all sections.
• How to allow these sectional differences is your choice. You may use global flags, switches in the
action parts, changes in the main driver for each section detection, etc.
Remember that the extetrominoes are to be represented in a program by special long integers: thus, the
two extetrominoes whose matrix representations are shown in the diagram above will be represented
as 4201101010 and 4201101001 respectively. The first digit in each is the number of rows of the matrix
representation, the second is the number of columns, and the remaining digits are the rows of bits.
A.2 Syntax
The game program is to be contained mainly in Section2. Section2 consists entirely of function bodies. A
Play function is a must, and that will be treated as the “main” function of the game engine generated.
A.3 Semantics
We want the compiler to generate Python code. If you want something else, you need to be clear about the
design specifications and the rationale, but in the implementation we will help you.
Appendix B
Do It Yourselves Examples
gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=
/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
COLLECT_GCC_OPTIONS='-v' '-fverbose-asm' '-save-temps'
'-mtune=generic' '-march=x86-64' '-dumpdir' 'a-'
/usr/lib/gcc/x86_64-linux-gnu/11/cc1 -E -quiet -v
-imultiarch x86_64-linux-gnu euclid.c -mtune=generic -march=x86-64
-fverbose-asm -fpch-preprocess -fasynchronous-unwind-tables
-fstack-protector-strong -Wformat -Wformat-security
-fstack-clash-protection -fcf-protection -o a-euclid.i
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/11/include
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-fverbose-asm' '-save-temps' '-mtune=generic'
'-march=x86-64' '-dumpdir' 'a-'
/usr/lib/gcc/x86_64-linux-gnu/11/cc1 -fpreprocessed a-euclid.i -quiet
-dumpdir a- -dumpbase euclid.c -dumpbase-ext .c -mtune=generic
-march=x86-64 -version -fverbose-asm -fasynchronous-unwind-tables
-fstack-protector-strong -Wformat -Wformat-security
-fstack-clash-protection -fcf-protection -o a-euclid.s
GNU C17 (Ubuntu 11.4.0-1ubuntu1~22.04) version 11.4.0 (x86_64-linux-gnu)
compiled by GNU C version 11.4.0, GMP version 6.2.1,
MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o
-L/usr/lib/gcc/x86_64-linux-gnu/11
-L/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib
-L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu
-L/usr/lib/../lib
-L/usr/lib/gcc/x86_64-linux-gnu/11/../../.. a-euclid.o
-lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc
--push-state --as-needed -lgcc_s
--pop-state /usr/lib/gcc/x86_64-linux-gnu/11/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-fverbose-asm' '-save-temps'
'-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
# 0 "euclid.c"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "euclid.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 33 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 392 "/usr/include/features.h" 3 4
# 1 "/usr/include/features-time64.h" 1 3 4
# 20 "/usr/include/features-time64.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 21 "/usr/include/features-time64.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/timesize.h" 1 3 4
# 19 "/usr/include/x86_64-linux-gnu/bits/timesize.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 20 "/usr/include/x86_64-linux-gnu/bits/timesize.h" 2 3 4
# 22 "/usr/include/features-time64.h" 2 3 4
# 393 "/usr/include/features.h" 2 3 4
# 486 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 1 3 4
# 559 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 560 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/long-double.h" 1 3 4
# 561 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 487 "/usr/include/features.h" 2 3 4
# 510 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 1 3 4
# 10 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs-64.h" 1 3 4
# 11 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 2 3 4
# 511 "/usr/include/features.h" 2 3 4
# 34 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 2 3 4
# 28 "/usr/include/stdio.h" 2 3 4
typedef struct _G_fpos_t
{
  __off_t __pos;
  __mbstate_t __state;
} __fpos_t;
# 40 "/usr/include/stdio.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/types/__fpos64_t.h" 1 3 4
# 10 "/usr/include/x86_64-linux-gnu/bits/types/__fpos64_t.h" 3 4
typedef struct _G_fpos64_t
{
  __off64_t __pos;
  __mbstate_t __state;
} __fpos64_t;
# 41 "/usr/include/stdio.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/types/__FILE.h" 1 3 4
struct _IO_FILE;
typedef struct _IO_FILE __FILE;
# 42 "/usr/include/stdio.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/types/FILE.h" 1 3 4
struct _IO_FILE;
struct _IO_FILE
{
  int _flags;
Division of Labour in gcc’s Working
  char *_IO_save_base;
  char *_IO_backup_base;
  char *_IO_save_end;
  struct _IO_marker *_markers;
  struct _IO_FILE *_chain;
  int _fileno;
  int _flags2;
  __off_t _old_offset;
  unsigned short _cur_column;
  signed char _vtable_offset;
  char _shortbuf[1];
  _IO_lock_t *_lock;
  __off64_t _offset;
  struct _IO_codecvt *_codecvt;
  struct _IO_wide_data *_wide_data;
  struct _IO_FILE *_freeres_list;
  void *_freeres_buf;
  size_t __pad5;
  int _mode;
;
extern int scanf (const char *__restrict __format, ...) __asm__ ("" "__isoc99_scanf")
;
extern int sscanf (const char *__restrict __s, const char *__restrict __format, ...)
    __asm__ ("" "__isoc99_sscanf") __attribute__ ((__nothrow__ , __leaf__))
;
# 459 "/usr/include/stdio.h" 3 4
extern int vfscanf (FILE *__restrict __s, const char *__restrict __format,
    __gnuc_va_list __arg)
    __attribute__ ((__format__ (__scanf__, 2, 0))) ;
extern __off_t ftello (FILE *__stream) ;
# 760 "/usr/include/stdio.h" 3 4
extern int fgetpos (FILE *__restrict __stream, fpos_t *__restrict __pos);
# 2 "euclid.c" 2
# 1 "/usr/include/stdlib.h" 1 3 4
# 26 "/usr/include/stdlib.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 27 "/usr/include/stdlib.h" 2 3 4
typedef struct
{
  long int quot;
  long int rem;
} ldiv_t;
__extension__
extern long long int strtoq (const char *__restrict __nptr,
    char **__restrict __endptr, int __base)
    __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
__extension__
extern unsigned long long int strtouq (const char *__restrict __nptr,
    char **__restrict __endptr, int __base)
    __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
__extension__
extern long long int strtoll (const char *__restrict __nptr,
    char **__restrict __endptr, int __base)
    __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
__extension__
extern unsigned long long int strtoull (const char *__restrict __nptr,
    char **__restrict __endptr, int __base)
    __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
# 386 "/usr/include/stdlib.h" 3 4
extern char *l64a (long int __n) __attribute__ ((__nothrow__ , __leaf__)) ;
typedef __mode_t mode_t;
typedef __id_t id_t;
# 114 "/usr/include/x86_64-linux-gnu/sys/types.h" 3 4
typedef __daddr_t daddr_t;
typedef __caddr_t caddr_t;
}
# 69 "/usr/include/x86_64-linux-gnu/bits/byteswap.h" 3 4
__extension__ static __inline __uint64_t
__bswap_64 (__uint64_t __bsx)
{
}
# 36 "/usr/include/endian.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/uintn-identity.h" 1 3 4
# 32 "/usr/include/x86_64-linux-gnu/bits/uintn-identity.h" 3 4
static __inline __uint16_t
__uint16_identity (__uint16_t __x)
{
  return __x;
}
typedef struct
{
  unsigned long int __val[(1024 / (8 * sizeof (unsigned long int)))];
} __sigset_t;
# 5 "/usr/include/x86_64-linux-gnu/bits/types/sigset_t.h" 2 3 4
struct timeval
{
  __time_t tv_sec;
  __suseconds_t tv_usec;
};
# 38 "/usr/include/x86_64-linux-gnu/sys/select.h" 2 3 4
  __time_t tv_sec;
  __syscall_slong_t tv_nsec;
# 31 "/usr/include/x86_64-linux-gnu/bits/types/struct_timespec.h" 3 4
};
# 40 "/usr/include/x86_64-linux-gnu/sys/select.h" 2 3 4
} fd_set;
typedef __fd_mask fd_mask;
# 91 "/usr/include/x86_64-linux-gnu/sys/select.h" 3 4
  int __kind;
  short __spins;
  short __elision;
  __pthread_list_t __list;
# 53 "/usr/include/x86_64-linux-gnu/bits/struct_mutex.h" 3 4
};
# 77 "/usr/include/x86_64-linux-gnu/bits/thread-shared-types.h" 2 3 4
# 89 "/usr/include/x86_64-linux-gnu/bits/thread-shared-types.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/struct_rwlock.h" 1 3 4
# 23 "/usr/include/x86_64-linux-gnu/bits/struct_rwlock.h" 3 4
struct __pthread_rwlock_arch_t
{
typedef struct
{
  int __data;
} __once_flag;
# 24 "/usr/include/x86_64-linux-gnu/bits/pthreadtypes.h" 2 3 4
typedef union
{
  char __size[4];
  int __align;
} pthread_mutexattr_t;
typedef union
{
  char __size[4];
  int __align;
} pthread_condattr_t;
typedef int pthread_once_t;
union pthread_attr_t
{
  char __size[56];
  long int __align;
};
typedef union pthread_attr_t pthread_attr_t;
typedef union
{
  struct __pthread_mutex_s __data;
  char __size[40];
  long int __align;
} pthread_mutex_t;
typedef union
{
  struct __pthread_cond_s __data;
  char __size[48];
  __extension__ long long int __align;
} pthread_cond_t;
typedef union
{
  struct __pthread_rwlock_arch_t __data;
  char __size[56];
  long int __align;
} pthread_rwlock_t;
typedef union
{
  char __size[8];
  long int __align;
} pthread_rwlockattr_t;
typedef volatile int pthread_spinlock_t;
typedef union
{
  char __size[32];
  long int __align;
} pthread_barrier_t;
typedef union
{
  char __size[4];
  int __align;
} pthread_barrierattr_t;
# 228 "/usr/include/x86_64-linux-gnu/sys/types.h" 2 3 4
# 396 "/usr/include/stdlib.h" 2 3 4
struct drand48_data
{
  unsigned short int __x[3];
  unsigned short int __old_x[3];
  unsigned short int __c;
  unsigned short int __init;
  __extension__ unsigned long long int __a;
};
    __attribute__ ((__malloc__ (__builtin_free, 1)));
# 1 "/usr/include/alloca.h" 1 3 4
# 24 "/usr/include/alloca.h" 3 4
# 1 "/usr/lib/gcc/x86_64-linux-gnu/11/include/stddef.h" 1 3 4
# 25 "/usr/include/alloca.h" 2 3 4
# 575 "/usr/include/stdlib.h" 2 3 4
# 3 "euclid.c" 2
# 3 "euclid.c"
int globalCount = 0;
int gcd(int a, int b) {
    globalCount++;
    if (a>0 && b>0) {
        if (a>b) return gcd(b, a-b);
        else if (b>a) return gcd(a, b-a);
        else return a;
    }
    else return a;
}
int main(int argc, char *argv[]) {
    int a = 10, b = 15;
    if (argc>1) {
        a = atoi(argv[1]);
        if (argc>2) b = atoi(argv[2]);
    }
    printf("gcd(a = %d, b = %d) = %d; ", a, b, gcd(a, b));
    printf("obtained in %d iterations.\n", globalCount);
    return 0;
}
The -fverbose-asm option puts extra commentary information in the generated assembly code to make it
more readable. This option is generally only of use to those who actually need to read the generated assembly
code (perhaps while debugging the compiler itself). The added comments include:
* the source code lines associated with the assembly instructions, in the form
FILENAME:LINENUMBER:CONTENT OF LINE,
* hints on which high-level expressions correspond to the various assembly instruction operands.
        .file   "euclid.c"
# GNU C17 (Ubuntu 11.4.0-1ubuntu1~22.04) version 11.4.0 (x86_64-linux-gnu)
#       compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
        .globl  globalCount
        .bss
        .align 4
        .type   globalCount, @object
        .size   globalCount, 4
globalCount:
        .zero   4
        .text
        .globl  gcd
        .type   gcd, @function
gcd:
.LFB6:
        .cfi_startproc
        endbr64
        pushq   %rbp                    #
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp              #,
        .cfi_def_cfa_register 6
        subq    $16, %rsp               #,
        movl    %edi, -4(%rbp)          # a, a
        movl    %esi, -8(%rbp)          # b, b
# euclid.c:5: globalCount++;
        movl    globalCount(%rip), %eax # globalCount, globalCount.0_1
        addl    $1, %eax                #, _2
        movl    %eax, globalCount(%rip) # _2, globalCount
# euclid.c:6: if (a>0 && b>0) {
        cmpl    $0, -4(%rbp)            #, a
        jle     .L2                     #,
# euclid.c:6: if (a>0 && b>0) {
        cmpl    $0, -8(%rbp)            #, b
        jle     .L2                     #,
# euclid.c:7: if (a>b) return gcd(b, a-b);
        movl    -4(%rbp), %eax          # a, tmp88
        cmpl    -8(%rbp), %eax          # b, tmp88
        jle     .L3                     #,
# euclid.c:7: if (a>b) return gcd(b, a-b);
        movl    -4(%rbp), %eax          # a, tmp89
        subl    -8(%rbp), %eax          # b, tmp89
        movl    %eax, %edx              # tmp89, _3
        movl    -8(%rbp), %eax          # b, tmp90
        movl    %edx, %esi              # _3,
        movl    %eax, %edi              # tmp90,
        call    gcd                     #
        jmp     .L4                     #
.L3:
# euclid.c:8: else if (b>a) return gcd(a, b-a);
        movl    -8(%rbp), %eax          # b, tmp91
        cmpl    -4(%rbp), %eax          # a, tmp91
        jle     .L5                     #,
# euclid.c:8: else if (b>a) return gcd(a, b-a);
        movl    -8(%rbp), %eax          # b, tmp92
        subl    -4(%rbp), %eax          # a, tmp92
        movl    %eax, %edx              # tmp92, _4
        movl    -4(%rbp), %eax          # a, tmp93
        movl    %edx, %esi              # _4,
        movl    %eax, %edi              # tmp93,
        call    gcd                     #
        jmp     .L4                     #
.L5:
# euclid.c:9: else return a;
        movl    -4(%rbp), %eax          # a, _5
        jmp     .L4                     #
.L2:
# euclid.c:11: else return a;
        movl    -4(%rbp), %eax          # a, _5
.L4:
# euclid.c:12: }
        leave
Bibliography
[T1] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2006. Compilers: Principles,
Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., USA.
[R1] Niklaus Wirth. 1996. Compiler construction. Addison Wesley Longman Publishing Co., Inc., USA.
[R2] Dick Grune, Kees van Reeuwijk, Henri E. Bal, Ceriel J.H. Jacobs, and Koen Langendoen. 2012. Modern
Compiler Design (2nd. ed.). Springer Publishing Company, Incorporated.
[M1] Thomas Niemann. A Guide to Lex & Yacc.
E-book, available at https://arcb.csc.ncsu.edu/~mueller/codeopt/codeopt00/y_man.pdf
[M2] Info pages or pdf manuals from the SourceForge or other GNU-FSF sources for gcc, lex (flex),
yacc (bison), binutils, findutils, Common Lisp or Python.