C Program Process and Assembly Language
Assembly Language: This is the same as machine language, except that the numeric command codes have been replaced by letter sequences (mnemonics), which are more readable and easier to memorize. Examples in AT&T syntax, Intel syntax and HLA:

AT&T:
push %ebp
sub $0x8,%esp
movb $0x41,0xffffffff(%ebp)

Intel:
push ebp
mov ebp, esp
sub esp, 0C0h

HLA (High Level Assembly):
program HelloWorld;
#include( "stdlib.hhf" )
begin HelloWorld;
stdout.put( "Hello, World of Assembly Language", nl );
end HelloWorld;
Instruction Category | Meaning | Example
Data Transfer | move from source to destination | mov, lea, les, push, pop, pushf, popf
Arithmetic | arithmetic on integers | add, adc, sub, sbb, mul, imul, div, idiv, cmp, neg, inc, dec, xadd, cmpxchg
Floating point | arithmetic on floating point | fadd, fsub, fmul, fdiv, fcom
Logical, Shift, Rotate and Bit | bitwise logic operations | and, or, xor, not, shl/sal, shr, sar, shld and shrd, ror, rol, rcr and rcl
Control transfer | conditional and unconditional jumps, procedure calls | jmp, jcc, call, ret, int, into, bound
String | move, compare, input and output | movs, lods, stos, scas, cmps, outs, rep, repz, repe, repnz, repne, ins
I/O | for input and output | in, out
Conversion | provide assembly data type conversion | movzx, movsx, cbw, cwd, cwde, cdq, bswap, xlat
Miscellaneous | manipulate individual flags, provide special processor services, or handle privileged mode operations | clc, stc, cmc, cld, std, cli, sti
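The table lists mnemonics only. As a hedged sketch of what a few of them compute, the C functions below mirror rol (rotate left), movzx and movsx (zero and sign extension) and bswap (byte swap) on 32-bit values. The function names are hypothetical, and whether a compiler actually emits those instructions for this C code depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Rotate left, the operation performed by rol. Masking the count
   keeps every shift in 0..31, since shifting a 32-bit value by 32
   is undefined in C. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Zero extension, as done by movzx: the new high bits are 0. */
uint32_t zero_extend8(uint8_t b)
{
    return (uint32_t)b;
}

/* Sign extension, as done by movsx (cbw/cwde do the same for al/ax):
   the new high bits copy bit 7 of the byte. Written with arithmetic
   so it does not rely on implementation-defined conversions. */
int32_t sign_extend8(uint8_t b)
{
    return (b & 0x80) ? (int32_t)b - 256 : (int32_t)b;
}

/* Reverse the byte order of a 32-bit value, as done by bswap. */
uint32_t bswap32(uint32_t x)
{
    return  (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         |  (x << 24);
}
```

For example, rotl32(0x80000001, 1) moves the top bit around to bit 0, and sign_extend8(0xFF) yields -1 where zero_extend8(0xFF) yields 255.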
The following is a portion of C source code and its assembly equivalent, using Linux on Intel.
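As a minimal illustration, assuming a 32-bit x86 build with gcc at -O0, a function that stores the character 'A' in a local char variable is the kind of C code behind the AT&T lines shown earlier: 'A' is 0x41 in ASCII, and 0xffffffff(%ebp) is the unsigned display of the displacement -1(%ebp), the byte just below the frame pointer. The function name store_letter is hypothetical, and the exact assembly emitted varies with compiler version and flags.

```c
/* Hypothetical example function. Compiled with gcc -m32 -O0 -S, the
   local variable is typically given a slot in the stack frame and
   initialized with a single byte store of 0x41; the exact output
   depends on the compiler. */
char store_letter(void)
{
    char c = 'A';   /* 'A' is 0x41 in ASCII */
    return c;
}
```

Running gcc -m32 -O0 -S on a file containing this function and reading the generated .s file is one way to compare the two; -m32 requires 32-bit multilib support to be installed.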
Bear in mind that if you use Integrated Development Environment (IDE) type compilers, these processes are quite transparent. Now we are going to examine in more detail the processes that happen before and after the linking stage. For any given input file, the file name suffix (file extension) determines what kind of compilation is done; the suffixes recognized by gcc are listed in Table 3.
File extension | Description
file_name.c | C source code which must be preprocessed.
file_name.i | C source code which should not be preprocessed.
file_name.ii | C++ source code which should not be preprocessed.
file_name.h | C header file (not to be compiled or linked).
file_name.cc, file_name.cp, file_name.cxx, file_name.cpp, file_name.c++, file_name.C | C++ source code which must be preprocessed. For file_name.cxx, both x characters must literally be the letter x, and in file_name.C the C is a capital c.
file_name.s | Assembler code.
file_name.S | Assembler code which must be preprocessed.
file_name.o | Object file. By default, the object file name for a source file is made by replacing the extension .c, .i, .s, etc. with .o.
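The two rules in Table 3 (the suffix selects the kind of processing, and the default object name swaps the suffix for .o) can be sketched in C. Both helpers, describe_suffix and object_file_name, are hypothetical and only mimic the table; they are not how gcc implements its driver, and they assume the name has no directory part containing a dot.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a file name suffix to the processing listed
   in Table 3. strcmp is case-sensitive, so ".C" (C++) and ".c" (C) are
   told apart, as are ".S" and ".s". */
const char *describe_suffix(const char *filename)
{
    static const struct { const char *ext; const char *kind; } table[] = {
        { ".c",   "C source, preprocessed" },
        { ".i",   "C source, not preprocessed" },
        { ".ii",  "C++ source, not preprocessed" },
        { ".h",   "C header" },
        { ".cc",  "C++ source, preprocessed" },
        { ".cp",  "C++ source, preprocessed" },
        { ".cxx", "C++ source, preprocessed" },
        { ".cpp", "C++ source, preprocessed" },
        { ".c++", "C++ source, preprocessed" },
        { ".C",   "C++ source, preprocessed" },
        { ".s",   "assembler code" },
        { ".S",   "assembler code, preprocessed" },
        { ".o",   "object file" },
    };
    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return "unknown";
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(dot, table[i].ext) == 0)
            return table[i].kind;
    return "unknown";
}

/* Hypothetical helper: build the default object file name by replacing
   the extension with ".o" (e.g. "prog.c" -> "prog.o"). Returns 0 on
   success, or -1 if the result does not fit in out. */
int object_file_name(const char *src, char *out, size_t outsize)
{
    const char *dot = strrchr(src, '.');
    size_t stem = dot ? (size_t)(dot - src) : strlen(src);
    int n = snprintf(out, outsize, "%.*s.o", (int)stem, src);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

For instance, describe_suffix("start.S") reports preprocessed assembler code, while object_file_name("prog.c", buf, sizeof buf) writes "prog.o" into buf.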
Object File Format | Description
a.out | The a.out format is the original file format for Unix. It consists of three sections: text, data, and bss, which are for program code, initialized data, and uninitialized data, respectively. This format is so simple that it doesn't have any reserved place for debugging information. The only debugging format for a.out is stabs, which is encoded as a set of normal symbols with distinctive attributes.
COFF | The COFF (Common Object File Format) format was introduced with System V Release 3 (SVR3) Unix. COFF files may have multiple sections, each prefixed by a header. The number of sections is limited. The COFF specification includes support for debugging, but the debugging information was limited.
ECOFF | A variant of COFF. ECOFF is an Extended COFF originally introduced for Mips and Alpha workstations.
XCOFF | The IBM RS/6000 running AIX uses an object file format called XCOFF (eXtended COFF). The COFF sections, symbols, and line numbers are used, but debugging symbols are dbx-style stabs whose strings are located in the .debug section (rather than the string table). The default name for an XCOFF executable file is a.out.
PE | Windows 9x and NT use the PE (Portable Executable) format for their executables. PE is basically COFF with additional headers.
ELF | The ELF (Executable and Linking Format) format came with System V Release 4 (SVR4) Unix. ELF is similar to COFF in being organized into a number of sections, but it removes many of COFF's limitations. ELF is used on most modern Unix systems, including GNU/Linux, Solaris and Irix, and also on many embedded systems.
SOM/ESOM | SOM (System Object Module) and ESOM (Extended SOM) are HP's object file and debug formats (not to be confused with IBM's SOM, which is a cross-language Application Binary Interface, ABI).
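Formats can often be told apart by their first bytes. As a hedged sketch, the hypothetical guess_format function below checks only two well-known signatures: ELF files begin with the bytes 0x7F 'E' 'L' 'F', and PE executables begin with an MS-DOS "MZ" header whose 32-bit little-endian field at offset 0x3C gives the offset of the "PE\0\0" signature. It does not attempt to recognize a.out, COFF, XCOFF or SOM.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess the file format from a buffer holding the
   start of the file. Only ELF and PE/MZ signatures are checked. */
const char *guess_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return "ELF";

    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        /* e_lfanew: little-endian offset of the PE header, stored at
           offset 0x3C of the DOS header. */
        unsigned long off = (unsigned long)buf[0x3C]
                          | (unsigned long)buf[0x3D] << 8
                          | (unsigned long)buf[0x3E] << 16
                          | (unsigned long)buf[0x3F] << 24;
        if (off <= len - 4 && memcmp(buf + off, "PE\0\0", 4) == 0)
            return "PE";
        return "MZ (DOS executable)";
    }
    return "unknown";
}
```

In practice the buffer would be filled by reading the first few kilobytes of a file with fread(); a PE header that lies beyond the buffer is reported as a plain MZ executable.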
When we examine the content of these object files, we find areas called sections. Depending on the settings of the compilation and linking stages, sections can hold:
1. Executable code.
2. Data.
3. Dynamic linking information.
4. Debugging data.
5. Symbol tables.
6. Relocation information.
7. Comments.
8. String tables, and
9. Notes.
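A small C file makes the code and data sections concrete. This is a sketch of where gcc on an ELF system typically places each definition; the names are hypothetical, and placement can vary with compiler and flags (for example, gcc versions before 10 defaulted to -fcommon and emitted uninitialized globals as common symbols rather than directly in .bss). Running "objdump -h" or "readelf -S" on the compiled object lists the sections actually produced.

```c
int initialized_counter = 42;   /* typically .data: initialized data       */
int uninitialized_counter;      /* typically .bss: zero-filled at load     */
const char banner[] = "hello";  /* typically .rodata: read-only data       */

/* Function bodies are executable code, typically placed in .text. */
int sum_counters(void)
{
    return initialized_counter + uninitialized_counter;
}
```

Because the .bss area is zero-filled when the program is loaded, uninitialized_counter starts at 0 and sum_counters() returns 42.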
SHARED OBJECTS
In a typical system, a number of programs will be running. Each program relies on a number of functions, some of which will be standard C library functions, like printf(), malloc(), strcpy(), etc. If every program were linked with its own copy of the standard C library, each program would carry a separate copy of that library within it. Unfortunately, this results in wasted resources and degrades efficiency and performance. Since the C library is common, it is better to have each program reference one common instance of that library, instead of having each program contain a copy of the library. This is implemented during the linking process, where some of the objects are linked at link time whereas others are linked at run time (deferred/dynamic linking).
STATICALLY LINKED
The term ‘statically linked’ means that the program and the particular library that it’s linked against are combined together by the linker at link time. This means that the binding between the program and the particular library is fixed and known at link time, before the program runs. It also means that we can't change this binding unless we re-link the program with a new version of the library.
Programs that are linked statically are linked against archives of objects (libraries) that typically have the extension .a. An example of such a collection of objects is the standard C library, libc.a. You might consider linking a program statically, for example, in cases where you aren't sure whether the correct version of a library will be available at runtime, or if you are testing a new version of a library that you don't yet want to install as shared. For gcc, the -static option is used during the compilation/linking of the program.
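One way to observe the difference from inside a program is to list the objects loaded into the process with dl_iterate_phdr(), available in glibc and musl. The hypothetical helper below reports whether a shared C library is present; looking for "libc" in the object name is a heuristic suited to glibc (musl, for instance, names its shared library differently), so treat the result as indicative only. Built normally with gcc on a glibc system it would be expected to return 1, and built with gcc -static, 0.

```c
#define _GNU_SOURCE
#include <link.h>
#include <string.h>

/* Callback for dl_iterate_phdr(): set *data to 1 when a loaded object's
   path contains "libc" (e.g. "/lib/x86_64-linux-gnu/libc.so.6"). */
static int find_libc(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, "libc") != NULL) {
        *(int *)data = 1;
        return 1;   /* a nonzero return stops the iteration */
    }
    return 0;
}

/* Hypothetical helper: 1 if a shared C library is mapped into this
   process (dynamic linking), 0 otherwise (e.g. after gcc -static). */
int uses_shared_libc(void)
{
    int found = 0;
    dl_iterate_phdr(find_libc, &found);
    return found;
}
```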
ASCII Character set
ASCII (1977/1986)
Each cell gives the character (or the name of a control character), its code in hexadecimal, and its decimal value.

     _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
     0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F
     0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
1_   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
     0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B 001C 001D 001E 001F
     16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31
2_   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
     0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 002A 002B 002C 002D 002E 002F
     32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47
3_   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
     0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 003A 003B 003C 003D 003E 003F
     48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63
4_   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
     0040 0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F
     64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
5_   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
     0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A 005B 005C 005D 005E 005F
     80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95
6_   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
     0060 0061 0062 0063 0064 0065 0066 0067 0068 0069 006A 006B 006C 006D 006E 006F
     96   97   98   99   100  101  102  103  104  105  106  107  108  109  110  111
7_   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
     0070 0071 0072 0073 0074 0075 0076 0077 0078 0079 007A 007B 007C 007D 007E 007F
     112  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127
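The table shows that each uppercase letter (0x41 to 0x5A) and its lowercase counterpart (0x61 to 0x7A) differ by exactly 0x20 (32), that is, only in bit 5. A minimal C sketch of case toggling built on that observation follows; the function name ascii_toggle_case is hypothetical, and the code assumes ASCII-encoded characters.

```c
/* Hypothetical helper: flip bit 5 (0x20) for ASCII letters, which
   switches between upper and lower case; all other characters are
   returned unchanged. */
char ascii_toggle_case(char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return (char)(c ^ 0x20);
    return c;
}
```

For example, ascii_toggle_case('a') gives 'A' (0x61 becomes 0x41), while a digit such as '5' passes through unchanged.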