Chapter Two - Assembler
Chapter Two
Assemblers
After you have studied this unit, you will be able to:
Definition
These general software design procedures will be employed in our assembler design.
Design of an Assembler
In this section we will discuss the fundamental Assembler design procedures using
relevant examples and justifications. The topics that will be discussed in this section
are:
• Statement of problem
• Data structure
• Format of databases
• Algorithms
Simple example Assembly Language programs will be used to show how assemblers
work.
Consider the following assembly language program, which we want to translate into
machine code (object code). Its direct translation into machine code is shown
on the right side.
1. Generate instructions
✓ Evaluate the mnemonic in the operation field and generate its machine
code.
✓ Evaluate the subfields: find the value of each symbol and assign addresses.
2. Process pseudo-ops such as USING, DC, DS, etc.
✓ These tasks can be grouped into two passes, or sequential scans, over the
input.
• Associated with each task are one or more assembler modules.
✓ We can also have one-pass assemblers, which perform all the tasks in a
single scan, as well as multi-pass assemblers, which perform two or more scans.
Data Structures
The second step in our assembler design is to establish the databases that our
assembler will work with.
✓ The format-of-databases step specifies the format and content of each
database, a task that must be undertaken even before the specific algorithm
is described. In reality, the algorithm, the databases, and their formats are
all interlocked: the designer has some features of the format and algorithm
in mind when dealing with the databases, and iterates until all the parts
work together.
✓ Pass 1 requires a MOT (machine-op table) containing the name and length
of each instruction, whereas pass 2 requires the name, length, binary code,
and format. We can use two tables with different formats and contents, or
one table for both passes. The same is true of the POT (pseudo-op table).
We could also combine the POT and MOT into one table by generalizing the
table formats. In our example we will use the same table for the MOT and
separate tables for the POT.
✓ Once we decide what information belongs in each database, we can decide
the format of each entry, e.g. in what format symbols are stored (left
justified, padded with blanks, coded in EBCDIC or ASCII) and what the
encoding conventions are. EBCDIC (Extended Binary Coded Decimal
Interchange Code) is the standard 360 coding scheme. The character A in
EBCDIC is 1100 0001, or C1 in hex.
✓ Pass 2 requires a MOT and POT containing name, length, binary code, and
format; pass 1 requires only name and length. The POT and MOT are fixed
tables: their contents are not filled in or altered during the assembly
process. In the MOT, the mnemonic op code is the key and its value is the
equivalent binary op code, which is stored for use in generating machine
code. The instruction length is stored for use in updating the location
counter, and the instruction format is stored for use in forming the
equivalent machine code.
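The symbol storage convention mentioned above (left justified, padded with blanks, coded in EBCDIC) can be sketched in Python. This is only an illustration under stated assumptions: the 8-character field width and the name `encode_symbol` are hypothetical, and Python's `cp037` codec is used as a stand-in for the 360 EBCDIC code page.

```python
SYMBOL_FIELD_WIDTH = 8  # assumed width of the symbol field, in characters


def encode_symbol(name: str, width: int = SYMBOL_FIELD_WIDTH) -> bytes:
    """Left-justify a symbol, pad it with blanks, and encode it in EBCDIC."""
    if len(name) > width:
        raise ValueError(f"symbol {name!r} is longer than {width} characters")
    # ljust pads on the right with blanks; cp037 is an EBCDIC code page.
    return name.ljust(width).encode("cp037")


# In EBCDIC the letter A is hex C1 and a blank is hex 40.
print(encode_symbol("A").hex().upper())
```

A fixed-width, blank-padded field keeps every table entry the same size, which makes table searches and comparisons simple.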
Fig 2.6: Possible content and format of MOT for passes 1 & 2
Fig 2.7: Possible content and format of POT for pass 1 (similar for pass 2)
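As a rough sketch of the table contents described above, the MOT can be modelled as a fixed lookup table keyed by mnemonic. The layout, the class name `MotEntry`, the helper functions, and the POT routine names are all assumptions made for illustration; only a handful of System/360 instructions are listed.

```python
from typing import NamedTuple, Tuple


class MotEntry(NamedTuple):
    opcode: int  # binary op code, needed in pass 2
    length: int  # instruction length in bytes, needed to update the LC
    fmt: str     # instruction format, needed in pass 2


# A few System/360 instructions; a full MOT would cover the whole set.
MOT = {
    "LA": MotEntry(0x41, 4, "RX"),
    "L":  MotEntry(0x58, 4, "RX"),
    "A":  MotEntry(0x5A, 4, "RX"),
    "ST": MotEntry(0x50, 4, "RX"),
    "SR": MotEntry(0x1B, 2, "RR"),
    "AR": MotEntry(0x1A, 2, "RR"),
}

# POT sketch: pseudo-op name -> handling routine (routine names are hypothetical).
POT = {"USING": "handle_using", "DC": "handle_dc", "DS": "handle_ds"}


def pass1_length(mnemonic: str) -> int:
    """Pass 1 only needs the length, to advance the location counter."""
    return MOT[mnemonic].length


def pass2_code(mnemonic: str) -> Tuple[int, str]:
    """Pass 2 also needs the binary op code and the instruction format."""
    entry = MOT[mnemonic]
    return entry.opcode, entry.fmt
```

Because both tables are fixed, they can be built once and shared by both passes; pass 1 simply ignores the fields it does not need.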
Format of ST and LT
✓ The Symbol and Literal Tables include not only name and assembly-time
value fields but also a length field and a relative-location indicator.
✓ The length field indicates the length in bytes of the instruction or data to
which the symbol is attached.
✓ It is used by the assembler to calculate the length of an SS-type instruction.
✓ E.g., COMMA DC C’,’ has length 1 and TEMP DS F has length 4.
✓ The relative-location indicator tells the assembler whether the value is
relative or absolute: R for relative and A for absolute.
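The table entry described above can be sketched as a small record with a helper that derives the length field from a DC or DS operand. This is a minimal sketch, not the format used by any real assembler: the field names, the function `operand_length`, the restriction to a few operand types, and the sample values 64 and 68 are assumptions.

```python
from dataclasses import dataclass

# Lengths in bytes of a few DC/DS operand types: fullword, halfword, doubleword.
TYPE_LENGTHS = {"F": 4, "H": 2, "D": 8}


@dataclass
class SymbolEntry:
    name: str
    value: int       # assembly-time value
    length: int      # length in bytes of the attached instruction or data
    relocation: str  # "R" for relative, "A" for absolute


def operand_length(operand: str) -> int:
    """Length field for a simple DC/DS operand such as F or C','."""
    type_code = operand[0]
    if type_code == "C":
        # Character constant: one byte per character between the quotes.
        return len(operand[2:-1])
    return TYPE_LENGTHS[type_code]


# The two examples from the text; the values 64 and 68 are made up.
comma = SymbolEntry("COMMA", 64, operand_length("C','"), "R")
temp = SymbolEntry("TEMP", 68, operand_length("F"), "R")
print(comma.length, temp.length)
```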
We will illustrate the use of the tables (ST, LT, BT, etc.) using the following program,
which also motivates the algorithm presented in the next section.
Discussion
✓ As indicated in Fig 2.3 (flow chart), the assembler scans the program while
maintaining a location counter (LC).
• For each symbol in the label field, we make an entry in the symbol
table. E.g., for PGM2, the value is its relative location (length 1).
• We update the location counter, noting that the LA instruction is 4
bytes long and SR is 2 bytes long.
• The next five symbols are defined by EQU; these symbols and the
associated values given in the argument field are entered into the table.
• The LC is further updated, noting that L is 4 bytes and SR is 2 bytes long.
• None of the pseudo-ops encountered affects the value of the LC, since
they do not generate any object code. Hence the LC has the value 12 when
LOOP is encountered.
✓ In the same pass, all literals are entered into the LT. The first literal is in
statement 11, and its value is the address of the location that will contain
the literal.
• The LTORG pseudo-op forces the LT to be placed at that point, with the
LC advanced to the next doubleword boundary (48). The value of
‘=A(DATA1)’ is its address, 48. Similarly, the value of the literal
=F’5’ is the next location in the LT, 52, and so on.
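The pass 1 bookkeeping discussed above can be sketched in Python. This is an illustration under stated assumptions, not the algorithm of Fig 2.3: the statement list is a hypothetical stand-in chosen only to match the instruction lengths and the LOOP value quoted in the discussion, the operand names are invented, every literal is assumed to occupy 4 bytes, and the LC value of 42 at the LTORG is an assumption (any value from 41 to 48 rounds up to 48).

```python
# Instruction lengths in bytes, as quoted in the discussion.
LENGTHS = {"LA": 4, "L": 4, "SR": 2}

# Hypothetical program fragment: (label, operation, operand).
PROGRAM = [
    ("PGM2", "START", "0"),
    ("", "USING", "*,15"),
    ("", "LA", "15,SETUP"),
    ("", "SR", "TOTAL,TOTAL"),
    ("AC", "EQU", "2"),
    ("INDEX", "EQU", "3"),
    ("TOTAL", "EQU", "4"),
    ("DATABASE", "EQU", "13"),
    ("SETUP", "EQU", "*"),
    ("", "L", "DATABASE,=A(DATA1)"),
    ("", "SR", "INDEX,INDEX"),
    ("LOOP", "L", "AC,DATA1(INDEX)"),
]


def scan_symbols(statements):
    """Build the symbol table while tracking the location counter."""
    lc = 0
    symtab = {}
    for label, op, operand in statements:
        if op == "EQU":
            # EQU generates no code; '*' stands for the current LC.
            symtab[label] = (lc, "R") if operand == "*" else (int(operand), "A")
        elif label:
            symtab[label] = (lc, "R")
        # Pseudo-ops are absent from LENGTHS, so they leave the LC unchanged.
        lc += LENGTHS.get(op, 0)
    return symtab, lc


def place_literals(lc, literals, size=4):
    """LTORG sketch: round the LC up to a doubleword, then assign addresses."""
    lc = (lc + 7) // 8 * 8
    table = {}
    for lit in literals:
        if lit not in table:  # each distinct literal is stored once
            table[lit] = lc
            lc += size
    return table, lc


symtab, _ = scan_symbols(PROGRAM)
print(symtab["LOOP"])
print(place_literals(42, ["=A(DATA1)", "=F'5'"])[0])
```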
To generate a proper address in an instruction, we need a base register. The base table,
BT, shows the registers in use.
Processing the USING pseudo-ops in the program produces these BT entries. To calculate
the offset, we need the contents of the base register. The assembler does not know the
execution-time value of the base register; it knows only its value relative to the
beginning of the program. Hence the assembler enters this relative value as the
‘contents’ and uses it to calculate the offset.
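The offset calculation described above can be sketched as follows. The function name, the rule of preferring the smallest offset, and the sample table values are assumptions for illustration; the 0 to 4095 range reflects the 12-bit displacement field of System/360 instructions.

```python
MAX_DISPLACEMENT = 4095  # largest value that fits in a 12-bit displacement field


def base_and_offset(address, base_table):
    """Pick the base register that yields the smallest valid offset.

    base_table maps a register number to the relative 'contents' that the
    assembler recorded when it processed a USING pseudo-op.
    """
    best = None
    for register, contents in base_table.items():
        offset = address - contents
        if 0 <= offset <= MAX_DISPLACEMENT:
            if best is None or offset < best[1]:
                best = (register, offset)
    if best is None:
        raise ValueError(f"no base register covers relative address {address}")
    return best


# Hypothetical BT: register 15 holds relative address 6, register 13 holds 8064.
bt = {15: 6, 13: 8064}
print(base_and_offset(48, bt))
print(base_and_offset(8064, bt))
```

Because both the symbol value and the recorded contents are relative to the start of the program, their difference stays correct wherever the program is loaded.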
Algorithm
Two flow charts, one for each pass, show a simplified algorithm for passes 1 and 2 and
illustrate most of the logical processes involved.
✓ We now review our design to improve it, looking for functions that can be
isolated. Modules (functions) can be multi-use or unique.
✓ Let’s look at our algorithms for passes 1 and 2, see whether we can find a
logical separation, and put them in the following format.
Summary