Chapter Two - Assembler

The document describes the design of an assembler program. It discusses the general design procedures including specifying requirements, data structures, algorithms, and modularity. It then provides more details on the specific design of an assembler, including stating the problem as translating an assembly language program to machine code. It describes the data structures needed for the two passes of the assembler, including symbol tables, opcode tables, and location counters. Algorithms for the two passes are discussed to evaluate symbols, generate instructions, and process pseudo-operations.

Uploaded by

amanuel

SYSTEM PROGRAMMING

Chapter Two
Assemblers
After you have studied this unit, you will be able to:

• Know the tasks of an assembler
• Understand the general design procedure for software
• Know the detailed design of an assembler

Definition

An assembler is system software that converts an assembly language program into
its equivalent object code. The input to the assembler is source code written in
assembly language (using mnemonics), and the output is object code.

Basic assembler functions:
• Translating mnemonic language to its equivalent object code.
• Assigning machine addresses to symbolic labels.

Fig2.1: Functions of an assembler

General design procedures

In our design of an assembler, we are interested in producing machine language from a
given assembly language program. Let’s start with the general problem of designing software.
We usually follow these steps to design software.

1. Specify the problem:
• Specify all the requirements
2. Specify the data structures (use a set of tables):
• List the tables with their fields (e.g. symbol table, opcode table, size of instruction)

COMPILED BY: EPHREM W. 1



3. Define the format of the data structures:
• Formats of the databases to be used
4. Specify the algorithm:
• Scan the program for labels -> first-pass algorithm
• Use the labels for translation -> second-pass algorithm
5. Look for modularity:
• Divide the program into smaller modules
6. Repeat steps 1 to 5 on the modules

These general software design procedures will be employed in our assembler design.

Design of an Assembler

In this section we will discuss the fundamental Assembler design procedures using
relevant examples and justifications. The topics that will be discussed in this section
are:

• Statement of problem

• Data structure

• Format of databases

• Algorithms

• Look for modularity

Simple example assembly language programs will be used to show how assemblers
work.

Statement of the Problem

Let’s take the following assembly language program, which we want to translate into
machine code (object code). Its immediate translation into machine code is shown
on the right side.


Fig 2.2: An assembly program and its equivalent machine code

Discussion of above fig

✓ There is no BALR; the program was presumably called by another program that left
the address of the first instruction in register 15.
✓ L 1,FIVE: there is no index register, the base register is 15, and we do not yet
know the offset, so we generate L 1,_(0,15).
• We maintain a location counter indicating the address of the instruction being
assembled, and it is incremented by 4. Why? (Each of these instructions is 4 bytes long.)
✓ A 1,FOUR: the same thing happens with this instruction and with the Store instruction.
✓ DC is a pseudo-op directing us to define some data, stored at relative
addresses 12 and 16…
✓ As the assembler, we can now fill in the offsets with the values listed in the third
column (using the location counter).
• Because symbols can appear before they are defined, it is convenient to make two
passes (pass 1: define symbols; pass 2: generate instructions and addresses).
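The forward-reference problem discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statement list is a hypothetical reconstruction of the figure's program (only the labels FIVE and FOUR, the Store instruction, and the data addresses 12 and 16 come from the discussion; the label TEMP and the order of the data statements are assumed), and every statement is taken to occupy 4 bytes.

```python
# Hypothetical reconstruction of the example program: (label, operation, operand).
# TEMP and the order FOUR/FIVE are assumptions made for illustration.
SOURCE = [
    (None,   "L",  "1,FIVE"),
    (None,   "A",  "1,FOUR"),
    (None,   "ST", "1,TEMP"),
    ("FOUR", "DC", "F'4'"),
    ("FIVE", "DC", "F'5'"),
    ("TEMP", "DS", "1F"),
]

LENGTH = 4            # every statement in this tiny example occupies 4 bytes
BASE_REGISTER = 15    # register 15 holds the address of the first instruction
BASE_CONTENTS = 0     # so its assembly-time (relative) value is 0


def pass1(source):
    """Assign a relative address to every label using a location counter."""
    symbols, lc = {}, 0
    for label, _op, _operand in source:
        if label is not None:
            symbols[label] = lc
        lc += LENGTH
    return symbols


def pass2(source, symbols):
    """Fill in the offsets that were unknown when each statement was first seen."""
    lines = []
    for _label, op, operand in source:
        if op in ("DC", "DS"):
            continue  # data definitions have no symbolic address to resolve
        register, symbol = operand.split(",")
        offset = symbols[symbol] - BASE_CONTENTS
        lines.append(f"{op} {register},{offset}(0,{BASE_REGISTER})")
    return lines


symbols = pass1(SOURCE)
resolved = pass2(SOURCE, symbols)
print(symbols)
print(resolved)
```

Pass 1 never needs the value of FIVE while scanning `L 1,FIVE`; it only counts bytes. Pass 2 can then look every symbol up, which is why the blank offset can be filled in.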


Tasks Performed by an Assembler

1. Generate instructions
✓ Evaluate the mnemonic in the operation field and generate its machine
code.
✓ Evaluate the sub-fields: find the value of each symbol and assign addresses.
2. Process pseudo-ops such as USING, DC, DS, etc.
✓ These tasks can be grouped into two passes, or sequential scans, over the
input.
• Associated with each task are one or more assembler modules.
✓ We can also have one-pass assemblers, which perform all the tasks in a single
scan, as well as multi-pass assemblers (two or more scans).

Two Pass Assemblers

✓ Pass 1 – Define symbols and literals
• Determine the length of machine instructions (MOTGET1)
• Keep track of the location counter (LC)
• Remember the values of symbols until pass 2 (STSTO)
• Process some pseudo-ops, e.g. EQU, DS
• Remember literals (LITSTO)
✓ Pass 2 – Generate object program
• Look up the values of symbols (STGET)
• Generate instructions (MOTGET2)
• Generate data for DS, DC and literals
• Process pseudo-ops (POTGET2)

Data Structures

The second step in our assembler design is to establish the data bases that our assembler will
work with.


✓ Pass 1 Data Bases
• The input source program
• A location counter (LC), which keeps track of each instruction’s location
• A table, the Machine Operation Table (MOT)
➢ Indicates the symbolic mnemonic for each instruction and
its length (2, 4 or 6 bytes)
• A table, the Pseudo-Operation Table (POT)
➢ Indicates the symbolic mnemonic and the action to be taken for each
pseudo-op in Pass 1
• A table, the Symbol Table (ST), which stores each label and its value
• A table, the Literal Table (LT), which stores each literal and its assigned location
• A copy of the input to be used in Pass 2; this can be stored on secondary
storage
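As a rough illustration, the pass 1 data bases listed above could be modelled as follows. This is a sketch, not the notes' own layout: the class name `Pass1State`, the action names in `POT_PASS1`, and the sample statement are all hypothetical. The instruction lengths are the standard ones (4 bytes for the RX instructions L, A, ST and LA; 2 bytes for the RR instruction SR).

```python
from dataclasses import dataclass, field

# Fixed tables: their contents are known before assembly starts.
MOT_PASS1 = {"L": 4, "A": 4, "ST": 4, "LA": 4, "SR": 2}   # mnemonic -> length in bytes
POT_PASS1 = {                                             # pseudo-op -> pass 1 action
    "USING": "ignore",             # hypothetical action names
    "EQU": "define_symbol",
    "DS": "reserve_storage",
    "DC": "reserve_storage",
    "LTORG": "place_literals",
    "END": "finish",
}


@dataclass
class Pass1State:
    source: list                                          # input source program
    lc: int = 0                                           # location counter
    symbol_table: dict = field(default_factory=dict)      # label -> value
    literal_table: dict = field(default_factory=dict)     # literal -> assigned location
    copy_for_pass2: list = field(default_factory=list)    # could live on secondary storage


# Processing one hypothetical statement: "LOOP L 2,DATA1"
state = Pass1State(source=["LOOP L 2,DATA1"])
state.symbol_table["LOOP"] = state.lc        # the label gets the current LC value
state.lc += MOT_PASS1["L"]                   # advance the LC by the instruction length
state.copy_for_pass2.append(state.source[0])
print(state.symbol_table, state.lc)
```

The two fixed tables are plain module-level dictionaries because nothing writes to them during assembly, whereas the symbol and literal tables start empty and are filled in as the scan proceeds.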


Fig 2.3: High level flow chart for pass 1


✓ Pass 2 Data Bases
• A copy of the source program input to Pass 1
• A location counter (LC)
• A table, the Machine Operation Table (MOT), that indicates for each
instruction:
➢ Symbolic mnemonic, length, binary machine op-code and
format (e.g. RS, RX, SI)
• A table, the Pseudo-Operation Table (POT), that indicates for each
pseudo-op the symbolic mnemonic and the action to be taken in Pass 2
• A table, the Symbol Table (ST), prepared by Pass 1 and containing each label
and its corresponding value
• A table, the Base Table (BT), that indicates which registers are currently
specified as base registers by USING pseudo-ops, and their contents
• A work space, INST, used to hold each instruction as its various parts
(e.g. binary op-code, register fields, length fields, displacement fields)
are being assembled together
• A work space, PRINT LINE, used to produce a printed listing
• A work space, PUNCH CARD, used prior to actual output for
converting the assembled instruction into a format suitable for the loader
• An output program in a format suitable for the loader


Fig 2.4: High level flow chart for pass 2


Format of Data Bases

✓ The format of data bases section specifies the format and content of each
data base, a task that needs to be undertaken even before describing the
specific algorithm. In reality the algorithm, data bases, and format are all
interlocked. The designer has in mind some features of the format and
algorithm when dealing with the data bases, and iterates until all parts work.
✓ Pass 1 requires a MOT with name and length, whereas pass 2 requires
name, length, binary code and format. We can use two tables with different
formats and contents, or one table for both passes. This is true for the POT as
well. We can also combine the POT and MOT into one table by generalizing
the table formats. In our example we will use the same table for the MOT in
both passes and separate tables for the POT.
✓ Once we decide what information belongs in each data base, we can decide
the format of each entry. E.g. in what format are symbols stored (left
justified, padded with blanks, coded in EBCDIC or ASCII), and what are
the encoding conventions? EBCDIC (Extended Binary Coded Decimal
Interchange Code) is the standard 360 coding scheme. Character A in
EBCDIC is 1100 0001, or C1 in hex.
✓ The POT and MOT are fixed tables: their contents are not filled in or altered
during the assembly process. The mnemonic op-code is the key, and its value
is the equivalent binary op-code, which is stored for use in generating machine
code. The instruction length is stored for use in updating the location counter.
The instruction format is stored for use in forming the equivalent machine code.
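To make the entry formats concrete, here is a small sketch of a single MOT shared by both passes, plus a symbol stored left-justified, blank-padded and coded in EBCDIC. The names `MotEntry` and `encode_symbol` and the 8-character field width are assumptions for illustration. The op-code values are the standard System/360 ones, and `cp037` is Python's built-in codec for one common EBCDIC code page.

```python
from typing import NamedTuple


class MotEntry(NamedTuple):
    opcode: int   # binary op-code, used in pass 2 to generate machine code
    length: int   # length in bytes, used to update the location counter
    fmt: str      # instruction format, used to form the machine code


# One fixed table serving both passes, keyed by mnemonic.
MOT = {
    "LA": MotEntry(0x41, 4, "RX"),
    "L":  MotEntry(0x58, 4, "RX"),
    "A":  MotEntry(0x5A, 4, "RX"),
    "ST": MotEntry(0x50, 4, "RX"),
    "SR": MotEntry(0x1B, 2, "RR"),
}


def encode_symbol(name: str, width: int = 8) -> bytes:
    """Left-justify a symbol, pad it with blanks, and code it in EBCDIC."""
    return name.ljust(width).encode("cp037")


encoded = encode_symbol("A")
print(encoded.hex())          # first byte is c1, the EBCDIC code for 'A'
print(MOT["LA"].length)       # pass 1 only needs this field
```

Pass 1 reads only `length`, while pass 2 reads all three fields; using one table avoids keeping two copies in step at the cost of carrying unused fields through pass 1.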


Fig 2.5: Use of data bases by assembler passes


Fig 2.6: Possible content and format of MOT for passes 1 & 2


Fig 2.7: Possible content and format of POT for pass 1 (similar for pass 2)

Format: ST and LT

✓ The Symbol and Literal Tables include not only name and assembly-time value
fields but also a length field and a relative-location indicator.
✓ The length field indicates the length in bytes of the instruction or data to
which the symbol is attached.
✓ It is used by the assembler to calculate the length of an SS-type instruction.
✓ E.g. COMMA DC C’,’ has length 1 and TEMP DS F has length 4.
✓ The relative-location indicator tells the assembler whether the value is
relative or absolute: R for relative and A for absolute.
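A symbol table entry with these four fields might look like the sketch below. The class and function names are hypothetical, the location counter values 64 and 68 are made up for the example, and giving an EQU-defined constant a length of 1 is an assumption rather than something the notes state.

```python
from dataclasses import dataclass


@dataclass
class SymbolEntry:
    name: str
    value: int        # assembly-time value
    length: int       # bytes of the instruction or data the symbol labels
    relocation: str   # "R" = relative, "A" = absolute


def define_label(table, name, lc, length):
    """A label on an instruction or data item takes the LC value and is relative."""
    table[name] = SymbolEntry(name, lc, length, "R")


def define_constant(table, name, constant):
    """A symbol equated to a plain number is absolute (length 1 is assumed here)."""
    table[name] = SymbolEntry(name, constant, 1, "A")


table = {}
define_label(table, "COMMA", 64, 1)    # COMMA DC C','  -> length 1
define_label(table, "TEMP", 68, 4)     # TEMP  DS F     -> length 4
define_constant(table, "COUNT", 3)     # hypothetical: COUNT EQU 3
for entry in table.values():
    print(entry)
```

The R/A field matters later: a relative value changes if the program is loaded at a different address, while an absolute one does not.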


Fig 2.8: Symbol table for passes 1 & 2:

Format: Base Table

✓ The BT is used by the assembler to generate the proper base register reference in
a machine instruction and to compute the correct offset.
✓ The assembler must generate an address (an offset, a base register number and
an index register number) for most symbolic references.
✓ The ST contains the address of a symbol relative to the beginning of the program.
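The way the base table turns a symbol's relative address into an offset and base register can be sketched as follows. The function name `make_address` and the second example table are hypothetical. The 4096 limit reflects the 12-bit displacement field of System/360 instructions, and "smallest non-negative offset" is one reasonable reading of choosing the base register closest to the symbol.

```python
def make_address(base_table, symbol_value, index=0):
    """Return (offset, index register, base register) for a symbol's relative address.

    base_table maps each register named in a USING to its assumed contents,
    expressed relative to the start of the program.
    """
    candidates = [
        (symbol_value - contents, register)
        for register, contents in base_table.items()
        if 0 <= symbol_value - contents < 4096   # offset must fit in 12 bits
    ]
    if not candidates:
        raise ValueError("no base register covers this address")
    offset, base = min(candidates)               # closest base register wins
    return offset, index, base


# One base register holding relative address 0, and a symbol at relative address 6.
first = make_address({15: 0}, 6)
# Hypothetical table with two base registers; the closer one is chosen.
second = make_address({15: 6, 13: 48}, 52)
print(first, second)
```

With a single base register the offset is just the symbol value minus the register contents; with several, choosing the closest keeps the offset small and within range.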

Fig 2.9: Base table for pass 2


Sample Assembly Source Program

We will illustrate the use of the tables (ST, LT, BT, etc.) using the following program, which
also motivates the algorithm presented in the next section.

Fig 2.10: ST and LT for sample assembly program


Discussion

✓ As indicated in Fig 2.3 (flow chart), the assembler scans the program while
keeping a location counter.
• For each symbol in the label field we make an entry in the symbol
table. E.g. for PGM2, the value is its relative location (length 1).
• We update the location counter by noting that the LA instruction is 4
bytes long and SR is 2 bytes long.
• The next five symbols are defined by EQU -> these symbols and the
associated values given in the argument field are entered into the table.
• The LC is further updated, noting that L is 4 bytes and SR is 2 bytes long.
• None of the pseudo-ops encountered affects the value of the LC, as they
do not result in any object code. Hence the LC has the value 12 when
LOOP is encountered.
✓ In the same pass all literals are entered into the LT; the first literal is in
statement 11, and its value is the address of the location that will contain
the literal.
• The LTORG pseudo-op forces the literal table to be placed at that point,
after the LC is updated to the next doubleword boundary (48). The value
of ‘=A(DATA1)’ is its address, 48. Similarly, the value of the literal
=F’5’ is the next location in the LT, 52, and so on.
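The LTORG step can be sketched as a round-up to the next doubleword boundary followed by sequential assignment of literal addresses. The function names are hypothetical, and the LC value of 42 before LTORG is an assumed figure chosen only so that rounding gives 48 as in the discussion. Both literals are fullwords, so each occupies 4 bytes.

```python
def align(lc, boundary):
    """Round the location counter up to the next multiple of boundary."""
    return (lc + boundary - 1) // boundary * boundary


def place_literals(lc, pending):
    """Assign addresses to pending (literal, length) pairs, starting at the
    next doubleword (8-byte) boundary. Returns the literal table and new LC."""
    lc = align(lc, 8)
    literal_table = {}
    for literal, length in pending:
        literal_table[literal] = lc
        lc += length
    return literal_table, lc


# 42 is an assumed LC value just before LTORG; any value from 41 to 48 rounds to 48.
literal_table, lc = place_literals(42, [("=A(DATA1)", 4), ("=F'5'", 4)])
print(literal_table, lc)
```

An LC already on a boundary is left unchanged by `align`, so LTORG wastes no space in that case.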

Pass 2: Generate opcode and evaluate arguments

To generate a proper address in the instruction, we need the base register. The base table,
BT, shows the registers in use.


Processing the USING pseudo-ops in the program gives these base tables. To calculate the
offset we need the contents of the base register. The assembler does not know the
execution-time value of the base register; it knows only its value relative to the beginning
of the program. Hence the assembler enters as ‘contents’ this relative value, which is used to
calculate the offset.

✓ For each instruction in Pass 2, we create the equivalent machine language
code.
• E.g. for statement 3: look up the value of SETUP in the ST (which is 6).
Then look up the op-code in the MOT (the binary op-code for
LA). Finally, formulate the address:
➢ Determine the available base registers.
➢ Pick the one whose value is closest to SETUP (R15).
➢ Offset = value of SETUP – contents of base register
= 6 – 0 = 6.
➢ Formulate the address -> offset(index register,
base register) = 6(0,15).
➢ Assemble the output code in the appropriate format.
• Similarly, we generate instructions for the remaining code
as below.
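The final step, assembling the output code in the appropriate format, can be illustrated for an RX instruction, whose 32 bits hold an 8-bit op-code, 4-bit R1, X2 and B2 fields, and a 12-bit displacement. The function name `encode_rx` is hypothetical, and the first operand register (15 here) is an assumption, since the discussion above gives only the address part 6(0,15). The value 0x41 is the standard System/360 op-code for LA.

```python
def encode_rx(opcode, r1, x2, b2, d2):
    """Pack an RX-format instruction into 4 bytes:
    op-code (8 bits), R1 (4), X2 (4), B2 (4), displacement D2 (12)."""
    for reg in (r1, x2, b2):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must be 0..15")
    if not 0 <= d2 < 4096:
        raise ValueError("displacement must fit in 12 bits")
    return bytes([opcode, (r1 << 4) | x2, (b2 << 4) | (d2 >> 8), d2 & 0xFF])


symbol_value = 6                        # value of SETUP from the symbol table
base_register, base_contents = 15, 0    # from the base table
offset = symbol_value - base_contents   # 6 - 0 = 6

# R1 = 15 is assumed for illustration; index register 0 means "no index".
machine_code = encode_rx(0x41, 15, 0, base_register, offset)
print(machine_code.hex())
```

The same packing works for every RX instruction; only the op-code taken from the MOT changes.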


Algorithm

Two flow charts, one for each pass, show a simplified algorithm for passes 1 & 2 and
illustrate most of the logical processes involved.

✓ Pass 1: Define Symbols
• Assign a location to each instruction and data-defining pseudo-op.
• Define values for symbols appearing in the label field.
• Initially the LC is set to the first location in the program (relative address 0).


• Then a source statement is read and its operation code is examined to
determine whether it is a pseudo-op. If not, the MOT is searched; the
matching entry specifies the instruction length (2, 4 or 6 bytes).
• The operand field is scanned for the presence of literals -> if one is found,
it is added to the LT for later processing.
• The label field is examined -> if there is a symbol, it is added to the ST along
with the value of the LC.
• Finally the LC is incremented by the length of the current instruction and a copy
of the statement is saved for pass 2. This is repeated for all instructions.
✓ Pass 2: Generate Code
• Once all symbols are defined (Pass 1), it is possible to finish the
assembly by determining the value of the operation code and the
values of the operand fields.
• Moreover, pass 2 structures the generated code into a format suitable
for a loader.
• The LC is initialized in the same fashion as in pass 1, and processing
continues as follows. An instruction is read from the source file left by Pass
1. The operation field is examined to determine whether it is a pseudo-op -> if not,
the MOT is searched to find the op-code. The matching entry specifies the length,
binary op-code and format of the instruction. Operand fields of different
instruction formats need different processing.
• Finally a listing line containing a copy of the source code, its hex value and its
location is printed; the LC is incremented and processing continues.
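The two passes described above can be sketched end to end for a very small subset of the language. Everything here is a simplification under stated assumptions: the four-line program is a hypothetical fragment modelled on the sample discussed earlier, only RR and un-indexed RX instructions are handled, literals and data pseudo-ops are omitted, the START operand is ignored (the LC simply starts at 0), and all function names are made up.

```python
MOT = {"LA": (0x41, 4, "RX"), "L": (0x58, 4, "RX"), "SR": (0x1B, 2, "RR")}
PSEUDO_OPS = {"START", "USING", "EQU", "END"}

PROGRAM = [            # hypothetical fragment; a label starts in column 1
    "PGM2     START 0",
    "         USING *,15",
    "         LA    15,SETUP",
    "         SR    4,4",
    "SETUP    EQU   *",
]


def parse(line):
    """Split a statement into (label, operation, operand)."""
    has_label = not line[0].isspace()
    fields = line.split()
    label = fields.pop(0) if has_label else None
    return label, fields[0], fields[1] if len(fields) > 1 else ""


def pass1(lines):
    """Define symbols; keep a copy of each statement with its LC for pass 2."""
    lc, st, copy = 0, {}, []
    for line in lines:
        label, op, operand = parse(line)
        length = 0                                  # pseudo-ops here emit no code
        if op == "EQU":
            st[label] = lc if operand == "*" else int(operand)
            label = None
        elif op not in PSEUDO_OPS:
            length = MOT[op][1]                     # MOT gives the instruction length
        if label:
            st[label] = lc
        copy.append((lc, op, operand, line))
        lc += length
    return st, copy


def value(term, st):
    """A term is either a defined symbol or a decimal number."""
    return st[term] if term in st else int(term)


def pass2(copy, st):
    """Generate code and a listing line (location, hex, source) per statement."""
    listing, base = [], {}
    for lc, op, operand, line in copy:
        code = b""
        if op == "USING":
            addr, reg = operand.split(",")
            base[value(reg, st)] = lc if addr == "*" else value(addr, st)
        elif op in MOT:
            opcode, _length, fmt = MOT[op]
            first, second = (value(t, st) for t in operand.split(","))
            if fmt == "RR":
                code = bytes([opcode, (first << 4) | second])
            else:                                   # RX with no index register
                offset, b = min((second - c, r) for r, c in base.items() if second >= c)
                code = bytes([opcode, first << 4, (b << 4) | (offset >> 8), offset & 0xFF])
        listing.append(f"{lc:04X} {code.hex().upper():<8} {line}")
    return listing


st, copy = pass1(PROGRAM)
listing = pass2(copy, st)
print("\n".join(listing))
```

Note that SETUP is used by the LA instruction before the EQU statement that defines it, and the sketch still resolves it because pass 2 starts only after pass 1 has filled the symbol table.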


Fig 2.11: Pass 1 Algorithm flow chart


Fig 2.12: Pass 2 Algorithm flow chart


Look for Modularity

✓ We now review our design to improve it, looking for functions that can be
isolated. Modules/functions can be multi-use or unique.
✓ Let’s look at our algorithms for passes 1 & 2, see if we can find a logical
separation, and put the functions in the following format.

Where name is the name assigned to the function, such as
MOTGET, EVAL, PRINT, POTGET, etc.
✓ Accordingly, we can list some logical modules that may be isolated
in passes 1 & 2. These functions are more or less indicated in the flow charts for
the algorithms in both passes. The tables next summarize the functions we may
consider for modularity, isolating each from the rest of the algorithm so that the
module is autonomous in its processing.


Summary

✓ An assembler is system software that converts an assembly language program into
its equivalent object code.
✓ The fundamental assembler design procedure involves the following steps:
• Statement of problem
• Data structure
• Format of databases
• Algorithms
• Look for modularity
✓ Tasks performed by an assembler:
• Generate instructions
• Process pseudo-ops such as USING, DC, DS, etc.

END OF CHAPTER TWO!
