Unit-I: Register Transfer
Register Transfer
The term Register Transfer refers to the availability of hardware logic circuits that can perform a given
micro-operation and transfer the result of the operation to the same or another register.
Most of the standard notations used for specifying operations on various registers are stated below.
3. The numbering of bits in a register can be marked on the top of the box as shown in (c).
4. A 16-bit register PC is divided into two parts: bits 0 to 7 hold the lower byte of the 16-bit
address and bits 8 to 15 hold the higher byte of the 16-bit address, as shown in (d).
Symbol   Description                                            Example
( )      Denotes a part of a register                           R1(8-bit), R1(0-7)
,        Specifies two micro-operations of register transfer    R1 <- R2, R2 <- R1
:        Denotes a conditional operation                        P : R2 <- R1 (if P=1)
If the control function P=1, then load the content of R1 into R2 and at the same clock load the
content of R2 into R1
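The conditional exchange described above can be sketched in Python. This is only an illustrative model: the `clock_tick` function and the dictionary of registers are assumptions made for the example, not part of the notation itself. The point it shows is that both transfers sample their sources before either register is loaded, so the two registers swap contents in one clock.

```python
def clock_tick(regs, p):
    """Model one clock pulse of  P: R2 <- R1, R1 <- R2  (hypothetical helper)."""
    if p == 1:
        # Both right-hand sides are read before either register is written,
        # which is what "at the same clock" means for the two transfers.
        regs["R1"], regs["R2"] = regs["R2"], regs["R1"]
    return regs

regs = {"R1": 0x3C, "R2": 0xA5}
clock_tick(regs, p=0)            # control function inactive: nothing changes
unchanged = dict(regs)
clock_tick(regs, p=1)            # control function active: contents are exchanged
```

With P=0 the registers keep their values; with P=1 they exchange them.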
Arithmetic MicroInstructions :
The four basic arithmetic operations are addition, subtraction, multiplication, and
division. Most computers provide instructions for all four operations.
Typical Arithmetic Instructions –
Name                Mnemonic   Example   Explanation
Increment           INC        INC B     It will increment the register B by 1: B <- B+1
[...]               [...]      [...]     [...] accumulator: AC <- AC-B
Complement carry    COMC       COMC      It will complement the carry flag: Carry flag <- (Carry flag)'
Disable interrupt   DI         DI        It will disable the interrupt
Shift Micro-Instructions :
Shifts are operations in which the bits of a word are moved to the left or right. Shift
instructions may specify either logical shifts, arithmetic shifts, or rotate-type
operations.
Typical Shift Instructions –
Name Mnemonic
Logical shift right SHR
Left Arithmetic Shift –
In this shift, each bit is moved one position to the left. The vacated least significant bit
(LSB) is filled with zero and the most significant bit (MSB) is discarded. This is the same as
the Left Logical Shift.
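As a rough illustration of the shift types named in this section, the sketch below models them on an 8-bit word in Python. The word width and the function names (`shl`, `shr`, `ashr`, `cir`) are assumptions chosen for the example.

```python
WIDTH = 8                      # assumed register width for the example
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

def shl(x):
    """Left logical/arithmetic shift: LSB filled with 0, MSB discarded."""
    return (x << 1) & MASK

def shr(x):
    """Right logical shift: MSB filled with 0, LSB discarded."""
    return (x & MASK) >> 1

def ashr(x):
    """Right arithmetic shift: the sign bit is kept in place."""
    return ((x & MASK) >> 1) | (x & SIGN)

def cir(x):
    """Right circular shift: the LSB re-enters at the MSB position."""
    return ((x & MASK) >> 1) | ((x & 1) << (WIDTH - 1))
```

For example, `shl(0b10010110)` gives `0b00101100`, while `ashr(0b10010110)` gives `0b11001011` because the sign bit is preserved.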
Right Circular Shift –
Computer Registers
Registers are a type of computer memory used to quickly accept, store, and transfer data and instructions
that are being used immediately by the CPU. The registers used by the CPU are often termed as Processor
registers.
A processor register may hold an instruction, a storage address, or any data (such as bit sequence or
individual characters).
The computer needs processor registers for manipulating data and a register for holding a memory
address. The register holding the memory location is used to calculate the address of the next instruction
after the execution of the current instruction is completed.
The following image shows the register and memory configuration for a basic computer.
o The Memory unit has a capacity of 4096 words, and each word contains 16 bits.
o The Data Register (DR) contains 16 bits which hold the operand read from the memory location.
o The Memory Address Register (MAR) contains 12 bits which hold the address for the memory
location.
o The Program Counter (PC) also contains 12 bits which hold the address of the next instruction to be
read from memory after the current instruction is executed.
o The Accumulator (AC) register is a general purpose processing register.
o The instruction read from memory is placed in the Instruction register (IR).
o The Temporary Register (TR) is used for holding the temporary data during the processing.
o The Input Register (INPR) holds the input characters given by the user.
o The Output Register (OUTR) holds the output after processing the input data.
Common Bus System
We shall study the common bus system of a very basic computer in this article. A basic
computer has 8 registers, a memory unit and a control unit. The diagram of the common bus
system is as shown below.
Connections:
The outputs of all the registers except the OUTR (output register) are connected to the
common bus. The output selected depends upon the binary value of variables S2, S1 and
S0. The lines from common bus are connected to the inputs of the registers and memory. A
register receives the information from the bus when its LD (load) input is activated while in
case of memory the Write input must be enabled to receive the information. The contents of
memory are placed onto the bus when its Read input is activated.
Various Registers:
4 registers DR, AC, IR and TR have 16 bits and 2 registers AR and PC have 12 bits. The
INPR and OUTR have 8 bits each. The INPR receives character from input device and
delivers it to the AC while the OUTR receives character from AC and transfers it to the
output device. 5 registers have 3 control inputs LD (load), INR (increment) and CLR
(clear). These types of registers are similar to a binary counter.
The adder and logic circuit provides the 16 inputs of AC. This circuit has 3 sets of inputs.
One set comes from the outputs of AC which implements register micro operations. The
other set comes from the DR (data register) which are used to perform arithmetic and
logic micro operations. The result of these operations is sent to AC while the end around
carry is stored in E as shown in diagram. The third set of inputs is from INPR.
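A minimal Python sketch of the bus selection described above follows. The particular select codes (1 = AR, 2 = PC, 3 = DR, 4 = AC, 5 = IR, 6 = TR, 7 = memory) and the use of AR as the memory address are assumptions made for the example, since the text only says that S2, S1 and S0 choose the source; `bus_value` and `transfer` are hypothetical helper names.

```python
# Assumed encoding of S2 S1 S0; the source text does not list the codes.
BUS_SOURCES = {1: "AR", 2: "PC", 3: "DR", 4: "AC", 5: "IR", 6: "TR", 7: "MEM"}
WIDTHS = {"AR": 12, "PC": 12, "DR": 16, "AC": 16, "IR": 16, "TR": 16}

def bus_value(select, regs, memory):
    """Return the word placed on the common bus for a given select code."""
    source = BUS_SOURCES.get(select)
    if source is None:
        return 0                       # nothing drives the bus
    if source == "MEM":
        return memory[regs["AR"]]      # Read input active (AR as address is assumed)
    return regs[source]

def transfer(select, load, regs, memory):
    """Put the selected source on the bus and activate LD of register `load`."""
    value = bus_value(select, regs, memory)
    regs[load] = value & ((1 << WIDTHS[load]) - 1)   # 12-bit registers keep low 12 bits

regs = {"AR": 0x005, "PC": 0x010, "DR": 0, "AC": 0, "IR": 0, "TR": 0}
memory = [0] * 4096
memory[0x005] = 0x1234
transfer(7, "DR", regs, memory)        # DR <- M[AR]
dr_after = regs["DR"]
transfer(2, "AR", regs, memory)        # AR <- PC
```

Only one source drives the bus at a time, and only the register whose LD input is active receives the value.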
Instruction set
An instruction is a code that the computer processor can understand. The code is usually written in
1s and 0s, or machine language. Instructions specify tasks that control the movement of bits and
bytes within the processor. The complete collection of such instructions is the instruction set.
Examples of some instructions −
ADD − Add two numbers together.
JUMP − Jump to designated RAM address.
LOAD − Load information from RAM to the CPU.
Instruction Cycle
Each computer’s CPU can have different cycles based on different instruction sets, but will be similar to the following
cycle:
Fetch Stage: The next instruction is fetched from the memory address that is currently stored in the program counter
and stored in the instruction register. At the end of the fetch operation, the PC points to the next instruction that will be
read in the next cycle.
Registers Involved In Each Instruction Cycle
Following are the different types of registers involved in each instruction cycle:
1. Memory Address Register (MAR): It is connected to the address lines of the system bus. It specifies the address in
memory for a read or write operation.
2. Memory Buffer Register (MBR): It is connected to the data lines of the system bus. It contains the value to be
stored in memory or the last value read from the memory.
3. Program Counter (PC): Holds the address of the next instruction to be fetched.
4. Instruction Register (IR): Holds the last instruction fetched.
Role of Registers Involved in Instruction Cycle
The program counter (PC) is a special register that holds the memory address of the next instruction to be executed.
During the fetch stage, the address stored in the PC is copied into the memory address register (MAR) and then the PC
is incremented in order to “point” to the memory address of the next instruction to be executed.
The CPU then takes the instruction at the memory address described by the MAR and copies it into the memory data
register (MDR). The MDR also acts as a two-way register that holds data fetched from memory or data waiting to be
stored in memory (it is also known as the memory buffer register (MBR) because of this). Eventually, the instruction
in the MDR is copied into the current instruction register (CIR) which acts as a temporary holding ground for the
instruction that has just been fetched from memory.
During the decode stage, the control unit (CU) will decode the instruction in the CIR. The CU then sends signals to
other components within the CPU, such as the arithmetic logic unit (ALU) and the floating-point unit (FPU). The
ALU performs arithmetic operations such as addition and subtraction and also multiplication by repeated addition and
division via repeated subtraction. It also performs logic operations such as AND, OR, NOT, and binary shifts as well.
The FPU is reserved for performing floating-point operations.
Components of Instruction Cycle
Following are the main components of every instruction cycle:
1. Fetch Cycle
The fetching of the instruction is the first phase. The fetch phase is common to every instruction executed in a central
processing unit. In this phase, the central processing unit sends the contents of the PC to the MAR and then sends the
READ command on the control bus.
After the read command is sent, the memory returns the instruction stored at that particular address on the data bus.
Then, the CPU copies the data from the data bus into the MBR and then copies it from the MBR to the instruction
register.
After all this, the program counter is incremented to the next memory location so that the next instruction can be
fetched from memory.
2. Decode Cycle
The decoding of the instruction is the second phase. In this phase, the CPU determines which instruction has been
fetched and what action it requires. The opcode of the fetched instruction is decoded to identify the operation that
needs to be performed.
The reading of the effective address is the third phase. This phase depends on the type of operation, which can be a
memory type or a non-memory type operation. Memory instructions fall into two categories: direct memory
instructions and indirect memory instructions.
3. Execute Cycle
The execution of the instruction is the last phase. In this stage, the instruction is executed and its result is stored in
a register. After the execution of an instruction, the CPU prepares itself for the execution of the next instruction. The
execution time of each instruction is measured and is used to indicate the processing speed of the processor.
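The fetch, decode and execute phases can be sketched as a small Python loop. This is a toy model under stated assumptions: instructions are stored as `(opcode, address)` tuples rather than bit patterns, only LOAD, ADD and JUMP from the earlier example list are modelled, and HALT is a hypothetical opcode added so the loop can stop.

```python
def run(memory, max_steps=100):
    """Toy fetch-decode-execute loop (illustrative, not a real machine)."""
    pc, ac = 0, 0
    for _ in range(max_steps):
        # Fetch: PC -> MAR, memory[MAR] -> MBR -> IR, then PC is incremented.
        mar = pc
        mbr = memory[mar]
        ir = mbr
        pc += 1
        # Decode: split the instruction into opcode and address.
        opcode, address = ir
        # Execute.
        if opcode == "LOAD":
            ac = memory[address]
        elif opcode == "ADD":
            ac += memory[address]
        elif opcode == "JUMP":
            pc = address
        elif opcode == "HALT":       # hypothetical opcode for the example
            break
    return ac, pc

# Words 0-3 hold the program; words 4 and 5 hold data.
program = [("LOAD", 4), ("ADD", 5), ("JUMP", 3), ("HALT", 0), 20, 22]
ac, pc = run(program)
```

Here the accumulator ends up holding 20 + 22, and the PC already points past the HALT instruction because it is incremented during fetch.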
The basic computer has a 16-bit instruction register (IR) which can hold a memory
reference, a register reference or an input-output instruction.
1. Memory Reference Instruction – These instructions refer to a memory address as an
operand. The other operand is always the accumulator. The instruction specifies a 12-bit address,
a 3-bit opcode (other than 111) and a 1-bit addressing mode for direct and indirect addressing.
Example –
The IR register contains 0001XXXXXXXXXXXX, i.e. ADD. After fetching and decoding the
instruction, we find that it is a memory reference instruction for the ADD operation.
Hence, DR ← M[AR]
AC ← AC + DR, SC ← 0
3. Input/Output – These instructions are for communication between the computer and the outside
environment. IR(14 – 12) is 111 (which differentiates it from memory reference instructions) and
IR(15) is 1 (which differentiates it from register reference instructions). The remaining 12 bits
specify the I/O operation.
Example –
The IR register contains 1111100000000000, i.e. INP. After the fetch and decode cycle, we find
that it is an input/output instruction for inputting a character. Hence, a character is input from
the peripheral device.
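The bit tests described above can be written as a short Python sketch. The function name `classify` and the tuple it returns are assumptions made for illustration; the bit positions (IR(15), IR(14-12), and the low 12 bits) follow the text.

```python
def classify(ir):
    """Classify a 16-bit instruction word by IR(15) and IR(14-12)."""
    i_bit = (ir >> 15) & 1          # IR(15): addressing mode or I/O marker
    opcode = (ir >> 12) & 0b111     # IR(14-12): operation code
    low12 = ir & 0x0FFF             # IR(11-0): address or operation bits
    if opcode != 0b111:
        mode = "indirect" if i_bit else "direct"
        return ("memory-reference", opcode, mode, low12)
    if i_bit == 0:
        return ("register-reference", low12)
    return ("input-output", low12)
```

For instance, `classify(0b1111100000000000)` reports an input-output instruction, and `classify(0x1234)` reports a direct memory-reference instruction with opcode 001 and address 0x234.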
I/O Configuration
The terminals send and receive serial information. Each unit of serial data is an eight-bit
alphanumeric code whose leftmost bit is always 0. Serial data from the keyboard is shifted into the
input register INPR. The output register OUTR holds the serial data for the printer. These two registers
communicate with the Accumulator (AC) in parallel and with a communication interface serially.
The Input/Output configuration is displayed in the figure. The transmitter interface receives serial data from
the keyboard and sends it to INPR. The receiver interface receives data from OUTR and sends it to the printer
serially.
The input and output registers have eight bits each. FGI is a 1-bit input flag, which is a control flip-flop. The
flag is set to 1 when new data is available in the input device and is cleared to 0 when the data is
accepted by the computer.
When a key is pressed on the keyboard, the alphanumeric code for that key is shifted into INPR and
the input flag FGI is set to 1. The data in INPR cannot be changed as long as the flag is set. The computer
checks the flag bit; if it is 1, the data from INPR is transferred in parallel into AC, and FGI is cleared to 0.
The output register OUTR works similarly to the input register INPR, but the direction of data flow is
reversed. The output flag FGO is initially set to 1. The computer checks the flag bit; if it is 1, the data from
AC is transferred in parallel to OUTR, and FGO is cleared to 0. New data cannot be loaded into OUTR while
FGO is 0, because this condition indicates that the output device is still printing a character.
Input Register:
The input register INPR is an eight-bit register that holds alphanumeric input data. The 1-bit
input flag FGI is a control flip-flop. When new data is available in the input device, the flag bit is set to
1. It is cleared to 0 when the data is accepted by the computer. The flag is needed to synchronize the timing
rate difference between the input device and the computer.
The process of data transfer is as follows −
The input flag FGI is initially cleared to 0. When a user presses any key on the keyboard, an 8-bit
alphanumeric code is transferred into INPR and the input flag FGI is set to 1.
The computer checks the flag bit. If the bit is 1, the data from INPR is transferred to AC and, at the same
time, FGI is cleared to 0.
Once the flag is cleared, new data can be transferred into INPR by pressing another key.
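The flag handshake described above can be modelled with a small Python class. The class name `InputPort` and its two methods are assumptions made for this sketch; the behaviour follows the steps in the text, with FGI acting as the synchronizing flag.

```python
class InputPort:
    """Toy model of INPR and the input flag FGI (illustrative only)."""

    def __init__(self):
        self.inpr = 0
        self.fgi = 0                  # FGI is initially cleared

    def key_pressed(self, code):
        """Device side: load INPR and set FGI, unless old data is still waiting."""
        if self.fgi == 1:
            return False              # INPR cannot be changed while the flag is set
        self.inpr = code & 0xFF
        self.fgi = 1
        return True

    def read_into_ac(self):
        """Computer side: if FGI is 1, hand INPR over for AC and clear FGI."""
        if self.fgi == 0:
            return None               # no new character available
        self.fgi = 0
        return self.inpr

port = InputPort()
first = port.key_pressed(ord("A"))    # accepted, FGI becomes 1
blocked = port.key_pressed(ord("B"))  # rejected, previous character not yet read
ac = port.read_into_ac()              # character moves to AC, FGI cleared
empty = port.read_into_ac()           # nothing new to read
```

The second key press is ignored in this model because the computer has not yet consumed the first character.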
Output Register:
The working of the output register OUTR is similar to that of the input register INPR, except that the
direction of data flow is reversed.
The procedure of data transfer is as follows −
Booth's Multiplication Algorithm
Booth's algorithm is a multiplication algorithm that multiplies two signed binary integers in 2's
complement representation. It can also speed up the multiplication process. It relies on two facts: a
string of 0's in the multiplier requires no addition, only shifting, and a string of 1's in the multiplier
from bit weight 2^k down to weight 2^m can be treated as 2^(k+1) - 2^m.
In the flowchart, AC and the Qn + 1 bit are initially set to 0, and SC is a sequence counter that is set to n,
the number of bits in the multiplier. BR holds the multiplicand bits and QR holds the multiplier bits. In each
cycle, two bits of the multiplier are inspected, Qn and Qn + 1, where Qn is the last (least significant) bit of
QR and Qn + 1 is an extra bit to the right of Qn that holds the previous value of Qn. If the two bits are
equal to 10, we subtract the multiplicand from the partial product in the accumulator AC and then perform
the arithmetic shift right operation (ashr). If the two bits are equal to 01, we add the multiplicand to the
partial product in the accumulator AC and then perform the arithmetic shift right operation (ashr). The
arithmetic shift operation shifts the AC and QR bits, including Qn + 1, to the right by one and leaves the
sign bit in AC unchanged. The sequence counter is decremented after each cycle, and the computational loop
is repeated n times, once for each bit of the multiplier.
Working on the Booth Algorithm
1. Set the Multiplicand and Multiplier binary bits as M and Q, respectively.
2. Initially, we set the AC and Qn + 1 registers value to 0.
3. SC is a sequence counter. It is initialized to the number of Multiplier bits (n) and is decremented
after each cycle until it reaches 0.
4. Qn represents the last (least significant) bit of Q, and Qn + 1 is an extra bit to the right of Qn that
holds the previous value of Qn.
5. On each cycle of the Booth algorithm, the Qn and Qn + 1 bits are checked as follows:
i. When the two bits Qn and Qn + 1 are 00 or 11, we simply perform the arithmetic shift right
operation (ashr) on AC and QR. The shift moves Qn into Qn + 1.
ii. If the bits Qn and Qn + 1 are 01, the multiplicand bits (M) are added to the AC
(Accumulator register). After that, we perform the arithmetic shift right operation on the AC
and QR bits by 1.
iii. If the bits Qn and Qn + 1 are 10, the multiplicand bits (M) are subtracted from
the AC (Accumulator register). After that, we perform the arithmetic shift right operation on
the AC and QR bits by 1.
6. The operation repeats until all n bits of the multiplier have been processed, i.e. until SC reaches 0.
7. The result of the multiplication is stored in the AC and QR registers.
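The steps above can be sketched in Python as follows. This is a minimal register-level model, assuming n-bit AC, BR and QR registers held in Python integers; the function name `booth_multiply` and the variable `q_extra` (standing for Qn + 1) are names chosen for the example. It does not handle the corner case where the multiplicand is the most negative n-bit value.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit 2's complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    br = multiplicand & mask          # BR holds the multiplicand
    qr = multiplier & mask            # QR holds the multiplier
    ac, q_extra = 0, 0                # AC and Qn+1 start at 0
    for _ in range(n):                # SC counts the n cycles
        pair = (qr & 1, q_extra)      # (Qn, Qn+1)
        if pair == (1, 0):
            ac = (ac - br) & mask     # subtract the multiplicand
        elif pair == (0, 1):
            ac = (ac + br) & mask     # add the multiplicand
        # Arithmetic shift right of AC, QR and Qn+1 as one unit.
        q_extra = qr & 1
        qr = (qr >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & sign)  # sign bit of AC stays unchanged
    product = (ac << n) | qr          # 2n-bit result in AC:QR
    if product & (1 << (2 * n - 1)):  # interpret as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product
```

For example, `booth_multiply(-9, -13, 5)` walks through five cycles and leaves 117 in AC:QR.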
UNIT-II
CONTROL UNIT
In a computer, most tasks are controlled by the processor or CPU (Central Processing Unit), which
is the main component of a computer. The CPU usually has two main parts: the control unit (CU) and
the arithmetic and logic unit (ALU). The control unit (CU) synchronizes tasks by sending timing and
control signals, while the ALU handles mathematical and logical operations. There are two types of
control units: micro-programmed control units and hardwired control units. An instruction can be
executed with the help of either type.
In the hardwired control unit, the execution of operations is much faster, but the implementation,
modification, and decoding are difficult. In contrast, implementing, modifying, and decoding in a
micro-programmed control unit is very easy. The micro-programmed control unit is also able to
handle complex instructions. With the help of the control signals generated by micro-programmed
and hardwired control units, we are able to fetch and execute instructions.
Control Signals
Both types of control unit are designed to generate control signals. The
functionality of a processor's hardware is operated with the help of these control signals. The
control signals convey various kinds of information, which are described as follows:
The image of a hardwired control unit is described as follows, which contains various components in the
form of circuitry. We will discuss them one by one so that we can properly understand the "generation of
control signals".
The instruction register is a type of processor register that holds the instruction that is
currently in execution. As we can see, the instruction register supplies the OP-code bits of the
operation as well as the addressing mode of the operands.
These Op-code bits are received by the instruction decoder. The instruction decoder interprets
the operation and the instruction's addressing mode. On the basis of the operation and addressing
mode of the instruction in the instruction register, the instruction decoder sets the corresponding
instruction signal INS i to 1. Each instruction is executed in a number of steps, i.e., instruction
fetch, decode, operand fetch, ALU operation, and memory store. Different books might list
different steps, but in general an instruction can be executed with the help of these five steps.
o The control unit must know the current step of the instruction. For this purpose a Step
Counter is implemented, which provides the signals T1, ..., T5. Depending on the step the
instruction is in, one of the step counter signals T1 to T5 is set to 1.
o How does the step counter know the current step of the instruction? To track the current
step, a Clock is used. One clock cycle is completed for each step. For example, if the step
counter has set T3 to 1, then after one clock cycle completes, the step counter will set T4
to 1.
o What happens if the execution of an instruction is interrupted for some reason? Will the
step counter still be triggered by the clock? The answer is no. Until the execution of the
current step is completed, the Counter Enable signal "disables" the Step Counter so that it
does not increment to the next step signal.
o What if the execution of an instruction depends on some conditions? In this case, the
Condition Signals are used. These signals report conditions such as less than, greater than,
less than or equal, greater than or equal, and many more.
o The external input is the last one. It is used to tell the Control Signal Generator about
interrupts, which affect the execution of an instruction.
So, on the basis of the inputs obtained from the condition signals, step counter, external inputs, and
instruction register, the control signals are generated by the Control Signal Generator.
As in the hardwired control unit, instruction execution in a micro-programmed control unit is also
performed in steps. For each step, the micro-program contains a control word/microinstruction. To
execute a particular instruction, we need a sequence of microinstructions. This sequence is known as
a micro-routine. The image of a micro-programmed control unit is described as follows. Here, we
will learn the organization of micro-program, micro-routine, and control word/microinstruction.
o Instruction fetch is the first step. In this step, the instruction is fetched from the IR
(Instruction Register) with the help of a Microinstruction address register.
o Decode is the second step. In this step, the instructions obtained from the instruction
register will be decoded with the help of a microinstruction address generator. Here we will
also get the starting address of a micro-routine. With the help of this address, we can easily
perform the operation, which is mentioned in the instruction. It will also load the starting
address into the micro-program counter.
o Increment is the third step. In this step, the control word, which corresponds to the starting
address of a micro-program, will be read. When the execution proceeds, the value of the
micro-program counter will be increased so that it can read the successive control words of
a micro-routine.
o End bit is the fourth step. In this step, the microinstruction of a micro-routine contains a bit,
which is known as the end bit. The execution of the microinstruction will be successfully
completed when the end bit is set to 1.
o This is the last step, and in this step, the micro-program address generator will again go
back to Step 1 so that we can fetch a new instruction, and this process or cycle goes on.
So in the micro-programmed control unit, the micro-programs are stored with the help of Control
memory or Control store. The implementation of this CU is very easy and flexible, but it is slower as
compared to the Hardwired control unit.
Hardwired Control Unit | Microprogrammed Control Unit
Generates the control signals needed for the processor using logic circuits | Generates the control signals with the help of microinstructions stored in control memory
Difficult to modify, as the control signals that need to be generated are hard wired | Easy to modify, as the modification needs to be done only at the instruction level
Only a limited number of instructions are used due to the hardware implementation | Control signals for many instructions can be generated
Used in computers that make use of Reduced Instruction Set Computers (RISC) | Used in computers that make use of Complex Instruction Set Computers (CISC)
CONTROL MEMORY:
A control memory is a part of the control unit. Any computer that involves
microprogrammed control consists of two memories. They are the main memory and the control
memory. Programs are usually stored in the main memory by the users. Whenever the programs
change, the data is also modified in the main memory. They consist of machine instructions and
data.
The control memory consists of microprograms that are fixed and cannot be modified frequently.
They contain microinstructions that specify the internal control signals required to execute register
micro-operations.
The machine instructions generate a chain of microinstructions in the control memory. Their
function is to generate micro-operations that can fetch instructions from the main memory,
compute the effective address, execute the operation, and return control to fetch phase and
continue the cycle.
Here, the control memory is presumed to be a Read-Only Memory (ROM), where all the control
information is stored permanently. The control address register provides the address of the
microinstruction. The other register, that is, the control data register, stores the microinstruction
that is read from the memory. The microinstruction consists of a control word that holds one or more
micro-operations for the data processor.
The next address must be computed once this operation is completed. It is computed in the next
address generator. Then, it is sent to the control address register to be read. The next address
generator is also known as the microprogram sequencer. Based on the inputs to a sequencer, it
determines the address of the next microinstruction. The microinstructions can be specified in
several ways.
The main functions of a microprogram sequencer are as follows −
ADDRESS SEQUENCING
The control memory stores the microinstructions in groups, where each group specifies a routine.
Each computer instruction has its own micro-program routine in control memory. These
micro-programs generate the micro-operations that execute the instruction. If the address
sequencing of control memory is controlled by hardware, that hardware must be capable of
branching from one routine to another and of sequencing the microinstructions within a routine.
When we try to execute a single instruction of computer, the control must undergo the following
steps:
o When the power of a computer is turned on, we have to first load an initial address into the
CAR (control address register). This address can be described as the first microinstruction
address. With the help of this address, we are able to activate the instruction fetch routine.
o Then, the control memory will go through the routine, which will be used to find out the
effective address of operand.
o In the next step, a micro-operation will be generated, which will be used to execute the
instruction fetched from memory.
The bits of the instruction code are transformed into the address in control memory where the
routine is located. This process is called the mapping process. The control memory requires the
following address sequencing capabilities:
o On the basis of the status bit conditions, the address sequencing selects the conditional
branch or unconditional branch.
o Addressing sequence is able to increment the CAR (Control address register).
o It provides the facility for subroutine calls and returns.
o A mappings process is provided by the addressing sequence from the instructions bits to a
control memory address.
The diagram shows a block diagram of a control memory and the associated hardware that is
required for selecting the address of the next microinstruction. A microinstruction in control
memory contains a set of bits. Some of these bits initiate the micro-operations in the computer
registers. The remaining bits of the microinstruction specify the method by which the next address
is obtained.
The diagram also shows that the control address register can receive its address from four
different paths. The incrementer increments the CAR by one to select the next microinstruction in
sequence. A branch address can be specified in one of the fields of the microinstruction, which
results in branching.
Conditional branching is obtained by using part of the microinstruction to select a specific status
bit and testing its condition. An external address is transferred into control memory through a
mapping logic circuit. The return address of a subroutine is saved in a special register. This saved
address is used when the micro-program returns from the subroutine, at which point the value is
read back from that special register.
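The four address sources described above can be sketched as a small Python function. This is only a sketch under assumptions: the mode names, the `next_address` function, the subroutine register called `sbr`, and the mapping rule (opcode shifted left by two bits) are illustrative choices, not taken from the text.

```python
def next_address(car, mode, branch_addr=0, condition=False, opcode=0, sbr=None):
    """Return (new CAR, new saved return address) for one microinstruction."""
    if mode == "increment":
        return car + 1, sbr                       # next microinstruction in sequence
    if mode == "branch":
        # Conditional branch; pass condition=True for an unconditional branch.
        return (branch_addr if condition else car + 1), sbr
    if mode == "call":
        return branch_addr, car + 1               # save the return address
    if mode == "return":
        return sbr, None                          # resume after the subroutine call
    if mode == "map":
        return opcode << 2, sbr                   # assumed mapping of opcode bits
    raise ValueError(f"unknown mode: {mode}")
```

For example, a call from address 10 to a subroutine at 64 yields `(64, 11)`, and a later return with the saved value 11 yields `(11, None)`.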
(i) Single Accumulator Organization: In this type of organization all operations are performed
on an implied accumulator. The instruction format uses only one address field. For example, the
instruction that loads the accumulator with the contents of a memory location.
Load X
Where X is the address of the source operand. This results in the operation AC ← M(X). AC is the
accumulator and M(X) symbolizes the memory word located at address X.
(ii) General Register Organisation : In this organization, the instruction format needs 2 or 3
register address fields according to the operation.
For example, an instruction for addition may be written as
ADD R1, R2, R3
It denotes the operation R1 ← R2 + R3
The same ADD instruction needs only two register address fields if the destination register is one of the
source registers, i.e. if the operation is
R1 ← R1 + R2
Then the instruction is ADD R1, R2
The instruction may also contain one memory address field and one register address field. For
example, the instruction,
ADD R1, X
Specifies the operation R1 ← R1 + M[X]
(iii) Stack Organization : In this organization, the computers will have PUSH and POP instructions
which require an address field. For example, the instruction PUSH X will push the word at address X onto
the top of the stack. The operation-type instructions do not need any address field. For example, the
instruction
ADD
Consists of only opcode and no address field. It has the effect of popping the top two numbers from
the stack, adding them, and pushing the sum onto the stack. Thus all the operands are implied to be in
the stack.
INSTRUCTION FORMAT
The instruction format defines the layout of the bits of an instruction, i.e. how the operation
code and the address fields are arranged. The length of an instruction is generally kept to a multiple of
the character length, which is 8 bits. The instruction format determines the behaviour and complexity of
the instruction. Depending upon the number of addresses, the instruction is of variable length.
Types Of Instruction Format
Types of instruction formats are :
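1. Zero(0) Address Instruction format
The instruction format in which there is no address field is called the zero address instruction format.
In the zero address instruction format, stacks are used.
In the zero address instruction format, the operation-type instructions name no operand explicitly.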
Expression: X = (A+B)*(C+D)
Postfixed : X = AB+CD+*
TOP means top of stack
M[X] is any memory location
PUSH A    TOP = A
PUSH B    TOP = B
ADD       TOP = A+B
PUSH C    TOP = C
PUSH D    TOP = D
ADD       TOP = C+D
MUL       TOP = (C+D)*(A+B)
POP X     M[X] = TOP
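To make the zero address idea concrete, the following Python sketch evaluates X = (A+B)*(C+D) on a small stack machine. The interpreter `run_stack_program`, the sample values in `memory`, and the exact instruction sequence (the usual translation of the postfix form AB+CD+*) are assumptions made for the example.

```python
def run_stack_program(program, memory):
    """Interpret a tiny zero-address program; operands are implied on the stack."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":
            stack.append(memory[args[0]])     # TOP = M[address]
        elif op == "POP":
            memory[args[0]] = stack.pop()     # M[address] = TOP
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()   # pop the top two numbers
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory

memory = {"A": 2, "B": 3, "C": 4, "D": 5}     # sample values (assumed)
program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
run_stack_program(program, memory)
```

Only PUSH and POP carry an address; ADD and MUL consist of an opcode alone.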
2. One(1) Address Instruction format
The instruction format in which the instruction uses only one address field is called the one address instruction
format
In this type of instruction format, one operand is in the accumulator and the other is in the memory location
It has only one operand
It has two special instructions LOAD and STORE
Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location
LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
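The one address program above can be traced with a short Python sketch of an accumulator machine. The interpreter `run_accumulator_program` and the sample values in `memory` are assumptions made for illustration; the instruction sequence is the one listed above.

```python
def run_accumulator_program(program, memory):
    """Interpret a tiny one-address program; the accumulator is the implied operand."""
    ac = 0
    for line in program:
        op, addr = line.split()
        if op == "LOAD":
            ac = memory[addr]          # AC = M[addr]
        elif op == "STORE":
            memory[addr] = ac          # M[addr] = AC
        elif op == "ADD":
            ac += memory[addr]         # AC = AC + M[addr]
        elif op == "MUL":
            ac *= memory[addr]         # AC = AC * M[addr]
        else:
            raise ValueError(f"unknown instruction: {line}")
    return ac

memory = {"A": 2, "B": 3, "C": 4, "D": 5}      # sample values (assumed)
program = ["LOAD A", "ADD B", "STORE T", "LOAD C", "ADD D", "MUL T", "STORE X"]
final_ac = run_accumulator_program(program, memory)
```

The temporary location T is needed because the single accumulator cannot hold A+B and C+D at the same time.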
3. Two(2) Address Instruction format
The instruction format in which the instruction uses only two address fields is called the two address instruction
format
This type of instruction format is the most commonly used instruction format
In the one address instruction format the result is stored in the accumulator only, but in the two address
instruction format the result can be stored in different locations
This type of instruction format has two operands
It requires shorter assembly language instructions
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
ADD R2, D R2 = R2 + D
MUL R1, R2 R1 = R1 * R2
MOV X, R1 M[X] = R1
4. Three(3) Address Instruction format
The instruction format in which the instruction uses three address fields is called the three address
instruction format
It has three operands
It requires shorter assembly language instructions
It requires more bits
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
ADDRESSING MODES:
The term addressing modes refers to the way in which the operand of an
instruction is specified. The addressing mode specifies a rule for interpreting or modifying
the address field of the instruction before the operand is actually referenced.
The operands of the instructions can be located either in the main memory or in the CPU registers.
If the operand is placed in the main memory, then the instruction provides the location address in the
operand field. Many methods are followed to specify the operand address. The different methods/modes for
specifying the operand address in the instructions are known as addressing modes.
Examples-
ADD 10 will increment the value stored in the accumulator by 10.
MOV R #20 initializes register R to a constant value 20.
4. Direct Addressing Mode-
In this addressing mode,
The address field of the instruction contains the effective address of the operand.
Only one reference to memory is required to fetch the operand.
It is also called as absolute addressing mode.
Example-
ADD X will increment the value stored in the accumulator by the value stored at memory location X.
AC ← AC + [X]
Example-
ADD X will increment the value stored in the accumulator by the value stored at memory location
specified by X.
AC ← AC + [[X]]
Example-
ADD R will increment the value stored in the accumulator by the content of register R.
AC ← AC + [R]
Example-
ADD R will increment the value stored in the accumulator by the content of memory location
specified in register R.
AC ← AC + [[R]]
NOTE-
It is interesting to note-
This addressing mode is similar to indirect addressing mode.
The only difference is that the address field of the instruction refers to a CPU register.
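The notations [X], [[X]], [R] and [[R]] used in the examples above can be compared side by side in a short sketch. The memory and register contents below are made-up sample values, and the function name fetch_operand is an assumption for illustration.

```python
# Hypothetical machine state: memory maps address -> content.
memory = {100: 200, 200: 7, 300: 9}
registers = {"R": 300}


def fetch_operand(mode, field):
    """Return the operand selected by the address field under each mode."""
    if mode == "immediate":          # operand is the field itself
        return field
    if mode == "direct":             # [X]: one memory reference
        return memory[field]
    if mode == "indirect":           # [[X]]: two memory references
        return memory[memory[field]]
    if mode == "register":           # [R]: operand is in the register
        return registers[field]
    if mode == "register_indirect":  # [[R]]: register holds the address
        return memory[registers[field]]
    raise ValueError(f"unknown mode {mode}")
```

For the same address field 100, direct mode yields 200 while indirect mode follows that value as a second address and yields 7.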
Effective Address
NOTE-
Program counter (PC) always contains the address of the next instruction to be executed.
After the instruction is fetched, the value of the program counter is immediately incremented.
The value increases irrespective of whether the fetched instruction has completely executed or not.
Effective Address
10. Base Register Addressing Mode-
In this addressing mode,
Effective address of the operand is obtained by adding the content of base register with the address
part of the instruction.
Effective Address
= Content of Register
NOTE-
This addressing mode is again a special case of Register Indirect Addressing Mode where-
Example-
Name Mnemonic
Load LD
Store ST
Move MOV
Exchange XCH
Input IN
Output OUT
Push PUSH
Pop POP
The instructions can be described as follows −
Load − The load instruction is used to transfer data from the memory to a processor
register, which is usually an accumulator.
Store − The store instruction transfers data from processor registers to memory.
Move − The move instruction transfers data from processor register to memory or memory
to processor register or between processor registers itself.
Exchange − The exchange instruction swaps information either between two registers or
between a register and a memory word.
Input − The input instruction transfers data between the processor register and the input
terminal.
Output − The output instruction transfers data between the processor register and the
output terminal.
Push and Pop − The push and pop instructions transfer data between a processor register
and memory stack.
Data manipulation
Data manipulation instructions are those instructions that manipulate or change the content
of the data/registers/memory. It performs operations on data and provides the computational
capabilities of the Computer.
Data manipulation instructions can be categorized into three parts:
1) Arithmetic instruction
2) Logical and bit manipulation instructions
3) Shift instructions
Arithmetic Instruction
Arithmetic instructions include increment, decrement, add, subtract, multiply, divide, add with
carry, subtract with borrow, and negate. Negate forms the 2's complement of a number, which
reverses its sign.
The table given below shows the Arithmetic Instructions:
Name Mnemonic
Increment INC
Decrement DEC
Add ADD
Subtract SUB
Multiply MUL
Divide DIV
Add with carry ADDC
Logical Instruction
Logical and bit manipulation instructions form another group of instructions. It includes clear
(clear the content of the accumulator), complement the accumulator, AND, OR, Exclusive-OR,
clear carry, set carry, complement carry, enable interrupts and disable interrupts.
These logical instructions consider each operand bit individually and treat it as a Boolean
variable. Basically, logical instructions help perform binary operations on strings of bits stored in
registers.
Name Mnemonic
Clear CLR
Complement COM
AND AND
OR OR
Exclusive-OR XOR
Enable Interrupt EI
Disable Interrupt DI
Shift Instructions
There are basically two types of shift instructions — arithmetic and logical. Arithmetic shifts
consider the contents of the memory byte or register to be a signed number. So, when the shift is
made, the number is arithmetically divided by two (right shift) or multiplied by two (left shift).
Logical shifts consider the contents of the register or memory byte to be just a bit pattern when the
shift is made.
Name Mnemonic
Arithmetic Shift Left SHLA
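The difference between logical and arithmetic shifts described above can be seen on an 8-bit register. This is a minimal sketch; the 8-bit width and the function names are assumptions chosen for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0xFF keeps results inside 8 bits
SIGN = 1 << (BITS - 1)          # 0x80 is the sign bit


def logical_shift_left(x):
    """Shift left, fill the LSB with 0, discard the MSB."""
    return (x << 1) & MASK


def logical_shift_right(x):
    """Shift right, fill the MSB with 0 (treats x as a plain bit pattern)."""
    return (x & MASK) >> 1


def arithmetic_shift_right(x):
    """Shift right but keep the sign bit (treats x as a signed number)."""
    x &= MASK
    return (x >> 1) | (x & SIGN)


def to_signed(x):
    """Interpret an 8-bit pattern as a 2's complement number."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x


pattern = 0b11110100            # -12 as a signed 8-bit number
```

Shifting the pattern arithmetically right gives -6 (division by two), while the logical right shift of the same bits gives 122, because the sign bit is replaced by 0.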
Program Control Instructions are the machine instructions, written directly in machine code or in
assembly language by the user, that command the processor to act accordingly. These
instructions are of various types. In a high level language, the user's code is translated
into machine code, and the resulting instructions are
passed to the processor to make it do the task.
Types of Program Control Instructions:
There are different types of Program Control Instructions:
1. Compare Instruction:
Compare instruction is specifically provided, which is similar to a subtract instruction
except the result is not stored anywhere, but flags are set according to the result.
Example:
JUMP L2
Mov R3, R1 goto L2
3. Conditional Branch Instruction:
A conditional branch instruction is used to examine the values stored in the condition
code register to determine whether the specific condition exists and to branch if it does.
Example:
Assembly Code : BE R1, R2, L1
Compiler allocates R1 for x and R2 for y
High Level Code: if (x==y) goto L1;
4. Subroutines:
A subroutine is a program fragment that lives in user space and performs a well-defined task.
It is invoked by another user program and returns control to the calling program when
finished.
Example:
CALL and RET
5. Halting Instructions:
HALT – It brings the processor to an orderly halt, remaining in an idle state until
restarted by interrupt, trace, reset or external action.
6. Interrupt Instructions:
An interrupt is a mechanism by which an I/O device or an instruction can suspend the normal
execution of the processor and get itself serviced.
RESET – It resets the processor. This may include setting any or all registers to an
initial value or setting the program counter to a standard starting location.
TRAP – It is a non-maskable, edge and level triggered interrupt. TRAP is a vectored interrupt and
has the highest priority.
INTR – It is a level triggered and maskable interrupt. It has the lowest priority. It can be
disabled by resetting the processor.
A number of computer designers recommended that computers use fewer instructions with
simple constructs so that they can be executed much faster within the CPU without having to use
memory as often. This type of computer is called a Reduced Instruction Set Computer.
The concept of RISC involves an attempt to reduce execution time by simplifying the instruction
set of computers.
Characteristics of RISC
The characteristics of RISC are as follows −
Relatively few instructions.
Relatively few addressing modes.
Memory access limited to load and store instructions.
All operations done within the register of the CPU.
Single-cycle instruction execution.
Fixed length, easily decoded instruction format.
Hardwired rather than micro programmed control.
A characteristic of RISC processors is their ability to execute one instruction per clock cycle. This is
done by overlapping the fetch, decode and execute phases of two or three instructions by using a
procedure referred to as pipelining.
CISC is a computer where a single instruction can perform numerous low-level operations
like a load from memory and a store to memory, etc. The CISC attempts to minimize the
number of instructions per program but at the cost of an increase in the number of cycles per
instruction.
The design of an instruction set for a computer must take into consideration not only machine
language constructs but also the requirements imposed by the use of high level programming
languages.
The goal of CISC is to attempt to provide a single machine instruction for each statement that is
written in a high level language.
Characteristics of CISC
The characteristics of CISC are as follows −
A large number of instructions typically from 100 to 250 instructions.
Some instructions that perform specialized tasks and are used infrequently.
A large variety of addressing modes- typically from 5 to 20 different modes.
Variable length instruction formats.
Instructions that manipulate operands in memory.
Example
For performing an ADD operation, CISC will execute a single ADD command which will execute
all the required load and store operations.
RISC will execute each operation for loading data from memory, adding values and storing data
back to memory using different low-level instructions.
UNIT-III
Peripheral Devices:
The input/output organization of a computer depends upon the size of the computer and the
peripherals connected to it. The I/O subsystem of the computer provides an efficient mode of
communication between the central system and the outside environment.
i) Monitor
ii) Keyboard
iii) Mouse
iv) Printer
v) Magnetic tapes
The devices that are under the direct control of the computer are said to be connected
online.
Peripherals connected to a computer need special communication links for interfacing them
with the central processing unit.
The purpose of communication link is to resolve the differences that exist between the
central computer and each peripheral.
2. The data transfer rate of peripherals is usually slower than the transfer rate of CPU
and consequently, a synchronization mechanism may be needed.
3. Data codes and formats in the peripherals differ from the word format in the CPU and
memory.
4. The operating modes of peripherals are different from each other and must be
controlled so as not to disturb the operation of other peripherals connected to
the CPU.
These components are called Interface Units because they interface between the
processor bus and the peripheral devices.
It defines the typical link between the processor and several peripherals.
The I/O Bus consists of data lines, address lines and control lines.
To communicate with a particular device, the processor places a device address on address
lines.
Each Interface decodes the address and control received from the I/O bus, interprets them for
peripherals and provides signals for the peripheral controller.
It also synchronizes the data flow and supervises the transfer between peripheral and
processor.
For example, the printer controller controls the paper motion and the print timing. The
signals on the control lines are referred to as I/O commands. The commands are as follows:
Control command- A control command is issued to activate the peripheral and to inform it
what to do.
Status command- A status command is used to test various status conditions in the interface
and the peripheral.
Data Output command- A data output command causes the interface to respond by
transferring data from the bus into one of its registers.
Data Input command- The data input command is the opposite of the data output.
In this case the interface receives an item of data from the peripheral and places it in its
buffer register.
I/O Versus Memory Bus
In addition to communicating with I/O, the processor must communicate with the memory unit. Like the
I/O bus, the memory bus contains data, address and read/write control lines. There are 3 ways
that computer buses can be used to communicate with memory and I/O:
i. Use two Separate buses , one for memory and other for I/O.
ii. Use one common bus for both memory and I/O but separate control lines for each.
iii. Use one common bus for memory and I/O with common control
lines.
I/O Processor
In the first method, the computer has independent sets of data, address and control buses,
one for accessing memory and the other for I/O. This is done in computers that provide a
separate I/O processor (IOP). The purpose of IOP is to provide an independent pathway for
the transfer of information between external device and internal memory.
i. Strobe Control
ii. Handshaking
Strobe Signal :
The strobe control method of Asynchronous data transfer employs a single control line to
time each transfer. The strobe may be activated by either the source or the destination unit.
In the block diagram fig. (a), the data bus carries the binary information from source to
destination unit. Typically, the bus has multiple lines to transfer an entire byte or word. The
strobe is a single line that informs the destination unit when a valid data word is available.
In the timing diagram fig. (b), the source unit first places the data on the data
bus. The information on the data bus and the strobe signal remain in the active state to allow the
destination unit to receive the data.
In this method, the destination unit activates the strobe pulse, informing the source to
provide the data. The source will respond by placing the requested binary information on the
data bus.
The data must be valid and remain in the bus long enough for the destination
unit to accept it. When accepted the destination unit then disables the strobe and the source
unit removes the data from the bus.
Disadvantage of Strobe Signal :
The disadvantage of the strobe method is that the source unit that initiates the transfer has no way
of knowing whether the destination unit has actually received the data item that was placed on
the bus. Similarly, a destination unit that initiates the transfer has no way of knowing whether
the source unit has actually placed the data on the bus. The Handshaking method solves this
problem.
Handshaking:
The handshaking method solves the problem of strobe method by introducing a second
control signal that provides a reply to the unit that initiates the transfer.
Principle of Handshaking:
The basic principle of the two-wire handshaking method of data transfer is as follows:
One control line is in the same direction as the data flow in the bus, from the source to the
destination. It is used by the source unit to inform the destination unit whether there is valid data
on the bus. The other control line is in the other direction, from the destination to the source. It
is used by the destination unit to inform the source whether it can accept the data. The
sequence of control during the transfer depends on the unit that initiates the transfer.
The sequence of events shows four possible states that the system can be in at any given time.
The source unit initiates the transfer by placing the data on the bus and enabling its data valid
signal. The data accepted signal is activated by the destination unit after it accepts the data
from the bus. The source unit then disables its data valid signal, which invalidates the data on
the bus. The destination unit then disables its data accepted signal and the system goes into
its initial state.
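The four states of a source-initiated handshake can be traced step by step. This is only a sketch of the conventional two-wire sequence (data valid raised, data accepted raised, data valid dropped, data accepted dropped); the function and variable names are assumptions for illustration.

```python
def source_initiated_transfer(word):
    """Simulate one source-initiated handshake and record the signal states.

    Returns the word latched by the destination and a trace of
    (data_valid, data_accepted) pairs, one per state.
    """
    trace = []

    # State 1: source places data on the bus and enables data valid.
    bus = word
    data_valid, data_accepted = True, False
    trace.append((data_valid, data_accepted))

    # State 2: destination accepts the data and enables data accepted.
    received = bus
    data_accepted = True
    trace.append((data_valid, data_accepted))

    # State 3: source disables data valid; the bus content is now invalid.
    data_valid = False
    bus = None
    trace.append((data_valid, data_accepted))

    # State 4: destination disables data accepted; back to the initial state.
    data_accepted = False
    trace.append((data_valid, data_accepted))

    return received, trace


received, trace = source_initiated_transfer(0x5A)
```

Because each unit waits for the other's signal before moving on, neither side has to assume that the other has finished.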
Destination Initiated Transfer Using Handshaking:
The name of the signal generated by the destination unit has been changed to ready for data
to reflect its new meaning. The source unit in this case does not place data on the bus until
after it receives the ready for data signal from the destination unit. From there on, the
handshaking procedure follows the same pattern as in the source initiated case.
The only difference between the Source Initiated and the Destination Initiated transfer is in
their choice of initial state.
Advantage of the Handshaking method:
If either unit is faulty, the data transfer will not be completed. Such an error
can be detected by means of a timeout mechanism, which provides an alarm if the
data transfer is not completed within a predetermined time.
Parallel transmission is faster but it requires many wires. It is used for short distances and
where speed is important. Serial transmission is slower but is less expensive.
In asynchronous serial transfer, the bits of a message are sent in sequence, one at a time, and binary
information is transferred only when it is available. When there is no information to be
transferred, the line remains idle.
i. Start bit
i. Start Bit- The first bit, called the start bit, is always zero and is used to indicate the beginning of a
character.
ii. Stop Bit- The last bit, called the stop bit, is always one and is used to indicate the end of a
character. The stop bit is always in the 1-state and frames the end of the character
to signify the idle or wait state.
iii. Character Bit- The bits in between the start bit and the stop bit are known as
character bits. The character bits always follow the start bit.
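The framing rules above can be sketched in a few lines. The choice of 8 character bits, one stop bit and least-significant-bit-first ordering are assumptions made for this example; the notes do not fix them.

```python
def frame_character(byte, data_bits=8):
    """Wrap a character in a start bit (0) and a stop bit (1).

    Character bits are sent least significant bit first (an assumption).
    """
    character_bits = [(byte >> i) & 1 for i in range(data_bits)]
    return [0] + character_bits + [1]


def unframe_character(frame):
    """Check the start and stop bits, then rebuild the character."""
    if frame[0] != 0 or frame[-1] != 1:
        raise ValueError("framing error: bad start or stop bit")
    value = 0
    for i, bit in enumerate(frame[1:-1]):
        value |= bit << i
    return value


frame = frame_character(0x41)   # ASCII 'A'
```

The receiver detects the 1-to-0 transition of the start bit, samples the character bits, and expects the line to be back at 1 for the stop bit.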
It works as both a receiver and a transmitter. Its operation is initialized by CPU by sending a
byte to the control register.
The transmitter register accepts a data byte from the CPU through the data bus, which is then
transferred to a shift register for serial transmission.
The receive portion receives information into another shift register, and when a
complete data byte is received it is transferred to receiver register.
CPU can select the receiver register to read the byte through the data bus. Data in the
status register is used for input and output flags.
A First In First Out (FIFO) Buffer is a memory unit that stores information in such a manner
that the first item in is the first item out. A FIFO buffer comes with separate input and output
terminals. The important feature of this buffer is that it can input data and output data at two
different rates.
When placed between two units, the FIFO can accept data from the source unit at one
rate of transfer and deliver the data to the destination unit at another rate.
If the source is faster than the destination, the FIFO is useful when the source data arrive in
bursts that fill the buffer. FIFO is useful in some applications when data are transferred
asynchronously.
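A FIFO buffer between a bursty source and a slower destination might be modelled as follows. This is a minimal sketch; the class name, the capacity of 4 and the method names insert and delete are assumptions for illustration.

```python
from collections import deque


class FifoBuffer:
    """Fixed-capacity first-in first-out buffer with separate input and output."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.capacity

    def is_empty(self):
        return len(self.items) == 0

    def insert(self, item):
        """Input side: accept an item unless the buffer is full."""
        if self.is_full():
            return False
        self.items.append(item)
        return True

    def delete(self):
        """Output side: deliver the oldest item, or None if empty."""
        if self.is_empty():
            return None
        return self.items.popleft()


fifo = FifoBuffer(capacity=4)
# The source delivers a burst of five items; only four fit.
accepted = [fifo.insert(word) for word in ["a", "b", "c", "d", "e"]]
# The destination later drains the buffer at its own pace.
drained = [fifo.delete() for _ in range(4)]
```

The items come out in the order they went in, and the full flag tells the source when it has to wait.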
Transfer of data is required between the CPU and peripherals or memory, or sometimes between
any two devices or units of the computer system. To transfer data from one unit to
another, one should be sure that both units are properly connected and that at the time of data
transfer the receiving unit is not busy. Data transfer within the computer is an internal
operation.
All the internal operations in a digital system are synchronized by means of clock pulses
supplied by a common clock pulse Generator. The data transfer can be
i. Synchronous or
ii. Asynchronous
When both the transmitting and receiving units use the same clock pulse, the data transfer
is called synchronous. On the other hand, if there is no common clock pulse
and the sender operates at a different moment than the receiver, the data transfer is
called asynchronous data transfer.
The data transfer can be handled in various modes. Some of the modes use the CPU as an
intermediate path, others transfer the data directly to and from the memory unit. Data transfer can
be handled in the following 3 ways:
i. Programmed I/O
In this mode of data transfer the operations are the result of I/O instructions which are a
part of the computer program. Each data transfer is initiated by an instruction in the program.
Normally the transfer is from a CPU register to a peripheral device or vice-versa.
Once the data transfer is initiated, the CPU starts monitoring the interface to see when the next transfer
can be made. The instructions of the program keep close tabs on everything that takes place in
the interface unit and the I/O devices.
In this technique the CPU is responsible for extracting data from the memory for output
and storing data in memory for input, as shown in the flowchart.
The main drawback of program initiated I/O is that the CPU has to monitor the units all
the time while the program is executing. Thus the CPU stays in a program loop until the I/O
unit indicates that it is ready for data transfer. This is a time consuming process and a lot of CPU
time is wasted in keeping an eye on the I/O unit.
To remove this problem an Interrupt facility and special commands are used.
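The cost of the program loop described above can be made visible with a simulated interface. This is only a sketch: the Interface class, its ready_after parameter and the counting of wasted checks are assumptions made for illustration.

```python
class Interface:
    """Simulated I/O interface whose ready flag is set after some delay."""

    def __init__(self, data, ready_after):
        self._data = data
        self._countdown = ready_after   # status checks before the flag is set

    def status_ready(self):
        """Status check: each call lets one unit of device time pass."""
        if self._countdown > 0:
            self._countdown -= 1
            return False
        return True

    def read_data(self):
        return self._data


def programmed_input(interface):
    """Busy-wait on the flag, then read the data register."""
    wasted_checks = 0
    while not interface.status_ready():
        wasted_checks += 1              # the CPU does no useful work here
    return interface.read_data(), wasted_checks


data, wasted_checks = programmed_input(Interface(data=0x37, ready_after=5))
```

Every failed status check is CPU time spent waiting for the device; this waiting is what the interrupt facility is meant to eliminate.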
Interrupt-Initiated I/O :
In this method the interrupt facility and an interrupt command are used to inform the device about
the start and end of the transfer. In the meantime the CPU executes another program. When the
interface determines that the device is ready for data transfer, it generates an interrupt request
and sends it to the computer.
When the CPU receives such a signal, it temporarily stops the execution of the program and
branches to a service program to process the I/O transfer. After completing it, the CPU returns
to the task it was originally performing.
⚫ In this type of IO, the computer does not check the flag. It continues to perform its task.
⚫ Whenever any device wants attention, it sends an interrupt signal to the CPU.
⚫ The CPU then deviates from what it was doing, stores the return address from the PC
and branches to the address of the subroutine.
⚫ Vectored Interrupt
⚫ Non-vectored Interrupt
⚫ In a vectored interrupt, the source that interrupts the CPU provides the
branch information. This information is called the interrupt vector.
Priority Interrupt:
⚫ There are a number of IO devices attached to the computer.
⚫ When interrupts are generated by more than one device, a priority interrupt
system is used to determine which device is to be serviced first.
⚫ Devices with high speed transfer are given higher priority and slow devices are
given lower priority.
⚫ Using Software
⚫ Using Hardware
Polling Procedure :
⚫ There is one common branch address for all interrupts.
⚫ The branch address contains the code that polls the interrupt sources in sequence.
The highest priority source is tested first.
⚫ The disadvantage is that, with a large number of IO devices, the time required to poll
them can exceed the time to serve them.
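The polling procedure can be sketched as a loop over the interrupt sources in priority order. The function name poll and the list representation of the request flags are assumptions for illustration.

```python
def poll(request_flags):
    """Return the index of the device to service, or None.

    request_flags is ordered by priority: index 0 is the highest priority
    source and is tested first, as in the common polling routine.
    """
    for device, requesting in enumerate(request_flags):
        if requesting:
            return device
    return None


# Devices 1 and 2 both request service; device 1 has the higher priority.
serviced = poll([False, True, True])
```

The loop also shows the drawback noted above: a low priority device is reached only after every source ahead of it has been tested.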
Using Hardware:
⚫ To speed up the operation, each interrupting device has its own interrupt vector.
⚫ No polling is required; all decisions are established by the hardware priority interrupt unit.
⚫ A device that wants attention sends an interrupt request to the CPU.
⚫ The CPU then sends the INTACK signal, which is applied to the PI (priority in) of the first
device.
⚫ If the device had requested attention, it places its VAD (vector address) on the bus and
blocks the signal by placing 0 in PO (priority out).
⚫ If not, it passes the signal to the next device by placing 1 in PO (priority out).
⚫ The device whose PI is 1 and PO is 0 is the device that sent the interrupt request.
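The propagation of the acknowledge signal through the PI and PO terminals can be simulated directly. This is a minimal sketch; the function name daisy_chain and the sample vector addresses are assumptions for illustration.

```python
def daisy_chain(requests, vector_addresses):
    """Propagate INTACK down the chain and return (selected VAD, PO outputs).

    requests[i] is True if device i has an interrupt pending; device 0 is
    closest to the CPU and therefore has the highest priority.
    """
    pi = 1                      # INTACK from the CPU enters the first PI
    selected_vad = None
    po_outputs = []
    for requesting, vad in zip(requests, vector_addresses):
        if pi == 1 and requesting:
            selected_vad = vad  # this device places its VAD on the bus
            po = 0              # and blocks the acknowledge signal
        else:
            po = pi             # pass the incoming signal along unchanged
        po_outputs.append(po)
        pi = po                 # PO of this device feeds PI of the next
    return selected_vad, po_outputs


# Devices 1 and 2 request service; hypothetical vector addresses.
vad, po_outputs = daisy_chain([False, True, True], [0x10, 0x14, 0x18])
```

Device 1 sees PI = 1 and sets PO = 0, so device 2 never receives the acknowledge even though it is also requesting.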
⚫ It consists of an interrupt register whose bits are set separately by the interrupting devices.
⚫ A mask register is used to allow higher priority devices to
interrupt while a lower priority device is being serviced, or to disable all lower priority
devices while a higher priority device is being serviced.
⚫ Corresponding interrupt bits and mask bits are ANDed and applied to the priority encoder.
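The AND of the interrupt register with the mask register, followed by the priority encoder, might look like this in a sketch. The 4-bit width and the convention that bit 0 has the highest priority are assumptions for illustration.

```python
def highest_priority_request(interrupt_register, mask_register, width=4):
    """AND the interrupt and mask bits, then encode the highest priority bit.

    Bit 0 is taken as the highest priority source (an assumption).
    Returns the bit position of the request to service, or None.
    """
    pending = interrupt_register & mask_register
    for bit in range(width):
        if pending & (1 << bit):
            return bit
    return None


# Sources 1 and 2 are requesting; the mask currently disables sources 0 and 1.
selected = highest_priority_request(0b0110, 0b1100)
```

With the mask 1100 the request from source 1 is held back and source 2 is selected; with all mask bits set, source 1 would win.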
Direct Memory Access (DMA):
In Direct Memory Access (DMA), the interface transfers data into and out of the
memory unit through the memory bus. The transfer of data between a fast storage device such
as magnetic disk and memory is often limited by the speed of the CPU. Removing the CPU
from the path and letting the peripheral device manage the memory buses directly would
improve the speed of transfer. This transfer technique is called Direct Memory Access
(DMA).
During the DMA transfer, the CPU is idle and has no control of the memory buses. A DMA
Controller takes over the buses to manage the transfer directly between the I/O device and
memory.
The CPU may be placed in an idle state in a variety of ways. One common method
extensively used in microprocessor is to disable the buses through special control signals
such as:
Two control signals in the CPU, Bus Request (BR) and Bus Grant (BG), facilitate the DMA transfer. The Bus Request
(BR) input is used by the DMA controller to request the CPU to relinquish control of the buses. When this input is active, the
CPU terminates the execution of the current instruction and places the address bus, data bus
and read/write lines into a high impedance state. High impedance state means that the
output is disconnected.
The CPU activates the Bus Grant (BG) output to inform the external DMA that it
can now take control of the buses to conduct memory transfers without the
processor.
When the DMA terminates the transfer, it disables the Bus Request (BR) line. The CPU then
disables the Bus Grant (BG), takes control of the buses and returns to its normal operation.
i. DMA Burst
ii) Cycle Stealing :- Cycle stealing allows the DMA controller to transfer one data word
at a time, after which it must return control of the buses to the CPU.
DMA Controller:
The DMA controller needs the usual circuits of an interface to communicate with the
CPU and I/O device. The DMA controller has three registers:
i. Address Register
ii. Word Count Register :- WC holds the number of words to be transferred. The
register is incremented or decremented by one after each word transfer and internally tested for
zero.
The unit communicates with the CPU via the data bus and control lines. The
registers in the DMA are selected by the CPU through the address bus by enabling the
DS (DMA select) and RS (Register select) inputs. The RD (read) and WR (write)
inputs are bidirectional.
When the BG (Bus Grant) input is 0, the CPU can communicate
with the DMA registers through the data bus to read from or write to the DMA
registers. When BG =1, the DMA can communicate directly with the memory by
specifying an address in the address bus and activating the RD or WR control.
DMA Transfer:
The CPU communicates with the DMA through the address and data buses as with
any interface unit. The DMA has its own address, which activates the DS and RS
lines. The CPU initializes the DMA through the data bus. Once the DMA receives the
start control command, it can transfer between the peripheral and the memory.
When BG = 0 the RD and WR are input lines allowing the CPU to
communicate with the internal DMA registers. When BG=1, the RD and WR are
output lines from the DMA controller to the random access memory to specify the
read or write operation of data.
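The roles of the address register and the word count register during a cycle-stealing transfer can be sketched as follows. This is an illustration under stated assumptions: the class and function names are invented, the word count is decremented (the notes allow either direction), and memory is modelled as a Python list.

```python
class DmaController:
    """Simplified DMA controller with an address and a word count register."""

    def __init__(self):
        self.address = 0
        self.word_count = 0

    def program(self, start_address, word_count):
        """CPU initializes the registers through the data bus (BG = 0)."""
        self.address = start_address
        self.word_count = word_count

    def transfer_word(self, memory, word):
        """One stolen memory cycle (BG = 1): write a word, update registers."""
        memory[self.address] = word
        self.address += 1
        self.word_count -= 1
        return self.word_count == 0     # True when the block is complete


def cycle_stealing_input(dma, memory, device_words):
    """Move words from a device to memory one stolen cycle at a time."""
    cycles_returned_to_cpu = 0
    for word in device_words:
        if dma.transfer_word(memory, word):
            break                        # word count reached zero
        cycles_returned_to_cpu += 1      # buses go back to the CPU
    return cycles_returned_to_cpu


memory = [0] * 8
dma = DmaController()
dma.program(start_address=2, word_count=3)
returned = cycle_stealing_input(dma, memory, [10, 20, 30, 40])
```

The fourth device word is never written because the word count reaches zero after three transfers, which is the condition the controller tests to end the block.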
Summary :
Interface is the point where a connection is made between two different parts
of a system.
The strobe control method of asynchronous data transfer employs a single
control line to time each transfer.
The handshaking method solves the problem of the strobe method by introducing
a second control signal that provides a reply to the unit that initiates the
transfer.
In the Programmed I/O mode of data transfer, the operations are the result of
I/O instructions which are a part of the computer program.
In the Interrupt Initiated I/O method, the interrupt facility and an interrupt command are
used to inform the device about the start and end of the transfer.
In Direct Memory Access (DMA), the interface transfers data into and out of
the memory unit through the memory bus.
Input-Output Processor:
⚫ IOP is similar to CPU except that it is designed to handle the details of IO operation.
⚫ Unlike DMA which is initialized by CPU, IOP can fetch and execute its own
instructions.
⚫ Memory occupies the central position and can communicate with each processor
by DMA.
⚫ The IOP provides the path for the transfer of data between various peripheral devices
and memory.
⚫ Data formats of peripherals differ from those of the CPU and memory. The IOP handles such
differences.
⚫ Data are transferred from the IOP to memory by stealing one memory cycle.
⚫ Instructions that are read from memory by the IOP are called commands to
distinguish them from instructions that are read by the CPU.
MEMORY HIERARCHY
The Computer memory hierarchy looks like a pyramid structure which is used to describe the differences among memory
types. It separates the computer storage based on hierarchy.
Level 0: CPU registers
Level 1: Cache memory
Level 2: Main memory or primary memory
Level 3: Magnetic disks or secondary memory
Level 4: Optical disks or magnetic tapes or tertiary memory
In the memory hierarchy, capacity is inversely proportional to speed, while the cost per bit increases with speed. The devices are
arranged from fast to slow, that is, from registers to tertiary memory.
Let us discuss each level in detail:
Level-0 − Registers
The registers are present inside the CPU. As they are present inside the CPU, they have least access time. Registers are
most expensive and smallest in size generally in kilobytes. They are implemented by using Flip-Flops.
Level-1 − Cache
Cache memory is used to store the segments of a program that are frequently accessed by the processor. It is expensive
and smaller in size generally in Megabytes and is implemented by using static RAM.
Level-2 − Primary or Main Memory
It directly communicates with the CPU and with auxiliary memory devices through an I/O processor. Main memory is less
expensive than cache memory and larger in size generally in Gigabytes. This memory is implemented by using dynamic
RAM.
Level-3 − Secondary storage
Secondary storage devices like Magnetic Disk are present at level 3. They are used as backup storage. They are
cheaper than main memory and larger in size generally in a few TB.
Level-4 − Tertiary storage
Tertiary storage devices like magnetic tape are present at level 4. They are used to store removable files and are the
cheapest and largest in size (1-20 TB).
Let us see the memory levels in terms of size, access time, bandwidth.
Level         Register            Cache               Primary memory      Secondary memory
Bandwidth     4k to 32k MB/sec    800 to 5k MB/sec    400 to 2k MB/sec    4 to 32 MB/sec
Size          Less than 1 KB      Less than 4 MB      Less than 2 GB      Greater than 2 GB
Access time   2 to 5 nsec         3 to 10 nsec        80 to 400 nsec      5 ms
Managed by    Compiler            Hardware            Operating system    OS or user
Why memory Hierarchy is used in systems?
Memory hierarchy is arranging different kinds of storage present on a computing device based on speed of access. At the
very top, the highest performing storage is CPU registers which are the fastest to read and write to. Next is cache
memory followed by conventional DRAM memory, followed by disk storage with different levels of performance including
SSD, optical and magnetic disk drives.
To bridge the processor memory performance gap, hardware designers are increasingly relying on memory at the top of
the memory hierarchy to close / reduce the performance gap. This is done through increasingly larger cache hierarchies
(which can be accessed by processors much faster), reducing the dependency on main memory which is slower.
Main Memory
The main memory is the fundamental storage unit in a computer system. It is a relatively large and fast memory and
saves programs and information during computer operations. The technology that makes the main memory work is based
on semiconductor integrated circuits.
RAM is the main memory. Integrated circuit Random Access Memory (RAM) chips are available in
two possible operating modes, as follows −
Static − It consists of internal flip-flops, which store the binary information. The stored data remains
valid as long as power is applied to the unit. The static RAM is simple to use and has shorter read
and write cycles.
Dynamic − It stores the binary data in the form of electric charges that are applied to capacitors. The
capacitors are provided inside the chip by Metal Oxide Semiconductor (MOS) transistors. The
stored charge on the capacitors tends to discharge with time and thus, the capacitors must be
periodically recharged by refreshing the dynamic memory.
Read-Only Memory
In each computer system, there should be a segment of memory that is fixed and unaffected by
power failure. This type of memory is known as Read-Only Memory or ROM.
SRAM
RAMs that are made up of circuits that can preserve the information as long as power is supplied are
referred to as Static Random Access Memories (SRAM). Flip-flops form the basic memory elements
in an SRAM device. An SRAM consists of an array of flip-flops, one for each bit, so a large number
of flip-flops is needed to provide higher capacity memory. Because of this, simpler flip-flop circuits
built from BJT or MOS transistors are used for SRAM.
DRAM
SRAMs are faster but their cost is high because their cells require many transistors. RAMs can be
obtained at a lower cost if simpler cells are used. A MOS storage cell based on capacitors can be
used to replace the SRAM cells. Such a storage cell cannot preserve the charge (that is, data)
indefinitely and must be recharged periodically. Therefore, these cells are called dynamic storage
cells. RAMs using these cells are referred to as Dynamic RAMs or simply DRAMs.
Auxiliary Memory
Auxiliary memory is the lowest-cost, highest-capacity, and slowest-access storage in a computer
system. It is where programs and data are kept for long-term storage or when not in immediate use. The most
common auxiliary memory devices used in computer systems are magnetic disks and tapes.
Magnetic Disks
A magnetic disk is a circular plate made of metal or plastic coated with magnetizable material. Often both sides of
the disk are used, and multiple disks can be stacked on one spindle with read/write heads available on each surface.
All disks rotate together at high speed and are not stopped or started for access purposes. Bits are stored on the
magnetized surface in spots along concentric circles known as tracks. The tracks are commonly divided into sections
known as sectors.
In this system, the smallest amount of data that can be transferred is a sector. The subdivision of one disk surface into
tracks and sectors is shown in the figure.
Magnetic Tape
A magnetic tape transport consists of the electrical, mechanical, and electronic components that provide the parts and
control mechanism for a magnetic tape unit. The tape itself is a strip of plastic coated with a magnetic recording medium.
Bits are recorded as magnetic spots on the tape along several tracks. Usually seven or nine bits are recorded together to
form a character together with a parity bit. Read/write heads are mounted one in each track so that information can
be recorded and read as a sequence of characters.
Magnetic tape units can be stopped, started to move forward or in reverse, or rewound. However, they
cannot be started or stopped fast enough between individual characters. For this reason, data is recorded in blocks
referred to as records. Gaps of unrecorded tape are inserted between records where the tape can be stopped.
The tape starts moving while in a gap and attains its constant speed by the time it reaches the next record. Each
record on tape has an identification bit pattern at the beginning and end. By reading the bit pattern at the beginning,
the tape control identifies the record number.
ASSOCIATIVE MEMORY
On the other hand, when the word is to be read from an associative memory, the
content of the word, or part of the word, is specified. The words which match the
specified content are located by the memory and are marked for reading.
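The content-based lookup described above can be sketched in a few lines of Python. This is only an illustrative model: the names `associative_search`, `argument`, and `key_mask` are assumptions introduced here, and a real associative memory compares all words in parallel in hardware rather than in a loop.

```python
def associative_search(words, argument, key_mask):
    """Return the indices of stored words that match the argument.

    Only the bit positions set to 1 in key_mask take part in the
    comparison, which models specifying "part of the word".
    """
    return [
        index
        for index, word in enumerate(words)
        if (word & key_mask) == (argument & key_mask)
    ]


# Hypothetical 8-bit memory contents.
memory = [0b10101100, 0b10100011, 0b01101100, 0b10101111]

# Search on the upper four bits only; the lower four bits are ignored.
matches = associative_search(memory, argument=0b10100000, key_mask=0b11110000)
print(matches)
```

The returned indices play the role of the words "marked for reading" in the text.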
Advantages of Associative memory :-
1. It is used where search time needs to be short.
2. It is suitable for parallel searches.
3. It is often used to speed up databases.
Disadvantages of Associative memory :-
1. It is more expensive than RAM.
2. Each cell must have storage capability and logical circuits for matching its content with
an external argument.
Cache Memory
Cache Performance: When the processor needs to read or write a location in main
memory, it first checks for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has occurred
and data is read from the cache.
If the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data from
main memory, then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit
ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
Cache performance can be improved by using a larger cache block size and higher
associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the
cache.
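As a rough illustration of the hit and miss behaviour and the hit ratio formula above, the following Python sketch counts hits and misses for a sequence of addresses. The FIFO eviction policy, the function names, and the address trace are assumptions made for this example; the text does not specify a replacement policy.

```python
from collections import OrderedDict


def simulate_cache(addresses, capacity):
    """Count cache hits and misses for a sequence of memory addresses.

    Assumption: the cache holds `capacity` entries and evicts the
    oldest entry (FIFO) when it is full.
    """
    cache = OrderedDict()
    hits = misses = 0
    for address in addresses:
        if address in cache:
            hits += 1                      # cache hit: data is read from the cache
        else:
            misses += 1                    # cache miss: allocate a new entry
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the oldest entry
            cache[address] = True
    return hits, misses


def hit_ratio(hits, misses):
    """Hit ratio = hits / (hits + misses)."""
    return hits / (hits + misses)


hits, misses = simulate_cache([1, 2, 1, 3, 1, 4, 2, 1], capacity=3)
print(hits, misses, hit_ratio(hits, misses))
```

For this hypothetical trace, 3 of the 8 accesses are hits, giving a hit ratio of 0.375.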
Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation provides a large virtual
memory for programmers when only limited physical memory is available.
Virtual memory can give programmers the illusion that they have a very large memory even though the computer has a
small main memory. It makes programming easier because the programmer no longer needs to worry
about the amount of physical memory available.
Virtual memory works much like a cache, but between main memory and secondary storage. A memory management unit (MMU)
transfers data between physical memory and some slower storage device, generally a disk. This storage area can be
called a swap disk or swap file, depending on its implementation. Retrieving data from physical memory is much faster than
accessing data from the swap disk.
The two primary methods for implementing virtual memory are as follows −
Paging
Paging is a memory management technique in which small fixed-length pages are allocated instead of the single large
variable-length contiguous block used in dynamic allocation. In a paged system, each process is
divided into several fixed-size ‘chunks’ called pages, typically 4 KB in length. The memory space is also divided into
blocks of the same size known as frames.
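To make the page and frame idea concrete, here is a minimal Python sketch of address translation, assuming the 4 KB page size mentioned above. The page table contents and the `translate` function are hypothetical and only illustrate how a logical address splits into a page number and an offset.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the text


def translate(logical_address, page_table):
    """Map a logical address to a physical address.

    The page number selects a frame through the page table, and the
    offset within the page is carried over unchanged.
    """
    page_number, offset = divmod(logical_address, PAGE_SIZE)
    frame_number = page_table[page_number]  # a missing key would model a page fault
    return frame_number * PAGE_SIZE + offset


# Hypothetical page table: page number -> frame number.
page_table = {0: 5, 1: 2, 2: 7}

physical = translate(4100, page_table)
print(physical)
```

Logical address 4100 lies in page 1 at offset 4, and page 1 is held in frame 2, so the physical address is 2 * 4096 + 4 = 8196.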
Advantages of Paging
The advantages of Paging are as follows −
In Paging, there is no external fragmentation.
In Paging, swapping between equal-size pages and page frames is straightforward.
Paging is a simple approach to memory management.
Disadvantage of Paging
The disadvantages of Paging are as follows −
In Paging, there can be Internal Fragmentation.
In Paging, the page table consumes additional memory.
With Multi-level Paging, there can be memory reference overhead.
Segmentation
UNIT-V
Parallel Processing
Parallel processing can be described as a class of techniques which enables the system to
achieve simultaneous data-processing tasks to increase the computational speed of a computer
system.
A parallel processing system can carry out simultaneous data-processing to achieve faster execution
time. For instance, while an instruction is being processed in the ALU component of the CPU, the next
instruction can be read from memory.
The primary purpose of parallel processing is to enhance the computer processing capability and
increase its throughput, i.e. the amount of processing that can be accomplished during a given
interval of time.
A parallel processing system can be achieved by having a multiplicity of functional units that perform
identical or different operations simultaneously. The data can be distributed among various multiple
functional units.
The following diagram shows one possible way of separating the execution unit into eight functional
units operating in parallel.
The operation performed in each functional unit is indicated in each block of the diagram:
o The adder and integer multiplier perform arithmetic operations on integer numbers.
o The floating-point operations are separated into three circuits operating in parallel.
o The logic, shift, and increment operations can be performed concurrently on different data. All
units are independent of each other, so one number can be shifted while another number is
being incremented.
Amdahl’s law
It is named after computer scientist Gene Amdahl (a computer architect from IBM and
Amdahl Corporation) and was presented at the AFIPS Spring Joint Computer Conference in
1967. It is also known as Amdahl’s argument.
It is a formula that gives the theoretical speedup in latency of the execution of a task at a
fixed workload that can be expected of a system whose resources are improved. In other
words, it is a formula used to find the maximum improvement possible by just improving a
particular part of a system. It is often used in parallel computing to predict the theoretical
speedup when using multiple processors.
Speedup - Speedup is defined as the ratio of the performance for the entire task using the
enhancement to the performance for the entire task without the enhancement. Equivalently,
speedup is the ratio of the execution time for the entire task without the
enhancement to the execution time for the entire task using the enhancement.
If Pe is the performance for the entire task using the enhancement when possible, Pw is the
performance for the entire task without the enhancement, Ew is the execution time for
the entire task without the enhancement, and Ee is the execution time for the entire
task using the enhancement when possible, then
Speedup = Pe/Pw or Speedup = Ew/Ee
Amdahl’s law uses two factors to find speedup from some enhancement:
Fraction enhanced – The fraction of the computation time in the original computer that
can be converted to take advantage of the enhancement. For example, if 10 seconds of
the execution time of a program that takes 40 seconds in total can use an enhancement,
the fraction is 10/40. This value is Fraction enhanced. Fraction enhanced is
always less than or equal to 1.
Speedup enhanced – The improvement gained by the enhanced execution mode; that
is, how much faster the task would run if the enhanced mode were used for the entire
program. For example, if the enhanced mode takes 3 seconds for a portion of the
program that takes 6 seconds in the original mode, the improvement is 6/3. This value is
Speedup enhanced. Speedup enhanced is always greater than 1.
The formula for Amdahl’s law is:
S = 1 / (1 - P + (P / N))
Where:
S is the speedup of the system
P is the proportion of the execution time that can be improved (parallelized)
N is the number of processors in the system
For example, if the part of a task that can be parallelized accounts for 20% of the total execution
time, and we add 4 more processors to a single-processor system (so N = 5), the speedup would be:
S = 1 / (1 - 0.2 + (0.2 / 5))
S = 1 / (0.8 + 0.04)
S = 1 / 0.84
S = 1.19
This means that the overall performance of the system would improve by about 19% with
the addition of the 4 processors.
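The calculation above can be checked with a short Python sketch. The function name `amdahl_speedup` is an assumption introduced here. The second call reuses the earlier Fraction enhanced (10/40) and Speedup enhanced (6/3) examples, treating the speedup of the enhanced part as playing the role of N.

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the work is sped up by a factor n."""
    return 1.0 / ((1.0 - p) + p / n)


# Example from the text: P = 0.2, N = 5 processors.
print(round(amdahl_speedup(0.2, 5), 2))

# Fraction enhanced = 10/40, Speedup enhanced = 6/3.
print(round(amdahl_speedup(10 / 40, 6 / 3), 3))

# Even with a very large N, the speedup is bounded by 1 / (1 - P).
print(round(amdahl_speedup(0.2, 10**9), 3))
```

The last line illustrates the limit that makes the law useful: with P = 0.2, no number of processors can push the overall speedup beyond 1 / 0.8 = 1.25.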
Pipelining
The term Pipelining refers to a technique of decomposing a sequential process into sub-operations,
with each sub-operation being executed in a dedicated segment that operates concurrently with all
other segments.
The most important characteristic of a pipeline technique is that several computations can be in
progress in distinct segments at the same time. The overlapping of computation is made possible by
associating a register with each segment in the pipeline. The registers provide isolation between each
segment so that each can operate on distinct data simultaneously.
The structure of a pipeline organization can be represented simply by including an input register for
each segment followed by a combinational circuit.
Let us consider an example of combined multiplication and addition operation to get a better
understanding of the pipeline organization.
The combined multiplication and addition operation is done with a stream of numbers such as:
The operation to be performed on the numbers is decomposed into sub-operations with each sub-
operation to be implemented in a segment within a pipeline.
The sub-operations performed in each segment of the pipeline are defined as:
The following block diagram represents the combined as well as the sub-operations performed in
each segment of the pipeline.
Registers R1, R2, R3, and R4 hold the data and the combinational circuits operate in a particular
segment.
The output generated by the combinational circuit in a given segment is applied to the input register
of the next segment. For instance, from the block diagram, we can see that the register R3 is used as
one of the input registers for the combinational adder circuit.
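The register-to-register flow described above can be sketched as a small clock-by-clock simulation in Python. This is only an illustration under stated assumptions: the stream is taken to be of the form A[i] * B[i] + C[i], segment 1 loads R1 and R2, segment 2 multiplies into R3 and loads R4, and segment 3 adds into a result register called R5 here. The R5 name and the exact assignment of operations to segments are not given in the text.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate a three-segment pipeline computing a[i] * b[i] + c[i]."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    # n inputs need n + 2 clock pulses to pass through three segments.
    for clock in range(n + 2):
        # Registers are updated back to front so that each segment
        # consumes the values latched on the previous clock pulse.
        r5 = r3 + r4 if r3 is not None else None   # segment 3: add
        if r1 is not None:
            r3, r4 = r1 * r2, c[clock - 1]         # segment 2: multiply, load C
        else:
            r3 = r4 = None
        if clock < n:
            r1, r2 = a[clock], b[clock]            # segment 1: load A and B
        else:
            r1 = r2 = None
        if r5 is not None:
            results.append(r5)
    return results


print(multiply_add_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Once the pipeline is full, one result leaves segment 3 on every clock pulse, even though each individual computation still takes three pulses from start to finish.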
In general, the pipeline organization is applicable to two areas of computer design:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic Pipelines are mostly used in high-speed computers. They are used to implement floating-
point operations, multiplication of fixed-point numbers, and similar computations encountered in
scientific problems.
To understand the concepts of arithmetic pipeline in a more convenient way, let us consider an
example of a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers
defined as:
X = A * 2^a = 0.9504 * 10^3
Y = B * 2^b = 0.8200 * 10^2
Where A and B are two fractions that represent the mantissa and a and b are the exponents.
The combined operation of floating-point addition and subtraction is divided into four segments.
Each segment contains the corresponding suboperation to be performed in the given pipeline. The
suboperations that are shown in the four segments are:
We will discuss each suboperation in a more detailed manner later in this section.
The following block diagram represents the suboperations performed in each segment of the
pipeline.
1. Compare exponents by subtraction:
The exponents are compared by subtracting them to determine their difference. The larger exponent
is chosen as the exponent of the result.
2. Align the mantissas:
The difference of the exponents, i.e., 3 - 2 = 1, determines how many times the mantissa associated
with the smaller exponent must be shifted to the right.
X = 0.9504 * 10^3
Y = 0.0820 * 10^3
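3. Add mantissas:
The two mantissas are added in segment three.
Z = X + Y = 1.0324 * 10^3
4. Normalize the result:
The sum is shifted right once and the exponent is incremented so that the mantissa is again a fraction.
Z = 0.10324 * 10^4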
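The four suboperations can be traced in Python using the decimal numbers from the example. This is a sketch under stated assumptions: it works in base 10 with `decimal.Decimal` mantissas to mirror the worked example, whereas a hardware pipeline works on binary mantissas, and the function name `fp_add` is introduced here for illustration.

```python
from decimal import Decimal


def fp_add(mant_x, exp_x, mant_y, exp_y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    # Segment 1: compare exponents by subtraction; keep the larger one.
    diff = exp_x - exp_y
    exp_z = max(exp_x, exp_y)

    # Segment 2: align by shifting the mantissa with the smaller exponent right.
    if diff >= 0:
        mant_y = mant_y.scaleb(-diff)
    else:
        mant_x = mant_x.scaleb(diff)

    # Segment 3: add the aligned mantissas.
    mant_z = mant_x + mant_y

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mant_z) >= 1:
        mant_z = mant_z.scaleb(-1)
        exp_z += 1
    while mant_z != 0 and abs(mant_z) < Decimal("0.1"):
        mant_z = mant_z.scaleb(1)
        exp_z -= 1
    return mant_z, exp_z


mantissa, exponent = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(mantissa, exponent)
```

For X = 0.9504 * 10^3 and Y = 0.8200 * 10^2, the aligned sum is 1.0324 * 10^3, which normalizes to a mantissa equal to 0.10324 with exponent 4.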
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream as well.
Most digital computers with complex instructions require an instruction pipeline to carry out
operations such as fetching, decoding, and executing instructions.
In general, the computer needs to process each instruction with the following sequence of steps.
Each step is executed in a particular segment, and there are times when different segments may take
different times to operate on the incoming information. Moreover, there are times when two or more
segments may require memory access at the same time, causing one segment to wait until another is
finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is divided into
segments of equal duration. One of the most common examples of this type of organization is
a Four-segment instruction pipeline.
A four-segment instruction pipeline combines two or more of these steps into a
single segment. For instance, the decoding of the instruction can be combined with the calculation of the
effective address into one segment.
The following block diagram shows a typical example of a four-segment instruction pipeline. The
instruction cycle is completed in four segments.
Segment 1:
The instruction fetch segment can be implemented using first in, first out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and the effective
address is then calculated in a separate arithmetic circuit.
Segment 3:
An operand is fetched from memory in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
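The overlap between instructions in a four-segment pipeline can be visualized with a small space-time table. The Python sketch below assumes an ideal pipeline in which every segment takes exactly one clock cycle and nothing stalls. The segment labels FI (fetch instruction), DA (decode and calculate address), FO (fetch operand), and EX (execute) are names chosen here for illustration.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]  # assumed labels for the four segments


def space_time(num_instructions, segments=SEGMENTS):
    """Return one row per instruction showing which segment it occupies each cycle."""
    k = len(segments)
    total_cycles = k + num_instructions - 1  # cycles needed to drain the pipeline
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(segments):
            row[i + s] = name  # instruction i enters segment s in cycle i + s
        rows.append(row)
    return rows


for number, row in enumerate(space_time(3), start=1):
    print(f"I{number}: " + " ".join(row))
```

With k segments and n instructions, the ideal pipeline finishes in k + n - 1 cycles instead of the k * n cycles a non-pipelined unit would need. The conflicts mentioned earlier, such as two segments needing memory at the same time, would add idle cycles that this sketch does not model.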
The operations performed on the data in the processor constitute a data stream.
Flynn's classification divides computers into four major groups that are:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
SISD
SISD stands for 'Single Instruction and Single Data Stream'. It represents the organization of a
single computer containing a control unit, a processor unit, and a memory unit.
Instructions are executed sequentially, and the system may or may not have internal parallel
processing capabilities.
Most conventional computers have SISD architecture like the traditional Von-Neumann computers.
Parallel processing, in this case, may be achieved by means of multiple functional units or by pipeline
processing.
Instructions are decoded by the Control Unit and then the Control Unit sends the instructions to the
processing units for execution.
Examples:
SIMD
SIMD stands for 'Single Instruction and Multiple Data Stream'. It represents an organization that
includes many processing units under the supervision of a common control unit.
All processors receive the same instruction from the control unit but operate on different items of
data.
The shared memory unit must contain multiple modules so that it can communicate with all the
processors simultaneously.
SIMD is mainly dedicated to array processing machines. However, vector processors can also be seen as a part
of this group.
MISD
MISD stands for 'Multiple Instruction and Single Data stream'.
MISD structure is only of theoretical interest since no practical system has been constructed using this
organization.
In MISD, multiple processing units operate on one single-data stream. Each processing unit operates
on the data independently via separate instruction stream.
Where M = Memory Modules, CU = Control Unit, P = Processor Units
The experimental Carnegie-Mellon [Link] computer (1971)
MIMD
MIMD stands for 'Multiple Instruction and Multiple Data Stream'.
In this organization, all processors in a parallel computer can execute different instructions and
operate on various data at the same time.
In MIMD, each processor has a separate program and an instruction stream is generated from each
program.