Introduction To Computer Organization

15CSE301: Computer Organization and Architecture
Amritha School of Engineering
July 12, 2019

Course Contents
Outline
Introduction to Computer Organization
Instruction Set Architecture (ISA)
MIPS Architecture & Instruction Set
Introduction to SPIM
Data & Control Path Design
Floating point arithmetic and Data Path
Data Path for Arith/LD/ST/CTR instruction
Control Path for Single Cycle CPU
Multi Cycle CPU Design
Pipelining
Hazards: Data, Control and Structural
Branch Prediction: Static and Dynamic
Memory Hierarchy Design
Cache Memory Organization
Main Memory Interleaving
Books and References
I David A. Patterson and John L. Hennessy, Computer Organization and Design, Fifth Edition. Morgan Kaufmann, 2013.
II John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition. Elsevier, 2003.
processor Design, Narosa,
Online
https://www.edx.org/course/computation-structures-2-computer-architecture
https://www.edx.org/course/computation-structures-3-computer-architecture
https://nptel.ac.in/courses/106106092/18
Evaluation
Internal Evaluation
Two written tests: 25 marks
Written assignments: 10 marks
Programming assignments: 10 marks
Quiz, viva, class performance: 5 marks
End Semester Exam
Written test (closed book): 50 marks
Organization of a Computer I
Five Components
Five classic components of a computer: input, output, memory, datapath, and control
datapath + control = processor
Processor + Memory + I/O Devices = computer
Organization of a Computer II
Components:
input (mouse, keyboard, camera, microphone, ...)
output (display, printer, speakers, ...)
memory (caches, DRAM, SRAM, hard disk drives, Flash, ...)
network (both input and output)
Our primary focus: the processor (datapath and control)
implemented using billions of transistors
Impossible to understand by looking at each transistor
We need...abstraction!
Organization of a Computer III
Abstraction in Computer
Abstraction is the act of representing essential features without
including the background details or explanations
Each of the following abstracts everything below it:
Applications software
Systems software
Assembly Language
Machine Language
Architectural Approaches: Caches, Virtual Memory, Pipelining
Sequential logic, finite state machines
Combinational logic, arithmetic circuits
Boolean logic, 1s and 0s
Transistors used to build logic gates (e.g. CMOS)
Semiconductors/Silicon used to build transistors
Properties of atoms, electrons, and quantum dynamics
Notice how abstraction hides the detail of lower levels, yet
gives a useful view for a given purpose
Organization of a Computer IV
Assembly-, Machine-, and High-Level Languages
Hierarchy of Languages
Assembly and Machine Language
Languages
Machine language
Native to a processor: executed directly by hardware
Instructions consist of binary code: 1s and 0s
Assembly language
Slightly higher-level language
Readability of instructions is better than machine language
One-to-one correspondence with machine language instructions
Assemblers translate assembly to machine code
Compilers translate high-level programs to machine code
Either directly, or indirectly via an assembler
Compiler and Assembler
Translation Process
Compiler and Assembler
Translating Language
Instruction Set Architecture I
ISA
What is Computer Architecture?
Computer Architecture = Instruction Set Architecture + Machine Organization
Instruction Set Architecture II
ISA
A very important abstraction
interface between hardware and low-level software
standardizes instructions, machine language bit patterns, etc.
advantage: different implementations of the same architecture
disadvantage: sometimes prevents using new innovations
Common instruction set architectures:
IA-64, IA-32, PowerPC, MIPS, SPARC, ARM, and others
All are multi-sourced, with different implementations for the
same ISA
Performance Measures
Basics
Performance is determined by execution time
Do any of these other variables equal performance?
# of cycles to execute program?
# of instructions in program?
# of cycles per second?
average # of cycles per instruction?
average # of instructions per second?
Performance Measures
Metrics
Clock cycle time = 1 / clock speed
CPU time = clock cycle time × cycles per instruction × number of instructions
Influencing factors for each:
clock cycle time: technology and pipeline
CPI: architecture and instruction set design
instruction count: instruction set design and compiler
CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically
Execution Time
CPU execution time = Instruction count × average CPI × Clock
cycle time
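The execution-time equation can be checked with a few lines of Python. This is a minimal sketch: the function name `cpu_time` and the sample workload (1 million instructions, CPI of 2, 1 GHz clock) are illustrative assumptions, not values from the slides.

```python
def cpu_time(instruction_count, avg_cpi, clock_rate_hz):
    """CPU execution time in seconds = IC x average CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * avg_cpi * clock_cycle_time


# Hypothetical workload: 1 million instructions, CPI of 2, 1 GHz clock.
t = cpu_time(1_000_000, 2.0, 1e9)
print(f"{t * 1e3:.1f} ms")  # 2.0 ms
```

Halving any one of the three factors halves the execution time, which is why the slides list separate influences for each.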
Performance Metrics
Speedup
Speedup is a ratio: old exec time / new exec time
Improvement, increase, and decrease usually refer to a percentage change relative to the baseline:
(new perf - old perf) / old perf
Example
A program ran in 100 seconds on my old laptop and in 70
seconds on my new laptop
What is the speedup?
(1/70) / (1/100) ≈ 1.43
What is the percentage increase in performance?
(1/70 - 1/100) / (1/100) ≈ 43%
What is the reduction in execution time?
30%
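The three answers for the laptop example can be reproduced as follows. This is a small sketch using the 100-second and 70-second timings from the example; the variable names are illustrative.

```python
old_time = 100.0  # seconds on the old laptop
new_time = 70.0   # seconds on the new laptop

# Performance is the reciprocal of execution time.
speedup = old_time / new_time
perf_increase = (1 / new_time - 1 / old_time) / (1 / old_time)
time_reduction = (old_time - new_time) / old_time

print(f"speedup        = {speedup:.2f}")         # 1.43
print(f"perf increase  = {perf_increase:.1%}")   # 42.9%
print(f"time reduction = {time_reduction:.0%}")  # 30%
```

Note that a 30% reduction in execution time is not a 30% increase in performance: the two percentages use different baselines.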
CPU Performance: Problem
Example
My new laptop has an IPC that is 20% worse than my old
laptop. It has a clock speed that is 30% higher than the old
laptop. I'm running the same binaries on both machines.
What speedup is my new laptop providing?
Solution
Exec time = cycle time × CPI × instrs
Perf = clock speed × IPC / instrs
Speedup = new perf / old perf
= (new clock speed × new IPC) / (old clock speed × old IPC)
= 1.3 × 0.8 = 1.04
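The same calculation as a sketch in Python. The helper `relative_perf` is a hypothetical name; the instruction-count ratio defaults to 1 because the same binaries run on both machines.

```python
def relative_perf(clock_ratio, ipc_ratio, instr_ratio=1.0):
    """New/old performance, where perf = clock speed x IPC / instruction count."""
    return clock_ratio * ipc_ratio / instr_ratio


# Clock speed 30% higher, IPC 20% worse, identical binaries.
speedup = relative_perf(clock_ratio=1.3, ipc_ratio=0.8)
print(round(speedup, 2))  # 1.04
```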
Power-Energy of CPU
Problem
If processor A consumes 1.4x the power of processor B, but
finishes the task in 20% less time, which processor would you
pick:
1 if you were constrained by power delivery constraints?
2 if you were trying to minimize energy per operation?
3 if you were trying to minimize response times?
Solution:
1 Proc-B, since it draws less power
2 Proc-B: Proc-A uses 1.4 × 0.8 = 1.12 times the energy of Proc-B
3 Proc-A is faster, but we could scale up the frequency (and
power) of Proc-B to match Proc-A's response time (while still
doing better in terms of power and energy)
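The energy comparison rests on energy = power × time. A minimal sketch, with `relative_energy` as an illustrative helper name:

```python
def relative_energy(power_ratio, time_ratio):
    """Energy of A relative to B, given A's power and time relative to B."""
    return power_ratio * time_ratio


# Processor A: 1.4x the power, 20% less time (0.8x).
ratio = relative_energy(power_ratio=1.4, time_ratio=0.8)
print(round(ratio, 2))  # 1.12
```

A ratio above 1 means A spends more energy on the task even though it finishes sooner.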
Problem
Execution Time
Given the following parameters, answer the two questions below for the program P1 on machine M1:
Instruction Count = 26,395, Average CPI = 2.17, Clock Rate = 3.24 GHz
1 Calculate the execution time $\Delta t_{exec}$ for P1 on M1 (show all work).
2 If M1 runs program P2 2.9 times faster than P1, with clock rate and
CPI remaining the same as above, which variable in the
performance equation changed, and by how much?
Solution
1 $$\Delta t_{exec} = \text{IC} \times \text{CPI} \times \Delta t_{clock} = \text{IC} \times \text{CPI} \times (\text{clock rate})^{-1}$$
$$= 26{,}395\ \text{instr} \times 2.17\ \text{cycles/instr} \times (3.24\ \text{Gcycles/s})^{-1} \approx 17{,}678 \times 10^{-9}\ \text{s} \approx 17.7\ \mu\text{s}$$
2 If CPI and clock rate are unchanged, then the remaining variable is
IC. For P2 to run 2.9 times faster, its runtime must be 1 / 2.9 ≈
0.345 that of P1. This means that the IC of P2 must be about 0.345 times the
IC of P1: IC(P2) = IC(P1) / 2.9 = 26,395 / 2.9 ≈ 9,102
instructions
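Both parts of this problem can be verified numerically. The sketch below uses the parameters given in the problem; the variable names are illustrative.

```python
ic_p1 = 26_395       # instruction count of P1
cpi = 2.17           # average cycles per instruction
clock_rate = 3.24e9  # cycles per second (3.24 GHz)

# Part 1: execution time = IC x CPI / clock rate.
t_p1 = ic_p1 * cpi / clock_rate
print(f"t(P1) = {t_p1 * 1e6:.2f} microseconds")

# Part 2: with CPI and clock rate fixed, only IC can change.
ic_p2 = ic_p1 / 2.9
t_p2 = ic_p2 * cpi / clock_rate
print(f"IC(P2) = {round(ic_p2)} instructions")
```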
Amdahl’s Law
Problem
Suppose a program runs in 100 seconds on a computer, with multiply
operations responsible for 80 seconds of this time. How much do I
have to improve the speed of multiplication if I want my program
to run 2 times faster?
Solution
$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$
Let n be the amount of improvement
$$50 = \frac{80}{n} + (100 - 80)$$
$$n = \frac{80}{30} \approx 2.67$$
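The equation can be rearranged to solve for the required improvement directly. A sketch, assuming a hypothetical helper `required_improvement` that also reports when a target is unreachable:

```python
def required_improvement(total_time, affected_time, target_time):
    """Factor by which the affected part must speed up to hit target_time."""
    unaffected = total_time - affected_time
    budget = target_time - unaffected  # time left for the improved part
    if budget <= 0:
        raise ValueError("target unreachable: unaffected time alone meets or exceeds it")
    return affected_time / budget


n = required_improvement(total_time=100, affected_time=80, target_time=50)
print(round(n, 2))  # 2.67
```

Asking for a 5 times faster program (a 20-second target) raises the error, because the 20 seconds outside multiplication cannot be reduced at all.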
Amdahl’s Law: Speedup Calculation
SpeedUp
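$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$
$$\text{Speedup} = \frac{\text{Execution time before}}{(\text{Execution time before} - \text{Execution time affected}) + \frac{\text{Execution time affected}}{\text{Amount of improvement}}}$$
$$\text{Speedup} = \frac{1}{(1 - \text{Fraction time affected}) + \frac{\text{Fraction time affected}}{\text{Amount of improvement}}}$$
$$\text{Speedup} = \frac{1}{(1 - \text{Fraction}_{\text{Enhanced}}) + \frac{\text{Fraction}_{\text{Enhanced}}}{\text{Speedup}_{\text{Enhanced}}}}$$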
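The final form of the formula translates directly into code. In this sketch the function name `amdahl_speedup` is an illustrative choice, and the sample inputs reuse the multiplication problem above (80% of the time affected, improved by a factor of 80/30).

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of the execution time is improved."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Multiplication takes 80% of the time and becomes 80/30 times faster.
overall = amdahl_speedup(0.8, 80 / 30)
print(round(overall, 2))  # 2.0

# Even an enormous improvement is capped at 1 / (1 - fraction).
print(round(amdahl_speedup(0.8, 1e12), 2))  # 5.0
```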
Problem 1
A new processor is 10 times faster at serving a web application
than the old one. Assume the original processor is busy computing 40% of
the time and waiting for I/O 60% of the time. What is the overall speedup?
Solution
FractionEnhanced = 0.4, SpeedupEnhanced = 10
$$\text{Speedup} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$
Amdahl’s Law
Problem 2
Bob is given the job to write a program that will get a speedup of 3.8
on 4 processors. He makes it 95% parallel, and goes home dreaming
of a big pay raise. Using Amdahl’s law, and assuming the problem
size is the same as the serial version, and ignoring communication
costs, what speedup will Bob actually get?
Solution
FractionEnhanced = 0.95, SpeedupEnhanced = 4
$$\text{Speedup} = \frac{1}{(1 - 0.95) + \frac{0.95}{4}} = \frac{1}{0.2875} \approx 3.48$$
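A natural follow-up is how parallel Bob's program would have to be to reach 3.8. The sketch below inverts Amdahl's formula; `required_fraction` is a hypothetical helper derived by solving the formula for the parallel fraction.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl speedup with the parallel part spread over the processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def required_fraction(target_speedup, processors):
    """Solve 1 / ((1 - f) + f / p) = S for f: f = (1 - 1/S) / (1 - 1/p)."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)


actual = amdahl_speedup(0.95, 4)
needed = required_fraction(3.8, 4)
print(f"speedup at 95% parallel: {actual:.2f}")
print(f"fraction needed for 3.8: {needed:.4f}")
```

Under these assumptions Bob needs a little over 98% of the run time to be parallel, not 95%.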
Amdahl’s Law
Problem 3
Mary has a problem whose size can increase with an increasing number
of processors. She executes the program and determines that, in
a parallel execution on 100 processors, 5% of the time is spent in
the sequential part of the program. What is the scaled speedup of
the program on 100 processors?
Solution
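FractionEnhanced = 0.95, SpeedupEnhanced = 100
$$\text{Speedup} = \frac{1}{(1 - 0.95) + \frac{0.95}{100}} = \frac{1}{0.0595} \approx 16.8$$
This is the fixed-size (Amdahl) figure. Because the 5% was measured on the 100-processor run of a problem that grows with the processor count, the scaled speedup follows Gustafson's law: 100 - 0.05 × (100 - 1) = 95.05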
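The difference between a fixed-size and a scaled problem can be seen side by side. This is a sketch: `scaled_speedup` implements Gustafson's law, where the serial fraction is measured on the parallel run; both function names are illustrative.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Fixed problem size: fractions are measured on the serial run."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def scaled_speedup(serial_fraction, processors):
    """Gustafson's law: serial fraction is measured on the parallel run."""
    return processors - serial_fraction * (processors - 1)


print(f"fixed-size speedup: {amdahl_speedup(0.95, 100):.2f}")
print(f"scaled speedup:     {scaled_speedup(0.05, 100):.2f}")
```

With the same 5% sequential share, the scaled view gives a much larger figure because the parallel work grows with the machine.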
Average CPI
Calculation
Performance Equation:
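$$\text{CPU Time} = \text{Cycle time} \times \text{Instruction Count} \times \text{Average CPI}$$
Assuming n different types of instructions, each with count $\text{IC}_i$
and requiring $\text{CPI}_i$ cycles:
$$\text{CPU Time} = \text{Cycle time} \times \sum_{i=1}^{n} (\text{IC}_i \times \text{CPI}_i)$$
Then:
$$\text{Average CPI} = \frac{\sum_{i=1}^{n} (\text{IC}_i \times \text{CPI}_i)}{\text{IC}} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$$
where $F_i = \text{IC}_i / \text{IC}$ is the frequency of instruction type i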
Performance with multiple type instructions
Problem
A computer M has the following CPIs for instruction types A
through D, and a program P has the following mix of instructions
(pct = percent):
M: CPI(A) = 1.7, CPI(B) = 2.1, CPI(C) = 2.7, CPI(D) = 2.4
P: Type A = 22 pct, Type B = 29 pct, Type C = 17 pct,
Type D = remaining pct (32 pct)
Calculate the average CPI of machine M:
$$\text{Average CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i = 1.7(0.22) + 2.1(0.29) + 2.7(0.17) + 2.4(0.32) \approx 2.21$$
Calculate the runtime of P on M if IC = 22,311 and the clock rate
is 3.3 GHz:
$$\text{CPU time} = \text{IC} \times \text{CPI} \times (\text{clock rate})^{-1} = 22{,}311 \times 2.21 \times (3.3\ \text{Gcycles/s})^{-1} \approx 14.9\ \mu\text{s}$$
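The weighted-average calculation generalizes to any instruction mix. A sketch using the CPIs and mix from this problem, with the dictionaries `cpi` and `mix` as illustrative data structures:

```python
cpi = {"A": 1.7, "B": 2.1, "C": 2.7, "D": 2.4}
mix = {"A": 0.22, "B": 0.29, "C": 0.17}
mix["D"] = 1.0 - sum(mix.values())  # type D takes the remaining share

# Average CPI is the frequency-weighted sum of the per-type CPIs.
avg_cpi = sum(cpi[kind] * mix[kind] for kind in cpi)

instruction_count = 22_311
clock_rate = 3.3e9  # cycles per second
cpu_time = instruction_count * avg_cpi / clock_rate

print(f"average CPI = {avg_cpi:.2f}")
print(f"CPU time    = {cpu_time * 1e6:.1f} microseconds")
```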
MIPS Architecture
MIPS Basics
MIPS: Microprocessor without Interlocked Pipeline Stages
We'll be working with the MIPS instruction set architecture
similar to other architectures developed since the 1980s
Almost 100 million MIPS processors manufactured in 2002
used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, ...
MIPS Design Principles
Simplicity Favors Regularity

Keep all instructions a single size
Always require three register operands in arithmetic
instructions
Smaller is Faster
Has only 32 registers rather than many more
Good Design Makes Good Compromises
Compromise between providing for larger addresses and constants
in instructions and keeping all instructions the same length
Make the Common Case Fast
PC-relative addressing for conditional branches
Immediate addressing for constant operands
MIPS Registers & Memory
MIPS Registers & Use
MIPS Memory Organization
MIPS Instruction Format
MIPS ALU Instructions
Used for arithmetic, logical, shift instructions

op: Basic operation of the instruction (opcode)
rs: first register source operand
rt: second register source operand
rd: register destination operand
shamt: shift amount (more about this later)
funct: function - specific type of operation
Also called R-Format or R-Type Instructions
Instruction usage (assembly)

add dest, src1, src2 # dest = src1 + src2
sub dest, src1, src2 # dest = src1 - src2
and dest, src1, src2 # dest = src1 AND src2
MIPS Data Transfer Instructions
Transfer data between registers and memory
Instruction format (assembly)
lw $dest, offset($addr) # load word
sw $src, offset($addr) # store word
Uses:
Accessing a variable in main memory
Accessing an array element
R and I Format
Format Examples
I Format
lw $9, 1200($8) == lw $t1, 1200($t0)
R Format
add $8, $8, $9
MIPS Instruction Types
Arithmetic & Logical - manipulate data in registers

add $s1, $s2, $s3 # $s1 = $s2 + $s3
or $s3, $s4, $s5 # $s3 = $s4 OR $s5
Data Transfer - move register data to/from memory
lw $s1, 100($s2) # $s1 = Memory[$s2 + 100]
sw $s1, 100($s2) # Memory[$s2 + 100] = $s1
Branch - alter program flow
beq $s1, $s2, 25 # if ($s1 == $s2) PC = PC + 4 + 4*25
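The branch target arithmetic can be sketched in Python. The function name `branch_target` and the sample program counter value are illustrative assumptions; the offset is counted in words (instructions), relative to the instruction after the branch.

```python
def branch_target(pc, offset_words):
    """Target of a taken beq: PC + 4 plus the word offset scaled to bytes."""
    return pc + 4 + 4 * offset_words


# Hypothetical address of the beq instruction.
pc = 0x00400000
print(hex(branch_target(pc, 25)))  # 0x400068

# A negative offset branches backwards, as in a loop.
print(hex(branch_target(0x00400020, -3)))  # 0x400018
```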
ALU Instructions
Example
Machine language for
add $8, $17, $18
See reference card for op, funct values
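The R-format fields can be packed into a 32-bit word with shifts and ORs. In this sketch `encode_r` is a hypothetical helper; the field widths are 6, 5, 5, 5, 5, and 6 bits, and op = 0 with funct = 0x20 are the standard MIPS values for `add`.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $8, $17, $18  ->  rd = 8, rs = 17, rt = 18
word = encode_r(op=0, rs=17, rt=18, rd=8, shamt=0, funct=0x20)
print(f"{word:032b}")
print(f"0x{word:08X}")  # 0x02324020
```

Note that the destination register comes first in assembly but sits in the fourth field of the machine word.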
Data Transfer I
Loading a Simple Variable

Data Transfer II
Array Variable
Data Transfer Instructions - Binary Representation
Used for load, store instructions

op: Basic operation of the instruction (opcode)
rs: first register source operand
rt: second register source operand
offset: 16-bit signed address offset (-32,768 to +32,767)
Also called I-Format or I-Type instructions
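The I-format layout can be encoded the same way. This sketch assumes a hypothetical helper `encode_i`; the opcode 0x23 is the standard MIPS value for `lw`, and the example instruction is `lw $9, 1200($8)` from the format examples.

```python
def encode_i(op, rs, rt, offset):
    """Pack I-format fields (6/5/5/16 bits); offset is a signed 16-bit value."""
    if not -32768 <= offset <= 32767:
        raise ValueError("offset does not fit in 16 signed bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


# lw $9, 1200($8)  ->  rt = 9 (destination), rs = 8 (base address)
word = encode_i(op=0x23, rs=8, rt=9, offset=1200)
print(f"0x{word:08X}")  # 0x8D0904B0

# Negative offsets are stored in two's complement.
print(f"0x{encode_i(0x23, 8, 9, -4):08X}")  # 0x8D09FFFC
```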
MIPS Examples I
Translate to MIPS code

Translate the following Java statement into MIPS assembly code.
Assume that x, y, z, q are stored in registers $s1-$s4. You may use
the other registers to hold intermediate results.
x = x + y + z - q;
Solution
add $t0,$s1,$s2
add $t0,$t0,$s3
sub $s1,$t0,$s4
MIPS Examples II
Translate to MIPS code

Write an equivalent MIPS program for the C code i = N*N + 3*N
MIPS Code
lw $t0, 4($gp) # fetch N
mul $t0, $t0, $t0 # N*N (three-operand mul; mult takes two operands and writes HI/LO)
lw $t1, 4($gp) # fetch N
ori $t2, $zero, 3 # 3
mul $t1, $t1, $t2 # 3*N
add $t2, $t0, $t1 # N*N + 3*N
sw $t2, 0($gp) # i = ...
A Simple ALP I
Arithmetic operations
A simple program that takes two numbers from the user and performs
basic arithmetic operations on them: addition, subtraction, and
multiplication
Program Flow
1 Print statements to ask the user to enter the two different
numbers
2 Store the two numbers in different registers and print the
menu of arithmetic instructions to the user
3 Based on the choice made by the user, create branch
structures to perform the commands.
4 Print the result and Exit
A Simple ALP II
Step 1- text for interaction

.data
prompt1: .asciiz "Enter the first number:"
prompt2: .asciiz "Enter the second number:"
menu: .asciiz "Enter the number: 1 => add, 2 => subtract or 3 => multiply:"
resultText: .asciiz "Your final result is: "
Store the user’s choice

.text
.globl main
main:
li $t3, 1 # menu option 1 (add)
li $t4, 2 # menu option 2 (subtract)
li $t5, 3 # menu option 3 (multiply)
A Simple ALP III
Step 1- Prompt the user to get the first value
li $v0, 4 # syscall code 4: print string
la $a0, prompt1 # address of the string to print
syscall # execute the command
Step 1- Get the first value and save
li $v0, 5 # syscall code 5: read integer
syscall # the integer read is returned in $v0
move $t0, $v0 # save the first number in $t0
Step 1- Prompt the user to get the second value
li $v0, 4 # syscall code 4: print string
la $a0, prompt2 # address of the string to print
syscall
A Simple ALP IV
Step 1- Get the second value and save
li $v0, 5 # syscall code 5: read integer
syscall
move $t1, $v0 # save the second number in $t1
• Step 1 Completes
Step 2- Print the menu
li $v0, 4 # syscall code 4: print string
la $a0, menu # address of the menu string
syscall
A Simple ALP V
Step 2- Get the user’s choice
li $v0, 5 # syscall code 5: read integer
syscall
move $t2, $v0 # move the choice to $t2
• Step 2 completes
Step 2- Branch for different operations
beq $t2, $t3, addProcess # go to addProcess if $t2 == $t3
beq $t2, $t4, subProcess # go to subProcess if $t2 == $t4
beq $t2, $t5, mulProcess # go to mulProcess if $t2 == $t5
A Simple ALP VI
Step 3- Addition
addProcess: add $t6, $t0, $t1 # $t6 = $t0 + $t1
j DisplayResult # jump to display
Step 3- Subtraction
subProcess: sub $t6, $t0, $t1 # $t6 = $t0 - $t1
j DisplayResult # jump to display
Step 3- Multiplication
mulProcess: mul $t6, $t0, $t1 # $t6 = $t0 * $t1
A Simple ALP VII
Step 4- Display results and exit
DisplayResult:
li $v0, 4 # syscall code 4: print string
la $a0, resultText # address of the string to print
syscall # execute the command
# Print the result
li $v0, 1 # syscall code 1: print integer
move $a0, $t6 # the integer to print
syscall
li $v0, 10 # syscall code 10: terminate the program
syscall
MIPS Program- Arrays I
Array Access
Getting the data from an array cell, e.g, x = list[i];
Storing data into an array cell, e.g. list[i] = x;
Determining the length of an array, i.e. list.length.
MIPS Program- Arrays II
Array as Byte or Word

Declaration:
vowels: .byte 'a', 'e', 'i', 'o', 'u'

list: .word 3, 0, 1, 2, 6, -2, 4, 7, 3, 7
To Access array:
la $t3, list # put address of list into $t3

lw $t4, 0($t3) # get the value from the array cell
sw $t2, 12($t3) # store the value into the array cell
Address Calculation:
Address of vowels[k] == vowels + k
Address of list[k] == list + 4 * k
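The address calculation differs only in the element size. A sketch with a hypothetical helper `element_address` and an assumed base address for `list` (the real address is chosen by the assembler):

```python
def element_address(base, index, element_size):
    """Byte address of array[index] for elements of element_size bytes."""
    return base + index * element_size


LIST_BASE = 0x10010000  # assumed address of list, for illustration only

# list is a .word array (4 bytes per element): list[3] is at offset 12,
# which is the cell written by "sw $t2, 12($t3)".
print(hex(element_address(LIST_BASE, 3, 4)))  # 0x1001000c

# vowels is a .byte array (1 byte per element): vowels[3] is at offset 3.
print(element_address(0, 3, 1))  # 3
```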
MIPS Program- Arrays III
Sum of the Array

Program Flow
Declare and initialize the array and result message in the data
section
Initialize the counter and sum
Start a loop: get the next number, add it to the sum, increment the
counter, check the loop condition
Display the result
Step1:Initialize the array

.data
list: .word 3, 2, 1, 0, 1, 2
result: .asciiz "\n The sum of the array is: "
MIPS Program- Arrays IV
Step2: Initialize Sum, Counter and Index

.text
.globl main
main:
li $s0, 0 # Counter
li $a0, 0 # Sum
li $t0, 0 # Index
MIPS Program- Arrays V
Step3: Loop for addition

forsum:
bge $s0, 6, end_forsum
lw $t1,list($t0) # Load the number from array
add $a0, $a0, $t1 # Compute the sum
move $t2, $a0 # Copy of the sum
addi $t0,$t0,4 # Increment index
addi $s0,$s0,1 #Increment counter
j forsum
end_forsum:
MIPS Program- Arrays VI
Step4: Display the result

li $v0, 4
la $a0, result
syscall
move $a0, $t2 # Move back the result

li $v0,1
syscall # Print sum
li $v0,10 # Terminate program
syscall
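The loop logic can be modeled in Python to check the expected sum. This is only a model of the assembly, not a MIPS simulator: the array is packed into bytes so that the index advances by 4 per element, as `addi $t0, $t0, 4` does. The little-endian layout is an assumption that matches SPIM on common hosts.

```python
import struct

words = [3, 2, 1, 0, 1, 2]
memory = struct.pack("<6i", *words)  # six 32-bit words, like the .word directive

counter = 0  # $s0
total = 0    # $a0
index = 0    # $t0, a byte offset into the array

while counter < 6:  # bge $s0, 6, end_forsum
    (value,) = struct.unpack_from("<i", memory, index)  # lw $t1, list($t0)
    total += value  # add $a0, $a0, $t1
    index += 4      # addi $t0, $t0, 4
    counter += 1    # addi $s0, $s0, 1

print(total)  # 9
```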
Largest of Array I
Program Flow
1 Declare and initialize array, array-size
2 First iteration
Set the counter <- array-size
get the address of the array to a register
get the first value and set as max
increment pointer
decrement count
3 Get into loop
Get next number
Compare with max
replace max if a new max is found
increment pointer, decrement count
4 Display the result
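The program flow above can be sketched in Python before writing the assembly. The sample values match the array used in this example; like the assembly version, the sketch assumes the array has at least one element, since the first value seeds the maximum.

```python
array = [8, 2, 31, 81, 12, 10]

# First iteration: seed the maximum with array[0].
count = len(array)
pointer = 0
maximum = array[pointer]
pointer += 1
count -= 1

# Loop: compare each remaining element with the current maximum.
while count != 0:
    value = array[pointer]
    if value > maximum:
        maximum = value  # replace max if a new max is found
    pointer += 1
    count -= 1

print(maximum)  # 81
```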
Largest of Array II
Declare and initialize array

.data
array: .word 8,2,31,81,12,10
size: .word 6
max: .word 0
Largest of Array III
Do first iteration
.text
.globl main
main:
start:
lw $t3, size # number of elements
la $t1, array # get array address
lw $s5, ($t1) # set max, $s5, to array[0]
addi $t1, $t1, 4 # skip array[0]
addi $t3, $t3, -1 # len = len - 1
Largest of Array IV
Go to loop
loop:
lw $t4, ($t1) # $t4 = current element
ble $t4, $s5, L1 # not greater than the current max: skip the update
move $s5, $t4 # new max found
L1: addi $t3, $t3, -1 # counter - 1
addi $t1, $t1, 4 # advance array pointer
bnez $t3, loop # loop while elements remain
Largest of Array V
Display Result & Exit

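sw $s5, max # store the max value in memory
lw $a0, max # load it as the argument to print
li $v0, 1 # syscall code 1: print integer
syscall
li $v0, 10 # syscall code 10: terminate the program
syscall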
