Introduction To Computer Organization & Architecture
Outline
Introduction to Computer Organization
Instruction Set Architecture (ISA)
MIPS Architecture & Instruction Set
Introduction to SPIM
Data & Control Path Design
Floating point arithmetic and Data Path
Data Path for Arith/LD/ST/CTR instructions
Control Path for Single Cycle CPU
Multi Cycle CPU Design
Pipelining
Hazards: Data, Control and Structural
Branch Prediction: Static and Dynamic
Memory Hierarchy Design
Cache Memory Organization
Main Memory Interleaving
Books and References
I John L. Hennessy and David A. Patterson, Computer Organization and Design, Fifth Edition. Morgan Kaufmann, 2013.
II John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition. Elsevier, 2003.
III Processor Design, Narosa.
Online
https://www.edx.org/course/computation-structures-2-
computer-architecture
https://www.edx.org/course/computation-structures-3-
computer-architecture
https://nptel.ac.in/courses/106106092/18
Evaluation
Internal Evaluation
Two written tests - 25 marks
Written Assignments - 10 marks
Programming Assignments - 10 marks
Quiz, Viva, Class Performance - 5 marks
Five Components
Five classic components of a computer: input, output,
memory, datapath, and control
datapath + control = processor
Processor + Memory + I/O Devices = computer
Organization of a Computer II
Components:
input (mouse, keyboard, camera, microphone...)
output (display, printer, speakers....)
memory (caches, DRAM, SRAM, hard disk drives, Flash....)
network (both input and output)
Our primary focus: the processor (datapath and control)
implemented using billions of transistors
Impossible to understand by looking at each transistor
We need...abstraction!
Organization of a Computer III
Abstraction in Computer
Abstraction is the act of representing essential features without
including the background details or explanations
Each of the following abstracts everything below it:
Applications software
Systems software
Assembly Language
Machine Language
Architectural Approaches: Caches, Virtual Memory, Pipelining
Sequential logic, finite state machines
Combinational logic, arithmetic circuits
Boolean logic, 1s and 0s
Transistors used to build logic gates (e.g. CMOS)
Semiconductors/Silicon used to build transistors
Properties of atoms, electrons, and quantum dynamics
Notice how abstraction hides the detail of lower levels, yet
gives a useful view for a given purpose
Organization of a Computer IV
Assembly-, Machine-, and High-Level Languages
Hierarchy of Languages
Assembly and Machine Language
Languages
Machine language
Native to a processor: executed directly by hardware
Instructions consist of binary code: 1s and 0s
Assembly language
Slightly higher-level language
Readability of instructions is better than machine language
One-to-one correspondence with machine language instructions
Assemblers translate assembly to machine code
Compilers translate high-level programs to machine code
either directly, or indirectly via an assembler (a small example follows)
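As a small illustration of the correspondence between these levels (the variable-to-register mapping is assumed for this example; the machine word follows the standard MIPS R-format encoding):
High-level (C): a = b + c;
MIPS assembly: add $s0, $s1, $s2 # a in $s0, b in $s1, c in $s2 (assumed)
Machine code: 0000 0010 0011 0010 1000 0000 0010 0000 (0x02328020)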
Compiler and Assembler
Translation Process
Compiler and Assembler
Translating Language
Instruction Set Architecture I
ISA
What is Computer Architecture?
Computer Architecture = Instruction Set Architecture + Machine Organization
Instruction Set Architecture II
ISA
A very important abstraction
interface between hardware and low-level software
standardizes instructions, machine language bit patterns, etc.
advantage: different implementations of the same architecture
disadvantage: sometimes prevents using new innovations
Common instruction set architectures:
IA-64, IA-32, PowerPC, MIPS, SPARC, ARM, and others
All are multi-sourced, with different implementations for the
same ISA
Performance Measures
Basics
Performance is determined by execution time
Do any of these other variables equal performance?
# of cycles to execute program?
# of instructions in program?
# of cycles per second?
average # of cycles per instruction?
average # of instructions per second?
Performance Measures
Metrics
Clock cycle time = 1 / clock speed
CPU time = clock cycle time x cycles per instruction x
number of instructions
Influencing factors for each:
clock cycle time: technology and pipeline
CPI: architecture and instruction set design
instruction count: instruction set design and compiler
CPI (cycles per instruction) or IPC (instructions per cycle)
cannot be accurately estimated analytically
Execution Time
CPU execution time = Instruction count × average CPI × Clock
cycle time
Performance Metrics
Speedup
Speedup is a ratio = old exec time / new exec time
Improvement, Increase, Decrease usually refer to percentage
relative to the baseline
= (new perf − old perf) / old perf
Example
A program ran in 100 seconds on my old laptop and in 70
seconds on my new laptop
What is the speedup?
(1/70) / (1/100) = 100/70 ≈ 1.43
What is the percentage increase in performance?
(1/70 − 1/100) / (1/100) ≈ 43%
What is the reduction in execution time?
30%
CPU Performance: Problem
Example
My new laptop has an IPC that is 20% worse than my old
laptop. It has a clock speed that is 30% higher than the old
laptop. I'm running the same binaries on both machines.
What speedup is my new laptop providing?
Solution
Exec time = cycle time * CPI * instrs
Perf = clock speed * IPC / instrs
Speedup = new perf / old perf
= (new clock speed x new IPC) / (old clock speed x old IPC)
= 1.3 * 0.8 = 1.04
Power-Energy of CPU
Problem
If processor A consumes 1.4x the power of processor B, but
finishes the task in 20% less time, which processor would you
pick:
1 if you were constrained by power delivery constraints?
2 if you were trying to minimize energy per operation?
3 if you were trying to minimize response times?
Solution:
1 Proc-B
2 Proc-A is 1.4 x 0.8 = 1.12 times the energy of Proc-B
3 Proc-A is faster, but we could scale up the frequency (and
power) of Proc-B and match Proc-A's response time (while still
doing better in terms of power and energy)
Problem
Execution Time
Given the following parameters for a program P1: IC = 26,395 instructions, average CPI = 2.17, clock rate = 3.24 GHz, answer the following two questions:
1 What is the execution time of P1?
2 If a program P2 with the same CPI and clock rate must run 2.9 times faster than P1, what must its instruction count be?
Solution
1 t_exec = IC x CPI x t_clock = IC x CPI x (clock rate)^−1
= 26,395 instr x 2.17 cycles/instr x (3.24 Gcycles/sec)^−1
≈ 17,678 x 10^−9 sec ≈ 17.7 µs
2 If CPI and clock rate are unchanged, then the remaining variable is
IC. For P2 to run 2.9 times faster, its runtime must be 1 / 2.9 ≈
0.345 that of P1. This means that the IC of P2 must be 0.345 times the
IC of P1, so IC(P2) ≈ 0.345 x IC(P1) = 0.345 x 26,395 ≈ 9,102
instructions
Amdahl’s Law
Problem
Suppose a program runs in 100 seconds on a computer, with multiply
operations responsible for 80 seconds of this time. How much do I
have to improve the speed of multiplication if I want my program
to run 2 times faster?
Solution
Execution time after improvement =
(Execution time affected by improvement / Amount of improvement) + Execution time unaffected
Let n be the amount of improvement
50 = 80/n + (100 − 80)
80/n = 30, so n ≈ 2.67
Amdahl’s Law: Speedup Calculation
SpeedUp
Execution time after improvement =
(Execution time affected by improvement / Amount of improvement) + Execution time unaffected
Speedup = Execution time before /
(Execution time before − Execution time affected + Execution time affected / Amount of improvement)
Speedup = 1 / ((1 − Fraction time affected) + Fraction time affected / Amount of improvement)
Speedup = 1 / ((1 − FractionEnhanced) + FractionEnhanced / SpeedupEnhanced)
Amdahl’s Law: Speedup Calculation
Problem 1
A new processor is 10 times faster at serving a web application
than the old one. Assume the original processor is busy with
computation 40% of the time and waiting for I/O 60% of the time.
What is the overall speedup?
Solution
FractionEnhanced = 0.4, SpeedupEnhanced = 10
Speedup = 1 / (1 − 0.4 + 0.4/10) = 1 / 0.64 ≈ 1.56
Amdahl’s Law
Problem 2
Bob is given the job to write a program that will get a speedup of 3.8
on 4 processors. He makes it 95% parallel, and goes home dreaming
of a big pay raise. Using Amdahl's law, and assuming the problem
size is the same as the serial version, and ignoring communication
costs, what speedup will Bob actually get?
Solution
FractionEnhanced = 0.95, SpeedupEnhanced = 4
Speedup = 1 / (1 − 0.95 + 0.95/4) = 1 / 0.2875 ≈ 3.48
Amdahl’s Law
Problem 3
Mary has a problem whose size can increase with an increasing
number of processors. She executes the program and determines
that, in a parallel execution on 100 processors, 5% of the time is
spent in the sequential part of the program. What is the scaled
speedup of the program on 100 processors?
Solution
FractionEnhanced = 0.95, SpeedupEnhanced = 100
Speedup = 1 / (1 − 0.95 + 0.95/100) = 1 / 0.0595 ≈ 16.8
Average CPI
Calculation
Performance Equation:
CPU Time = Cycle time x Instruction Count x Average CPI
Assuming n different types of instructions, each with count ICi
and requiring CPIi cycles:
CPU Time = Cycle time x Σ (i = 1 to n) (ICi x CPIi)
Then:
Average CPI = Σ (i = 1 to n) (ICi x CPIi) / IC = Σ (i = 1 to n) (CPIi x Fi),
where Fi = ICi / IC is the frequency of instruction type i
Performance with multiple type instructions
Problem
A computer M has the following CPIs for instruction types A
through D, and a program P has the following mix of instructions:
M: CPI(A) = 1.7, CPI(B) = 2.1, CPI(C) = 2.7, CPI(D) = 2.4
P: Type A = 22%, Type B = 29%, Type C = 17%, Type D = remaining 32%
Calculate the average CPI of machine M running P:
CPI = Σ (i = 1 to n) (CPIi x Fi)
= 1.7(0.22) + 2.1(0.29) + 2.7(0.17) + 2.4(0.32)
= 2.21
Calculate the runtime of P on M if IC = 22,311 and clock rate
is 3.3 GHz:
CPU time = IC x CPI x (clock rate)^−1
= 22,311 x 2.21 x (3.3 Gcycles/sec)^−1
≈ 14.9 µs
MIPS Architecture
MIPS Basics
MIPS: Microprocessor without Interlocked Pipeline Stages. We'll be working with the MIPS instruction set architecture
similar to other architectures developed since the 1980s
Almost 100 million MIPS processors manufactured in 2002
used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, ...
MIPS Design Principles
I Format
lw $9, 1200($8) == lw $t1, 1200($t0)
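As an illustration, the I-format field breakdown of this lw (the field values follow the standard MIPS encoding of lw, registers $8 and $9, and the 16-bit offset 1200):
op = 100011 (lw), rs = 01000 ($8), rt = 01001 ($9), immediate = 0000 0100 1011 0000 (1200)
Machine word: 0x8D0904B0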
R Format
add $8, $8, $9
MIPS Instruction Types
Example
Machine language for
add $8, $17, $18
See reference card for op, funct values
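For concreteness, a worked encoding using the standard MIPS R-format fields (op, rs, rt, rd, shamt, funct), with the op and funct values taken from the reference card:
add $8, $17, $18:
op = 000000, rs = 10001 ($17), rt = 10010 ($18), rd = 01000 ($8), shamt = 00000, funct = 100000
Machine word: 0000 0010 0011 0010 0100 0000 0010 0000 = 0x02324020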
Data Transfer I
Array Variable
Data Transfer Instructions - Binary Representation
x = x + y + z - q;
Solution (assuming x, y, z, q are in $s1, $s2, $s3, $s4):
add $t0,$s1,$s2 # $t0 = x + y
add $t0,$t0,$s3 # $t0 = x + y + z
sub $s1,$t0,$s4 # x = $t0 - q
MIPS Examples II
Arithmetic operations
A simple program: take two numbers from the user and perform basic
arithmetic operations such as addition, subtraction and
multiplication on them
Program Flow
1 Print statements to ask the user to enter the two different
numbers
2 Store the two numbers in different registers and print the
menu of arithmetic instructions to the user
3 Based on the choice made by the user, create branch
structures to perform the commands.
4 Print the result and Exit
A Simple ALP II
• Step 1 completes
• Step 2 completes
Step 3- Addition
addProcess: add $t6,$t0,$t1 # $t6 = $t0 + $t1
j DisplayResult # jump to display
Step 3- Subtraction
subProcess: sub $t6,$t0,$t1 # $t6 = $t0 - $t1
j DisplayResult # jump to display
Step 3- Multiplication
mulProcess: mul $t6,$t0,$t1 # $t6 = $t0 * $t1
j DisplayResult # jump to display
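The menu dispatch of step 3 and the display/exit code of step 4 are not reproduced above; a minimal sketch of both, assuming the user's menu choice has been read into $t2 and using the standard SPIM syscalls (1 = print integer, 10 = exit):
# dispatch on the menu choice in $t2 (1 = add, 2 = subtract, 3 = multiply)
li $t5, 1
beq $t2, $t5, addProcess
li $t5, 2
beq $t2, $t5, subProcess
li $t5, 3
beq $t2, $t5, mulProcess
# display the result and exit
DisplayResult: move $a0, $t6 # result computed in $t6
li $v0, 1 # syscall 1: print integer in $a0
syscall
li $v0, 10 # syscall 10: exit
syscall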
A Simple ALP VII
Array Access
Getting the data from an array cell, e.g., x = list[i];
Storing data into an array cell, e.g., list[i] = x;
Determining the length of an array, i.e., list.length.
MIPS Program- Arrays II
To access an array:
Address calculation:
Address of vowels[k] == vowels + k (byte elements)
Address of list[k] == list + 4 * k (4-byte word elements)
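A minimal MIPS sketch of this address calculation, assuming the base address of list is in $s0, the index k is in $t0, and x is in $s1:
sll $t1, $t0, 2 # $t1 = 4 * k
add $t1, $t1, $s0 # $t1 = address of list[k]
lw $s1, 0($t1) # x = list[k]
sw $s1, 0($t1) # list[k] = x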
MIPS Program- Arrays III
Program Flow
1 Declare and initialize array, array-size
2 First iteration
Set the counter <- array-size
get the address of the array to a register
get the first value and set as max
increment pointer
decrement count
3 Get into loop
Get next number
Compare with max
replace max if a new max is found
increment pointer, decrement count
4 Display the result
Largest of Array II
Do first iteration
.text
.globl main
main:
jal start
start:
li $t3, 6 #size
la $t1, array # get array address
lw $s5, ($t1) # set max, $s5 to array[0]
add $t1, $t1, 4 # skip array[0]
add $t3, $t3, -1 # len = len - 1
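Step 1 of the program flow (declare and initialize the array and its size) is not shown on the slide; a minimal sketch of the data segment, with the six element values chosen only for illustration:
.data
array: .word 12, 7, 41, 3, 28, 15 # 6-element word array (sample values)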
Largest of Array IV
Go to loop
loop:
lw $t4, ($t1) # get array[n]
ble $t4, $s5, L1 # if $t4 <= current max, no new max: skip the update
lw $s5, 0($t1) # new max: load array[n] into $s5
L1: add $t3, $t3, -1 # counter - 1
addi $t1, $t1, 4 # advance array pointer
bnez $t3, loop # if counter != 0, keep looping
Largest of Array V
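The final step, displaying the result, is not reproduced here; a minimal sketch using SPIM syscalls, assuming the maximum is left in $s5 by the loop above:
move $a0, $s5 # maximum value found
li $v0, 1 # syscall 1: print integer in $a0
syscall
li $v0, 10 # syscall 10: exit
syscall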