Computer Systems
Microprocessors

Microprocessor
• A central processing unit (CPU) on a programmable electronic chip.
• Capable of executing code instructions for processing data and controlling associated circuitry.
• Examples: Intel 4004 (first 𝜇P – 1971), Intel 8008, Intel 8086 etc.

Microprocessor Terminology
• Instruction Set – Set of instructions that a 𝜇P can understand. It varies from one 𝜇P to another.
• Bandwidth – The number of bits processed in a single instruction.
• Clock Speed – Determines the number of operations a 𝜇P can perform in one second. Expressed in MHz, GHz.
• Word Length – Data processing limit of a 𝜇P. Ranges from 4 bits to 64 bits.

Classification
Reduced Instruction Set Computer (RISC)
• Simplified instruction set requiring one clock cycle for execution.
• Requires larger RAM to store instructions.
• Contains a large number of registers but fewer transistors.
• Ideal for simple applications with limited battery power.
Complex Instruction Set Computer (CISC)
• Complex instructions are built directly into hardware.
• Larger number of instructions can be supported.
• Instruction decoding process is complex.
• Ideal for applications requiring a large number of instructions.

Microcomputer
• An assembly of a CPU (𝜇P), I/O devices and memory systems for processing data.
• Microcomputers are programmable and can be considered general-purpose logic devices.
• Examples: Mark-8 (Intel 8008 𝜇P), Altair 8800 (Intel 8080 𝜇P) and many other kits.

Microcontroller
• A single integrated circuit (IC) consisting of a microprocessor, memory and I/O ports.
• Microcontrollers can be considered miniature computers designed to control small features of large systems.
• Examples: Building automation, Robotics, Lighting control, Toys, Communications etc.

Applications
❑ Scientific instruments ❑ Traffic control
❑ Satellites ❑ Inventory control
❑ Manufacturing ❑ Security and fire alarms
❑ Automation/Robotics ❑ Home appliances
Copyrighted Material © www.studyforfe.com


Architecture and Interfacing - Part 1

Architecture refers to the organization and design of hardware and software components. It aims to create a system that meets design objectives.
Design objectives include the following:
• Functionality
• Reliability
• Cost effectiveness
• Energy consumption

A microprocessor is like a CPU on one chip. It can be broadly divided into three major segments:
1. Arithmetic/Logic Unit – ALU
2. Control Unit – CU
3. Register Array
Interfacing refers to the interaction of the microprocessor with I/O devices and memory.

Architecture Classification
Computer architecture can be broadly classified into two types:
• Von Neumann/Princeton Architecture – Developed in 1945 by mathematician and physicist John von Neumann
• Harvard Architecture – Traces its roots to the Harvard Mark II computer developed at Harvard University

Von Neumann/Princeton Architecture
Key design feature
❑ Instructions and data are stored in the same memory module
❑ Managed by the same information-handling system
Advantages:
❑ Memory is flexible and the control unit is cheaper/simpler.
❑ Suitable for general purpose processors.
Disadvantages:
❑ Bus connecting processor and memory can become a bottleneck.
❑ Accessing instructions and data simultaneously is not possible.
This architecture is commonly used in modern computers because of its simpler design and flexibility. The bottleneck is generally not a problem – the time required to execute an instruction is used to fetch the next one.

Harvard Architecture
Key design feature
• Separate memory for program instructions and data.
• Dedicated information-handling systems are required.
Advantages:
• Instructions and data can be accessed in parallel.
• Instructions are executed in fewer cycles due to parallelism.
Disadvantages:
• Multiple memory interfaces and sub-systems required.
• Free data memory cannot be used for instructions and vice versa.
Harvard architecture was mostly ignored until the 1970s. Since then, it has been widely used in microcontroller design and digital signal processing applications.
Architecture and Interfacing - Part 1
Intel® 8085 Microprocessor

Register Array
• Six 8-bit general purpose registers: B, C, D, E, H and L – can be combined into pairs.
• Stores data during program execution.
• Additionally there are two 16-bit registers: the Stack Pointer and the Program Counter.

Stack Pointer (SP)
• 16-bit register
• Points to a memory location in R/W memory, the stack.

Program Counter (PC)
• 16-bit register
• Sequences the execution of instructions
• Points to the memory address containing the next byte

Arithmetic/Logic Unit (ALU)
• Performs arithmetic and logic operations – Add, Subtract, AND, OR etc.
• Coordinates with the accumulator, temp register and flags.
• Temp register holds data during operations.

Accumulator
• 8-bit register that is part of the ALU.
• Stores 8-bit data on which operations are performed.
• Results of operations are stored in the accumulator.

Flags
• 8-bit register which is not used as a general-purpose register.
• Stores the outputs of five ALU flip-flops:
• Zero (Z), Carry (CY), Sign (S), Parity (P), Aux. Carry (AC)
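The flag definitions above can be sketched in software. The following is a minimal illustration (not the actual 8085 circuitry; `add8` is a hypothetical helper) of how the five flags would be set after an 8-bit addition:

```python
# Hypothetical sketch: deriving the five 8085-style flag bits after an
# 8-bit ADD, following the definitions above (8085 parity is even parity).

def add8(a, b):
    """Add two 8-bit values; return (8-bit result, flags dict)."""
    total = a + b
    result = total & 0xFF
    flags = {
        "Z":  result == 0,                       # Zero: result is 0
        "CY": total > 0xFF,                      # Carry out of bit 7
        "S":  bool(result & 0x80),               # Sign: bit 7 of result
        "P":  bin(result).count("1") % 2 == 0,   # Parity: even number of 1s
        "AC": ((a & 0x0F) + (b & 0x0F)) > 0x0F,  # Aux. carry out of bit 3
    }
    return result, flags

result, flags = add8(0x9C, 0x64)  # 156 + 100 = 256 -> wraps to 0x00
print(hex(result), flags)         # 0x0; Z, CY, P and AC set, S clear
```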


Architecture and Interfacing

Instruction Format
Instruction format is the bit layout of an instruction in terms of its constituent elements.
An instruction consists of three parts:
1. Operation Code – Op code
• Specifies the operation to be performed by the instruction.
• Instructions can be arithmetic, logical, control, I/O etc. – ADD, MOVE, BRANCH etc.
2. Operand
• Arguments of the instruction.
• Data on which the instruction is to be executed.
• Number of operands depends on the instruction.
3. Addressing mode
• Specifies the location of operands in an instruction.
• Different addressing modes can be used.

Instruction format examples:
Opcode without operand – STOP, START
Opcode with one memory address – COPY X, LOAD Y
Opcode with two memory addresses – ADD A, B
Opcode with a register and a memory address – ADD R2, A
Opcode with two registers – ADD R3, R4

Addressing Modes

Immediate Addressing Mode
• Operand is part of the instruction itself.
• Direct data is given in the operand and moved to a register.
• Useful in working with constants.
• Fast but limited application due to the size of the address field.
Examples:
MOV A, 3DH // Accumulator ← 3DH
ADD R2, #5 // R2 = R2 + 5
ADD R2, R3, 35 // R2 = R3 + 35

Direct Addressing Mode
• Address of the source data is given as the operand.
• Data is available at the memory location.
• Accepts data from an outside device to store in the accumulator and vice versa.
• Single memory access is required.
• Useful in accessing static data.
• Limited address space (8-bit, 16-bit etc.)
Examples:
LDA 6000H // Accumulator ← Mem[6000]
ADD R2, 30B4H // R2 = R2 + Mem[30B4]

Indirect Addressing Mode
• Effective address is calculated by the processor.
• Content of the address is used to form a second address.
• Data is stored at the second address.
• Multiple memory accesses are required.
• Larger address space can be used as compared to Immediate and Direct addressing.
Examples:
LDA (6000H) // Accumulator ← Mem[Mem[6000]]
ADD R2, (30B4H) // R2 = R2 + Mem[Mem[30B4]]

Register Addressing Mode
• Instruction specifies the register number.
• Operand is held in a register.
• Small number of bits is required to describe the register number.
• Very fast execution since no memory access is required.
• Applicable when the data value is held in a register.
Examples:
ADD R2, R3, R4 // R2 = R3 + R4
MOV R3, R6 // R3 = R6

Register Indirect Addressing Mode
• Instruction specifies a register number.
• Register holds the memory address where the operand is stored.
• Large address spaces can be accessed.
• One fewer memory access as compared to indirect addressing mode.
Examples:
ADD R2, (R3) // R2 = R2 + Mem[R3]
MOV R3, (R6) // R3 = Mem[R6]
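The difference between these modes can be illustrated with a toy memory and register file. This is a hypothetical sketch — the dictionaries, addresses and values are made up for illustration, not real 8085 state:

```python
# Toy machine state: how each addressing mode locates its operand.
mem = {0x6000: 0x3A, 0x3A: 0x77, 0x30B4: 5}   # address -> contents
reg = {"R3": 0x30B4, "R6": 0x6000}            # register -> contents

# Immediate: the operand is carried inside the instruction itself
imm = 0x3D

# Direct: the instruction carries the operand's memory address
direct = mem[0x6000]            # one memory access -> 0x3A

# Indirect: the address in the instruction points to a second address
indirect = mem[mem[0x6000]]     # two memory accesses -> mem[0x3A] = 0x77

# Register: the operand sits in the named register, no memory access
register = reg["R3"]            # -> 0x30B4

# Register indirect: the register holds the operand's memory address
reg_indirect = mem[reg["R3"]]   # one memory access -> mem[0x30B4] = 5

print(imm, direct, indirect, register, reg_indirect)
```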



Architecture and Interfacing

Addressing Modes
Indexed Addressing Mode
• A general/special purpose register is used as an index register.
• Instruction can specify an offset/displacement which is added to the index register to get the effective operand address.
• Address of the operand can also be obtained by adding the contents of index registers.
• Indexed addressing mode is used to access an array with data in successive locations.
Examples:
LOAD R2, 1000(R4) // R2 = Mem[1000 + R4]
ADD R2, (R3 + R4) // R2 = R2 + Mem[R3 + R4]

Single-core Processor
Limitations of single-core processors:
• Difficult to increase clock frequencies
• Heating problems
• Design becomes increasingly complicated
Possible solution:
• Replicate hardware
• Run at lower clock speed and power
• 1 core @ 4 GHz ≈ 2 cores @ 2 GHz each

Multi-core Processor
• Multi-core processors are composed of two or more independent CPUs (cores).
• Cores can be integrated on single or multiple circuit dies.
• Core memories can be shared.
❑ L1 and L2 caches are typically private to each core.
❑ L3 cache is shared among cores.

Threading
• A thread is a flow of execution through process code.
• Each thread belongs to exactly one process. Threads only exist within a process.
• Threads improve application performance through parallelism.
• Multi-threading is the process of providing multiple threads of execution simultaneously.
• Single-core processors used time division multiplexing to achieve multithreading.
• Multi-core processors can achieve true multi-threading by running threads simultaneously.
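The idea of multiple flows of execution through one process can be sketched with Python's standard threading module. A minimal illustration — note that CPython's GIL limits true parallelism for CPU-bound code, so this demonstrates concurrent execution flows rather than a guaranteed speedup:

```python
# Two threads, each computing half of a sum within the same process.
import threading

results = [0, 0]

def partial_sum(idx, lo, hi):
    results[idx] = sum(range(lo, hi))  # each thread handles half the range

t1 = threading.Thread(target=partial_sum, args=(0, 0, 500))
t2 = threading.Thread(target=partial_sum, args=(1, 500, 1000))
t1.start(); t2.start()   # both threads now run within this process
t1.join(); t2.join()     # wait for both flows of execution to finish

print(sum(results))      # 499500, same as sum(range(1000))
```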



Memory technology and systems

Computers use memory to store data and instructions. Memory systems can be divided into two broad categories:
• Primary/Main memory
• Readily accessible by the computer.
• Random Access Memory – RAM (Volatile)
• Read Only Memory – ROM (Non-Volatile)
• Secondary memory
• Typically not part of the computer.
• Non-volatile
• Examples: Hard disks, Compact Disks etc.

Random Access Memory (RAM)
• Internal memory of the CPU for storing data and program results.
• CPU can read from or write to any location randomly.
• Short-term volatile memory which loses content when power is turned off.
• RAM is relatively small in physical size and the amount of data it can hold.
RAM can be divided into two categories:
• Static RAM (SRAM)
• Constructed using flip-flops
• Fast, expensive, generates heat
• Used as cache
• Dynamic RAM (DRAM)
• Constructed using capacitors
• Slower than SRAM but cheaper
• Used as main memory

Read Only Memory (ROM)
• Information is stored permanently and can only be read from.
• Non-volatile memory which stores important programs such as the BIOS for booting the computer.
• Large storage capacity in the range of GBs.
ROM can be classified into the following categories:
• Programmable Read Only Memory (PROM)
• Erasable Programmable Read Only Memory (EPROM)
• Electrically Erasable Programmable Read Only Memory (EEPROM)

Secondary Memory
• Stores back-up information that is not required by the user right away.
• CPU cannot access secondary memory directly.
• Slower access as compared to primary memory.
• Cheaper than primary memory on a per-unit basis.
• Non-volatile
• Examples: Hard disks, Compact Disks, Floppy disks etc.



Cache Memory – Part 1

Ideal Memory
System speed depends on processor + memory + architecture + other design factors.
Ideal memory has the following desirable features:
• Zero delay & zero cost
• Infinite capacity
• Infinite bandwidth to support multiple accesses in parallel
Problem – Many of the above-mentioned features are mutually exclusive:
• Memories with large capacity are slower than smaller ones.
• Faster memories are more expensive than slower ones.

Memory Hierarchy
Solution to the non-ideality of memory – memory hierarchy:
• Design a system with multiple memory levels (Level 1, 2, 3 cache, main memory, backing store).
• Memory closest to the processor shall be fastest.
• Data frequently used by the processor shall be stored in the fastest memory.

Principle of locality
Principle of temporal locality
• If a program accessed a memory location, it will access this memory again.
Principle of spatial locality
• If a program accessed a memory location, it will also access a nearby address.
Examples: Loops and Arrays.
How can we exploit the principle of locality?
• Store recently used data in registers – but there are a limited number of registers!
• Make use of high-speed cache memory – but how?
• Exploit the principle of temporal locality by storing recently used program data in cache.
• Exploit the principle of spatial locality by storing adjacent program data in cache.

Cache basics
Cache works like a filing system. Files are in one of the following locations:
• In your hand
• On your desk
• In your filing cabinet
• In your archive box at home
• Some rental storage location
⮚ Active files are in registers.
⮚ Upcoming files are in cache.
⮚ Infrequently used files are in DRAM.
⮚ Rarely used files are in backup storage.

Cache Memory – Terminology
• Cache block/line – Basic storage unit in cache.
• Cache hit – When the required element is found in cache.
• Hit latency – Time to locate an element in cache.
• Cache miss – When the required element is not in cache.
• Miss latency – Time to fetch & load an element from memory to cache.
Key performance metrics
• Cache hit rate = (# of hits) / (# of hits + # of misses). Typical caches have hit rates > 95%.
• Average memory access time = (hit rate × hit latency) + (miss rate × miss latency)

Cache Memory – Design
• Cache memory is divided into blocks/lines.
• Different cache memories have different block sizes.
• Number of blocks in a cache is a power of 2.
Design considerations include:
• How is data mapped from main memory to cache?
• How to replace a block when the cache is full? (replacement policy)
• Should we write to both cache and main memory? (write policy)
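The two performance metrics above can be checked with a quick calculation. The hit/miss counts and latencies below are illustrative numbers, not values from the text:

```python
# Average memory access time, per the formula above:
# AMAT = hit_rate * hit_latency + miss_rate * miss_latency
def amat(hit_rate, hit_latency, miss_latency):
    return hit_rate * hit_latency + (1 - hit_rate) * miss_latency

hits, misses = 970, 30
hit_rate = hits / (hits + misses)   # 970 / 1000 = 0.97 (i.e. > 95%)

# With a 1-cycle hit latency and 100-cycle miss latency:
print(amat(hit_rate, 1, 100))       # 0.97*1 + 0.03*100 = 3.97 cycles
```

Even a small miss rate dominates the average, which is why high hit rates matter so much.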



Cache Memory – Part 2

Cache memory
• Cache memory is divided into blocks/lines.
• These memory blocks can be of different sizes.
• Cache Size – C (bytes) = S × A × B
• S = Number of sets or cache rows
• B = Block size (bytes)
• A = Associativity (determines mapping techniques, to be discussed later)
Cache memory address is generally divided into three fields:
• Tag – Most significant bits. Determines which main memory block is mapped to cache memory.
• Index – Specifies the cache row (set) into which a main memory block is copied.
• Block Offset – Least significant bits. Locates data within a block.

Cache mapping techniques
Cache mapping defines how a memory block is mapped to cache.
• Main memory contains equal-size partitions called memory blocks or frames.
• During mapping, some of the main memory blocks are copied to cache memory.
• Remember, Main Memory Size >> Cache Memory Size.
• Performance of the cache memory mapping function is key to overall speed.
Cache mapping can be performed using the following techniques:
• Direct mapping
• Fully associative mapping
• (N-way) Set associative mapping

Direct Mapping
Each memory block maps to a specific cache block – the simplest cache mapping.
Direct Mapping Algorithm
If the cache has 2^n blocks, data at memory address i is mapped as: Cache block index = i mod 2^n
Cache memory block # = (Main memory block #) mod (# of cache memory blocks)
For a 4-block cache (S = # of sets (rows) = # of cache blocks, Associativity = 1, Block size = B bytes):
0 mod 4 = 0   1 mod 4 = 1   2 mod 4 = 2   3 mod 4 = 3
4 mod 4 = 0   5 mod 4 = 1   6 mod 4 = 2   7 mod 4 = 3
8 mod 4 = 0   9 mod 4 = 1   10 mod 4 = 2  11 mod 4 = 3
12 mod 4 = 0  13 mod 4 = 1  14 mod 4 = 2  15 mod 4 = 3
Advantages:
• Simplest cache mapping technique
• Easy to implement
• No need for a replacement policy
Disadvantages:
• Not flexible
• Inefficient usage of cache even when it is not full. For a 128-block cache, memory blocks 0, 128 and 256 all map to cache block 0 (0 mod 128 = 128 mod 128 = 256 mod 128 = 0). Even if the remaining cache is empty, collisions will result in overwriting/evictions.
• Poor performance if all three such blocks are used frequently.

Consider the following scenario:
• Cache memory consists of 128 blocks each containing 16 bytes. Total cache size = 128 blocks × 16 bytes = 2048 (2K) bytes.
• Main memory is addressable by a 16-bit address. Total size of main memory is 2^16 = 65536 (64K) bytes.
• Number of 16-byte blocks in main memory = 64K / 16 = 4K.
• Cache memory block # = (Main memory block #) mod 128.
• The memory address is divided into tag, index and block-offset fields:
• Block size = 16 bytes = 2^4 bytes → Block offset bits = log2(B) = 4 bits.
• Total number of rows = 128 = 2^7 blocks → Index bits = log2(S) = 7 bits.
• Tags = # of main memory blocks / # of cache memory blocks = 4K / 128 = 2^5 → Tag bits = log2(2^5) = 5 bits.
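The 5/7/4-bit field split of the example above can be verified with a few lines of bit arithmetic — a sketch assuming the 128-block, 16-bytes-per-block cache and 16-bit addresses described in the scenario:

```python
# Direct-mapped address decomposition for the running example.
BLOCK_SIZE, NUM_BLOCKS, ADDR_BITS = 16, 128, 16
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1            # log2(16)  = 4
INDEX_BITS  = NUM_BLOCKS.bit_length() - 1            # log2(128) = 7
TAG_BITS    = ADDR_BITS - INDEX_BITS - OFFSET_BITS   # 16 - 7 - 4 = 5

def split(addr):
    """Split a 16-bit address into (tag, index, block offset)."""
    offset = addr & (BLOCK_SIZE - 1)                 # low 4 bits
    index  = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)  # next 7 bits
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)      # top 5 bits
    return tag, index, offset

# Memory blocks 0, 128 and 256 collide at cache block (index) 0,
# but are distinguished by their tags:
for block in (0, 128, 256):
    print(split(block * BLOCK_SIZE))  # index 0 each time; tags 0, 1, 2
```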



Cache Memory – Part 3

Fully Associative Mapping
• A main memory block can be placed in any one of the 'n' cache memory block locations.
• A fully associative cache of n blocks is n-way set associative.
• The memory address is divided into two fields only, i.e. Tag and Block Offset.
• No index is required because a main memory block can go anywhere in the cache.

N-way Set Associative Mapping
• Hybrid between Direct Mapping and Fully Associative Mapping.
• Cache blocks are grouped into sets where each set can contain N blocks. N is a design parameter.
❑ N = 1 – direct mapping
❑ N = # of cache blocks – fully associative mapping
❑ N-way set associative caches have typical N values of 2, 4, 8...
• A given main memory block can only be mapped to a particular set.
• # of sets = S = (# of cache blocks) / (set associativity)
• Just like direct mapping, a memory block is mapped to a set using the following formula: Set # = (Memory Block #) mod (# of sets in cache)
• Just like fully associative mapping, the block can be placed anywhere within its set (e.g. in 4-way set associative mapping, any of the 4 blocks of that set).
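The set-numbering formula can be sketched for the running 128-block cache example; `placement` is a hypothetical helper name, and the memory block number is illustrative:

```python
# Where a memory block lands for different associativities of a
# 128-block cache: Set # = (Memory Block #) mod (# of sets).
CACHE_BLOCKS = 128

def placement(mem_block, n_way):
    """Return the set index a memory block maps to under N-way mapping."""
    num_sets = CACHE_BLOCKS // n_way
    return mem_block % num_sets

print(placement(300, 1))    # N=1, direct mapped: 300 mod 128 = 44
print(placement(300, 4))    # 4-way: 32 sets, 300 mod 32 = 12
print(placement(300, 128))  # N = # blocks, fully associative: always set 0
```

At N = 1 the formula collapses to direct mapping; at N = 128 every block maps to the single set, i.e. fully associative.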



Cache Memory – Part 4

Cache Replacement Policies
Replacement policies are required for fully associative and set associative mapping techniques. Direct mapping doesn't require a replacement policy per se – simply evict/overwrite the existing block.
Replacement policies are implemented in hardware to speed up the process.
• Least Recently Used (LRU) – replace the cache block that has been in the cache longest without recent references.
• One of the most effective replacement policies.
• Expensive to implement, especially for high associativity.
• First-in First-out (FIFO) – replace the cache block that has been in the cache longest.
• FIFO can be the worst possible policy for repeated linear scans.
• Least Frequently Used (LFU) – replace the cache block that has experienced the least number of references.
• Random – pick a random cache block as a candidate for replacement.
• Easy to implement.
• Performance is only slightly inferior to the abovementioned algorithms.

Cache Write Policies
Writing to the cache is more complicated than reading from the cache. If a cache block hasn't been modified, it can be overwritten immediately. If a cache block has been updated, main memory must be updated before the cache block is replaced. Consequently, we have two write policies.
Write through
• Simplest technique, easy to implement.
• All write operations are made to both main and cache memories.
• Data is always consistent.
• Generates a lot of main memory traffic and bottlenecks.
Write back
• Update cache memory right away and main memory later.
• A dirty bit needs to be added to each cache memory block.
• The dirty bit is used to track whether main memory is up-to-date or not.
• A dirty cache block needs to be copied to main memory before replacement.
• Clean blocks don't need to be written back to main memory.
• Better performance but more complicated implementation.
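LRU is easy to sketch in software, even though real caches implement it in hardware. A minimal illustration (class name, capacity and keys are illustrative) using an OrderedDict to keep blocks in recency order:

```python
# Software sketch of LRU replacement: evict the block that has gone
# longest without a reference.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # key -> data, least recent first

    def access(self, key):
        """Return True on a cache hit, False on a miss (loading the block)."""
        if key in self.blocks:                 # hit: mark most recently used
            self.blocks.move_to_end(key)
            return True
        if len(self.blocks) >= self.capacity:  # full: evict the LRU block
            self.blocks.popitem(last=False)
        self.blocks[key] = None                # miss: load the block
        return False

cache = LRUCache(2)
hits = [cache.access(k) for k in ("A", "B", "A", "C", "B")]
print(hits)  # [False, False, True, False, False]
# Accessing C evicted B (least recently used, since A was re-referenced);
# the final access to B then evicted A.
```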
