Computer Memory System Overview, Cache Memory, Design Elements and Principles of Cache Design
Objectives:
At the end of the lesson the learner will be able to:
Define computer memory system
Identify the different types of memory
Identify the types of cache memory
Understand the design elements and principles of cache design
Computer memory is one of the most important elements in a computer system. It stores the data and instructions required during the processing of data, together with the output results. Storage may be needed only momentarily, for a limited period of time, or for an extended period. Computer memory refers to the electronic holding place for instructions and data that the processor can read quickly.
Memory Hierarchy
Memory is characterised on the basis of two key factors: capacity and access time. The smaller the access time, the faster the memory.
Parameters of Memory:
The following terms are most commonly used for identifying comparative behaviour
of various memory devices and technologies.
Storage Capacity It represents the size of the memory. The capacity of internal memory and main memory can be expressed in terms of the number of words or bytes.
Access Modes A memory comprises various memory locations. The information in these locations can be accessed randomly, sequentially or directly.
Access Time The access time is the time between initiating a read or write operation and the moment the data is made available or written at the desired location.
Physical Characteristics In this respect, devices can be categorised into four main categories: electronic, magnetic, mechanical and optical.
Permanence of Storage Permanence indicates how long stored data is retained; it is high for magnetic materials, which retain data for future use even when power is removed.
The memory unit that communicates directly with the CPU is called main memory. The primary memory allows the computer to store data for immediate manipulation and to keep track of what is currently being processed. It is volatile in nature: when the power is turned off, the contents of primary memory are lost. It is also known as read/write memory, since it allows the CPU to read data from it as well as write data and instructions into it.
RAM is used for the temporary storage of input data, output data and intermediate
results. RAM is a microchip implemented using semiconductors.
(i) Dynamic RAM (DRAM) It stores each bit of data in a tiny capacitor and must be refreshed periodically to retain its content.
(ii) Static RAM (SRAM) It retains the data as long as power is provided to the memory chip. It does not need to be refreshed periodically. SRAM uses multiple transistors for each memory cell and does not use a capacitor. SRAM is often used as cache memory due to its high speed, but it is more expensive than DRAM.
Extended Data Output Dynamic RAM (EDO DRAM) It is a type of RAM chip. It is
used to improve the time to read content from memory and enhance the method of
access.
The data and instructions that are required during the processing of data are brought
from the secondary storage devices and stored in the RAM. For processing it is
required that the data and instructions are accessed from the RAM and stored in the
registers.
Cache memory is a very high speed memory placed in between RAM and CPU. Cache
memory increases the speed of processing.
Read Only Memory (ROM) is also known as non-volatile memory or permanent storage. It does not lose its content when the power is switched off. ROM has only read capability, no write capability. Data and instructions can be written to a ROM chip only once; a ROM chip is programmed at the time of manufacturing and cannot afterwards be reprogrammed or rewritten.
(i) Programmable ROM (PROM) It is a ROM that can be programmed once by the user after manufacture; once written, its contents cannot be changed.
(ii) Erasable Programmable ROM (EPROM) It is similar to PROM, but it can be erased by exposure to strong ultraviolet light and then rewritten. It is therefore also known as Ultraviolet Erasable Programmable ROM (UV EPROM). EPROM was invented by Dov Frohman of Intel in 1971.
Tit-Bits
The secondary memory stores much larger amounts of data and information for extended periods of time. Data in secondary memory cannot be processed directly by the CPU; it must first be copied into primary storage, i.e., RAM.
Secondary storage is used to store data and programs when they are not being processed. It is also non-volatile in nature. Due to this, the data remain in secondary storage even when the power is switched off.
Optical Disk:
CD
DVD
Blu-ray Disk
Pen/Flash Drive
Hard Disk:
It is a non-volatile, random access digital data storage device, used for storing and retrieving digital information using rotating disks (platters) coated with magnetic material. All the programs of a computer are installed on the hard disk.
It consists of a spindle that holds non-magnetic flat circular disks, called platters, which hold the recorded data. Each platter requires two read/write heads, which are used to write and read the information on a platter. All the read/write heads are attached to a single access arm, so they cannot move independently.
The information is recorded in bands; each band of information is called a track. Each
platter has the same number of tracks and a track location that cuts across all platters is
called a cylinder. The tracks are divided into pie-shaped sections known as sectors.
Floppy Disk:
It is used to store data, but it can store only a small amount of data and it is slower to access than a hard disk. It is round in shape: a thin plastic disk coated with iron oxide. Data is retrieved or recorded on the surface of the disk through a slot in the envelope. Floppy disks are removable from the drive. Floppy disks are available in three sizes: 8 inch, 5 1/4 inch and 3 1/2 inch.
Compact Disc (CD):
It is the most popular and least expensive type of optical disk. A CD is capable of being used as a data storage device as well as for storing digital audio. Files are stored in contiguous sectors on the disc.
DVD:
DVDs offer higher storage capacity than compact discs while having the same dimensions. Depending upon the disk type, a DVD can store several gigabytes of data (4.7 GB to 17.08 GB). DVDs are primarily used to store movies or music and can be played back on a television or a computer. They are not rewritable media.
Blu-ray Disk:
Blu-ray disk (official abbreviation BD) is an optical disk storage medium designed to supersede the standard DVD format. Blu-ray discs hold 25 GB (23.31 GiB) of space per layer.
The name Blu-ray disk refers to the blue laser used to read the disk, which allows information to be stored at a greater density than with the longer-wavelength red laser used in DVDs. Blu-ray can hold almost 5 times more data than a single-layer DVD.
Pen/Thumb Drive:
A pen drive is also known as a flash drive. A flash drive is a data storage device that consists of flash memory with a portable USB (Universal Serial Bus) interface; in short, it is a USB-based flash memory drive. USB flash drives are typically removable, rewritable and much smaller than a floppy disk. A USB flash drive is about the size of a thumb and plugs into a USB port on the computer.
Today, flash drives are available in various storage capacities such as 256 MB, 512 MB, 1 GB, 4 GB, 16 GB and up to 64 GB. They are widely used as an easy and small medium to transfer and store information.
Memory Stick:
A family of flash memory cards from Sony designed for digital storage in cameras, camcorders and other handheld devices. The capacity of a Memory Stick varies from 4 MB to 256 GB.
Magnetic Tape
Magnetic tapes are made of a plastic film-type material coated with magnetic material to store data permanently. Data can be read as well as recorded. A tape is usually 12.5 mm to 25 mm wide and 500 m to 1200 m long. Tapes store data in a sequential manner.
The data stored on magnetic tape take the form of tiny magnetised and demagnetised segments on the surface of the material. Magnetic tapes are durable and can be written, erased and rewritten. Magnetic tapes can hold very large amounts of data, which can only be accessed sequentially.
There are mainly two types of magnetic tape: tape reel and tape cassette. Each type has its own requirements. Older systems use reel-to-reel tapes; newer systems use cassettes that hold more data than the huge reels.
Tit-Bits:
The rate at which data is written to or read from a disk is called the data transfer rate.
Track: records data bits as tiny magnetic spots.
Sector: holds a block of data that is read or written at one time.
Root directory: the main folder of a disk; it contains information about all the folders on the disk.
Hard disk is a fixed disk, i.e., it cannot be removed from the drive.
Secondary Memory Devices and their Storage Method and Capacity

S.No  Secondary Memory Device  Storage Method  Capacity
5     CD-ROM                   Optical         640 MB to 680 MB
Memory Measurement:
When you use RAM, ROM, a floppy disk or a hard disk, the data is measured using some unit. In computer terminology these units are called bit, nibble, byte, kilobyte, megabyte, gigabyte, and so on.
Byte (B) A byte is approximately one character (the letter 'a', the number '1', the symbol '?', etc.). A group of 8 bits is called a byte.
Nibble 4 bits make one nibble.
Terabyte (TB) A terabyte, exactly 2^40 bytes (1,024 GB), is approximately a trillion (10^12) bytes.
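As a small illustration of these units, the sketch below (Python; the function name and example sizes are ours, not from the text) converts a raw byte count into the larger binary units just defined.

```python
# Minimal sketch: convert a byte count into binary units (KB, MB, GB, TB).
# The example sizes passed in at the bottom are hypothetical.

UNITS = ["B", "KB", "MB", "GB", "TB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count using binary prefixes (1 KB = 2**10 bytes)."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {UNITS[-1]}"

print(human_readable(3 * 2**30))   # -> 3.00 GB
print(human_readable(2**40))       # -> 1.00 TB (a terabyte is 2**40 bytes)
```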
CACHE MEMORY
Before getting on with the main topic let's have a quick refresher of how memory
systems work - skip to "Waiting for RAM" if you already know about addresses, data
and control buses.
Back in the early days of computing things were simple. There was the processor and
the memory. The processor stored data in the memory when it wanted to and read the
data back when it wanted to.
Let’s just consider RAM because the only difference between it and ROM is that the
processor cannot write data to ROM. The processor has a connection to the memory
that allows it to communicate the data being stored or retrieved which consists of a
wire for each bit of the address – making an address “bus”.
You will also know that the number of memory locations that the processor can
address depends on the number of address lines in the bus - each additional address
line doubles the amount of memory that can be used.
The processor also has a data bus, which it uses to send and retrieve data to and from
the RAM. Again the number of wires in the data bus is the number of bits that a single
memory location stores and the number of bits transferred in a single memory
operation.
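To make the bus arithmetic concrete, here is a minimal sketch assuming an example 16-line address bus and an 8-bit data bus (figures chosen for illustration, not taken from the text): each extra address line doubles the number of addressable locations, and the data bus width sets the bits moved per operation.

```python
# Minimal sketch: relate bus widths to memory capacity.
# The 16 address lines and 8-bit data bus are example figures only.

address_lines = 16          # each extra line doubles the address space
data_bus_bits = 8           # bits stored per location / transferred per operation

locations = 2 ** address_lines            # 65,536 addressable locations
total_bits = locations * data_bus_bits    # total storage reachable over the bus

print(f"{locations} locations x {data_bus_bits} bits = {total_bits // 8} bytes")
# -> 65536 locations x 8 bits = 65536 bytes (64 KB)
```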
So far, so good, and in fact nothing much more than was explained about general
memory systems in How Memory Works. However, as well as the address and data
bus there also has to be a control or system bus.
This passes signals to the memory that control exactly what is happening. For example,
there is usually a Read/Write (R/W) line, which indicates the direction in which the
data bus is operating and whether the memory should read the data on the data bus or
use the stored data to set the state of the data bus.
There is also usually a control signal that tells the processor that the data on the data
bus is valid and so on. The exact arrangement of the control bus varies from processor
to processor but you can see what sort of things it has to deal with.
Now we come to the interesting part – computers are made of components that work
at different speeds. One of the control lines usually carries the processor clock because
all operations in the machine are linked to this clock.
The fastest anything happens within the machine is within one clock pulse. So when
the processor wants to write to the memory or read from the memory it takes one clock
pulse.
The problem is that processor chips are made to high-cost, high-speed designs. Memory components are usually made to lower-cost, slower designs.
Why?
Simply because there is only one processor to be bought and paid for but lots and lots
of memory chips. What this means in practice is that for quite a long time processors
have been able to work much faster than memory.
There was a brief period back in the early days when processor chips ran at a clock rate
of 1MHz and the memory chips could keep up. As soon as the PC and the second
generation processor chips appeared things became more complicated.
The memory used, DRAM, needed more than one processor clock pulse time to store
and retrieve data and the “wait state” was born. The processor would put the address
onto the address bus and the data on the data bus, signal to the memory that it was
ready to write data and then it would sit there for one, two, possibly more, clock pulses
doing nothing at all until the memory had enough time to store the data. It could then
move on to the next instruction.
As you can imagine there was a time when wait states were a big selling point, or
rather a non-selling point. The fewer the wait states a machine needed the faster it
would run your program but the more money it would cost.
Dell, it is rumoured, was so worried about this that it built a 386SX 16MHz machine using expensive static RAM that was fast enough to keep up. Very expensive and, as will be explained, quite unnecessary - but an interesting experiment.
Processor clock speeds rocketed from 1MHz, through 4MHz, hit 16MHz and carried
on up to today’s maximum of around 4GHz. There is absolutely no way that memory
chips could keep up with this sort of amazing speed and be cheap enough to supply
the ever increasing amounts of storage needed.
There are a number of ingenious intermediate solutions that boost memory throughput
but there is only one really great new idea that solves most of the problems.
Cache Addresses
Almost all non-embedded processors, and many embedded processors, support virtual
memory. In essence, virtual memory is a facility that allows programs to address
memory from a logical point of view, without regard to the amount of main memory
physically available. When virtual memory is used, the address fields of machine
instructions contain virtual addresses. For reads to and writes from main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory.
When virtual addresses are used, the system designer may choose to place the cache
between the processor and the MMU or between the MMU and main memory. A
logical cache, also known as a virtual cache, stores data using virtual addresses. The
processor accesses the cache directly, without going through the MMU. A physical
cache stores data using main memory physical addresses.
One obvious advantage of the logical cache is that cache access speed is faster than for
a physical cache, because the cache can respond before the MMU performs an address
translation. The disadvantage has to do with the fact that most virtual memory
systems supply each application with the same virtual memory address space. That is,
each application sees a virtual memory that starts at address 0. Thus, the same virtual
address in two different applications refers to two different physical addresses. The
cache memory must therefore be completely flushed with each application context
switch, or extra bits must be added to each line of the cache to identify which virtual
address space this address refers to.
Cache Size
The size of the cache should be small enough so that the overall average cost per bit is
close to that of main memory alone and large enough so that the overall average
access time is close to that of the cache alone. There are several other motivations for
minimizing cache size. The larger the cache, the larger the number of gates involved in
addressing the cache. The result is that large caches tend to be slightly slower than small ones - even when built with the same integrated circuit technology and put in
the same place on chip and circuit board. The available chip and board area also
limits cache size. Because the performance of the cache is very sensitive to the nature
of the workload, it is impossible to arrive at a single “optimum” cache size.
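One common way to see this trade-off between cache cost and access time is the standard single-level average-access-time estimate; the sketch below is a minimal model with assumed timings and hit ratios, not figures from the text.

```python
# Minimal sketch: average access time for a single-level cache,
# T_avg = H * T_cache + (1 - H) * (T_cache + T_main)  (assumed model and timings).

def avg_access_time(hit_ratio: float, t_cache_ns: float, t_main_ns: float) -> float:
    """On a hit we pay the cache time; on a miss we pay cache time plus main memory time."""
    return hit_ratio * t_cache_ns + (1 - hit_ratio) * (t_cache_ns + t_main_ns)

for h in (0.80, 0.95, 0.99):                 # hypothetical hit ratios
    print(f"hit ratio {h:.2f}: {avg_access_time(h, 1.0, 60.0):.2f} ns")
# -> 13.00 ns, 4.00 ns, 1.60 ns
```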
Mapping Function
Because there are fewer cache lines than main memory blocks, an algorithm is needed
for mapping main memory blocks into cache lines. Further, a means is needed for
determining which main memory block currently occupies a cache line. The choice of
the mapping function dictates how the cache is organized. Three techniques can be
used: direct, associative, and set associative.
Direct Mapping. The simplest technique, known as direct mapping, maps each block
of main memory into only one possible cache line. The mapping is expressed as,
i = j modulo m
where i = cache line number, j = main memory block number, and m = number of lines in the cache.
Figure below shows the mapping for the first m blocks of main memory.
The mapping function is easily implemented using the main memory address. Figure
below illustrates the general mechanism.
For purposes of cache access, each main memory address can be viewed as consisting
of three fields. The least significant w bits identify a unique word or byte within a
block of main memory; in most contemporary machines, the address is at the byte
level. The remaining s bits specify one of the 2^s blocks of main memory. The cache
logic interprets these s bits as a tag of s – r bits (most significant portion) and a line
field of r bits. This latter field identifies one of the m = 2^r lines of the cache. To summarize, the least significant w bits select a word within a block, the next r bits select a cache line, and the most significant s − r bits form the tag stored with that line.
The effect of this mapping is that blocks of main memory are assigned to lines of the cache as follows: line 0 holds blocks 0, m, 2m, and so on; line 1 holds blocks 1, m + 1, 2m + 1, and so on; in general, line i holds every block j for which j modulo m = i.
Thus, the use of a portion of the address as a line number provides a unique mapping
of each block of main memory into the cache. When a block is actually read into its
assigned line, it is necessary to tag the data to distinguish it from other blocks that can
fit into that line. The most significant s – r bits serve this purpose.
The direct mapping technique is simple and inexpensive to implement. Its main
disadvantage is that there is a fixed cache location for any given block. Thus, if a
program happens to reference words repeatedly from two different blocks that map
into the same line, then the blocks will be continually swapped in the cache, and the hit
ratio will be low (a phenomenon known as thrashing).
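As a rough illustration of the field splitting and of thrashing, the sketch below (Python; the address width and field sizes are assumed example values, not from the text) extracts the tag, line and word fields of a main memory address for a direct-mapped cache.

```python
# Minimal sketch: split a main memory address into Tag | Line | Word fields
# for a direct-mapped cache. Field widths are example values:
#   w = 2 word bits (4 bytes per block), r = 14 line bits (2**14 lines), 24-bit address.

W_BITS = 2                      # word (byte-within-block) field
R_BITS = 14                     # line field: cache has 2**14 = 16384 lines
ADDR_BITS = 24                  # total address width, so the tag is 24 - 14 - 2 = 8 bits

def split_address(addr: int):
    word = addr & ((1 << W_BITS) - 1)
    line = (addr >> W_BITS) & ((1 << R_BITS) - 1)
    tag = addr >> (W_BITS + R_BITS)
    return tag, line, word

# Two addresses that share the same line field but have different tags would
# evict each other repeatedly in a direct-mapped cache (thrashing).
for addr in (0x00_1234, 0x05_1234):
    tag, line, word = split_address(addr)
    print(f"address {addr:#08x}: tag={tag:#x} line={line} word={word}")
```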
One approach to lower the miss penalty is to remember what was discarded in case it
is needed again. Since the discarded data has already been fetched, it can be used again
at a small cost. Such recycling is possible using a victim cache. A victim cache is a small, fully associative cache, typically of 4 to 16 lines, placed between a direct-mapped L1 cache and the next level of memory to hold recently discarded blocks.
Associative Mapping. Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. In this case, the cache control logic interprets a memory address simply as a Tag and a
Word field. The Tag field uniquely identifies a block of main memory. To determine
whether a block is in the cache, the cache control logic must simultaneously examine
every line’s tag for a match. Figure below illustrates the logic.
Note that no field in the address corresponds to the line number, so the number of lines in the cache is not determined by the address format; the address is interpreted simply as a tag of s bits plus a word field of w bits.
With associative mapping, there is flexibility as to which block to replace when a new
block is read into the cache. The principal disadvantage of associative mapping is the
complex circuitry required to examine the tags of all cache lines in parallel.
Set-Associative Mapping. Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages. Here the cache consists of v sets, each containing k lines, and the relationships are
m = v × k
i = j modulo v
where i = cache set number, j = main memory block number, m = number of lines in the cache, v = number of sets, and k = number of lines in each set. This is referred to as k-way set-associative mapping. For example, with m = 16 lines organised as k = 2 ways, there are v = 8 sets and block j = 37 maps to set 37 modulo 8 = 5.
A set-associative cache can be implemented as k direct-mapped caches. Each direct-mapped cache is referred to as a way, consisting of v lines. The first v lines of main memory are direct mapped into the v lines of each way; the next group of v lines of main memory are similarly mapped, and so on. The direct-mapped implementation is typically used for small degrees of associativity (small values of k), while the associative-mapped implementation is typically used for higher degrees of associativity.
For set-associative mapping, the cache control logic interprets a memory address as
three fields: Tag, Set, and Word. The d set bits specify one of v = 2^d sets. The s bits of the Tag and Set fields specify one of the 2^s blocks of main memory. Figure below
illustrates the cache control logic.
With fully associative mapping, the tag in a memory address is quite large and must be
compared to the tag of every line in the cache. With k-way set-associative mapping, the
tag in a memory address is much smaller and is only compared to the k tags within a single set. To summarize, a hit occurs when the Tag field matches one of the k tags stored in the set selected by the Set field.
Replacement Algorithms
Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced. For direct mapping, there is only one possible line for any particular block, and no choice is possible. For the associative and set-associative
techniques, a replacement algorithm is needed. To achieve high speed, such an
algorithm must be implemented in hardware.
Probably the most effective is least recently used (LRU): Replace that block in the set
that has been in the cache longest with no reference to it. For two-way set associative,
this is easily implemented. Each line includes a USE bit. When a line is referenced, its
USE bit is set to 1 and the USE bit of the other line in that set is set to 0. When a block is
to be read into the set, the line whose USE bit is 0 is used. Because we are assuming
that more recently used memory locations are more likely to be referenced, LRU
should give
the best hit ratio. LRU is also relatively easy to implement for a fully associative cache.
The cache mechanism maintains a separate list of indexes to all the lines in the cache.
When a line is referenced, it moves to the front of the list. For replacement, the line at
the back of the list is used. Because of its simplicity of implementation, LRU is the most
popular replacement algorithm.
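A minimal sketch of the USE-bit scheme described above for a two-way set-associative cache; the number of sets, the Line structure and the access sequence are illustrative assumptions, not taken from the text.

```python
# Minimal sketch: USE-bit LRU for a two-way set-associative cache.
# NUM_SETS and the Line structure are illustrative; "tags" are plain block numbers.

from dataclasses import dataclass

NUM_SETS = 4          # v = 4 sets, k = 2 lines per set (example values)

@dataclass
class Line:
    tag: int = -1     # -1 means the line is empty
    use: int = 0      # USE bit: 1 = most recently referenced line in the set

cache = [[Line(), Line()] for _ in range(NUM_SETS)]

def access(block: int) -> bool:
    """Return True on a hit; on a miss, replace the line whose USE bit is 0."""
    s = block % NUM_SETS
    lines = cache[s]
    for i, line in enumerate(lines):
        if line.tag == block:                 # hit: mark this line most recently used
            line.use, lines[1 - i].use = 1, 0
            return True
    victim = lines[0] if lines[0].use == 0 else lines[1]   # miss: USE bit 0 is replaced
    other = lines[1] if victim is lines[0] else lines[0]
    victim.tag, victim.use, other.use = block, 1, 0
    return False

for b in (0, 4, 0, 8, 4):                    # blocks 0, 4, 8 all map to set 0
    print(f"block {b}: {'hit' if access(b) else 'miss'}")
# -> miss, miss, hit, miss (block 4, the least recently used, is replaced), miss
```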
Another possibility is first-in-first-out (FIFO): Replace that block in the set that has
been in the cache longest. FIFO is easily implemented as a round-robin or
circular buffer technique. Still another possibility is least frequently used (LFU):
Replace that block in the set that has experienced the fewest references. LFU could be
implemented by associating a counter with each line. A technique not based on usage
(i.e., not LRU, LFU, FIFO, or some variant) is to pick a line at random from among
the candidate lines. Simulation studies have shown that random replacement
provides only slightly inferior performance to an algorithm based on usage.
Write Policy
When a block that is resident in the cache is to be replaced, there are two cases to consider.
1) If the old block in the cache has not been altered, then it may be overwritten with a
new block without first writing out the old block.
2) If at least one write operation has been performed on a word in that line of the
cache, then main memory must be updated by writing the line of cache out to the block
of memory before bringing in the new block.
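These two cases are usually tracked with a per-line "dirty" (modified) bit; the toy sketch below, with invented names and block contents, shows how that bit decides whether a write-back is needed at eviction time.

```python
# Minimal sketch: deciding whether an evicted block must be written back.
# A per-block "dirty" bit records case 2: the block was altered while in the cache.

main_memory = {7: "old"}                    # hypothetical block contents
cache = {7: {"data": "old", "dirty": False}}

def write(block: int, value: str) -> None:
    cache[block]["data"] = value
    cache[block]["dirty"] = True            # at least one write has occurred

def evict(block: int) -> None:
    entry = cache.pop(block)
    if entry["dirty"]:                      # case 2: write the block back first
        main_memory[block] = entry["data"]
    # case 1 (clean): nothing to do, the block may simply be overwritten

write(7, "new")
evict(7)
print(main_memory[7])                       # -> "new"
```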
In deciding when modified cache contents must be written to main memory, there are two problems to contend with:
1) More than one device may have access to main memory. For example, an I/O
module may be able to read-write directly to memory. If a word has been altered only
in the cache, then the corresponding memory word is invalid. Further, if the I/O device
has altered main memory, then the cache word is invalid.
2) A more complex problem occurs when multiple processors are attached to the same
bus and each processor has its own local cache. Then, if a word is altered in one cache,
it could conceivably invalidate a word in other caches.
The simplest technique is called write through. Using this technique, all
write operations are made to main memory as well as to the cache, ensuring that
main memory is always valid. Any other processor–cache module can monitor traffic
to main memory to maintain consistency within its own cache. The main
disadvantage of this technique is that it generates substantial memory traffic and may
create a bottleneck.
In a bus organization in which more than one device (typically a processor) has a cache
and main memory is shared, a new problem is introduced. If data in one cache are
altered, this invalidates not only the corresponding word in main memory, but also
that same word in other caches (if any other cache happens to have that same word).
Even if a write-through policy is used, the other caches may contain invalid data. A
system that prevents this problem is said to maintain cache coherency. Possible
approaches to cache coherency include the following:
1) Bus watching with write through: Each cache controller monitors the address lines
to detect write operations to memory by other bus masters. If another master writes to
a location in shared memory that also resides in the cache memory, the cache controller
invalidates that cache entry. This strategy depends on the use of a write-through policy
by all cache controllers.
2) Hardware transparency: Additional hardware is used to ensure that all updates to
main memory via cache are reflected in all caches. Thus, if one processor modifies a
word in its cache, this update is written to main memory. In addition, any matching
words in other caches are similarly updated.
3) Non-cacheable memory: Only a portion of main memory is shared by more than
one processor, and this is designated as noncacheable. In such a system, all accesses to
shared memory are cache misses, because the shared memory is never copied into the
cache. The noncacheable memory can be identified using chip-select logic or high-
address bits.
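A toy sketch of approach 1, bus watching with write through; the SnoopingCache and Bus classes and the addresses used are invented purely for illustration.

```python
# Minimal sketch: bus watching (snooping) with a write-through policy.
# Each controller invalidates its copy when another bus master writes that address.

class SnoopingCache:
    def __init__(self, name: str):
        self.name = name
        self.lines = {}                       # address -> cached value

    def snoop_write(self, addr: int) -> None:
        """Called for every write seen on the shared bus from another master."""
        if addr in self.lines:
            del self.lines[addr]              # invalidate the stale entry

class Bus:
    def __init__(self, memory: dict):
        self.memory = memory
        self.caches = []

    def write_through(self, writer: SnoopingCache, addr: int, value: int) -> None:
        writer.lines[addr] = value            # update the writer's own cache
        self.memory[addr] = value             # write through to main memory
        for cache in self.caches:             # all other controllers snoop the write
            if cache is not writer:
                cache.snoop_write(addr)

memory = {0x10: 1}
bus = Bus(memory)
a, b = SnoopingCache("A"), SnoopingCache("B")
bus.caches += [a, b]
b.lines[0x10] = 1                             # B holds a copy of address 0x10
bus.write_through(a, 0x10, 99)                # A writes; B's copy is invalidated
print(memory[0x10], 0x10 in b.lines)          # -> 99 False
```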
Line Size
Another design element is the line size. When a block of data is retrieved and placed in
the cache, not only the desired word but also some number of adjacent words
are retrieved. As the block size increases from very small to larger sizes, the hit ratio
will
at first increase because of the principle of locality, which states that data in the vicinity
of a referenced word are likely to be referenced in the near future. As the block size
increases, more useful data are brought into the cache. The hit ratio will begin to
decrease, however, as the block becomes even bigger and the probability of using the
newly fetched information becomes less than the probability of reusing the
information that has to be replaced. Two specific effects come into play:
1) Larger blocks reduce the number of blocks that fit into a cache. Because each block
fetch overwrites older cache contents, a small number of blocks results in data being
overwritten shortly after they are fetched.
2) As a block becomes larger, each additional word is farther from the requested word
and therefore less likely to be needed in the near future.
The relationship between block size and hit ratio is complex, depending on the locality
characteristics of a particular program, and no definitive optimum value has been
found.
Number of Caches
When caches were originally introduced, the typical system had a single cache.
More recently, the use of multiple caches has become the norm. Two aspects of this
design issue concern the number of levels of caches and the use of unified versus split
caches.
Multilevel Caches. As logic density has increased, it has become possible to have a
cache on the same chip as the processor: the on-chip cache. Compared with a cache
reachable via an external bus, the on-chip cache reduces the processor’s external bus
activity and therefore speeds up execution times and increases overall system
performance. When the requested instruction or data is found in the on-chip cache, the
bus access is eliminated. Because of the short data paths internal to the processor,
compared with bus lengths, on-chip cache accesses will complete appreciably faster
than would even zero-wait state bus cycles. Furthermore, during this period the bus is
free to support other transfers.
The inclusion of an on-chip cache leaves open the question of whether an off-chip, or
external, cache is still desirable. Typically, the answer is yes, and most contemporary
designs include both on-chip and external caches. The simplest such organization is
known as a two-level cache, with the internal cache designated as level 1 (L1) and the
external cache designated as level 2 (L2). The reason for including an L2 cache is the
following: If there is no L2 cache and the processor makes an access request for a
memory location not in the L1 cache, then the processor must access DRAM or ROM
memory across the bus. Due to the typically slow bus speed and slow memory access
time, this results in poor performance. On the other hand, if an L2 SRAM (static RAM)
cache is used, then frequently the missing information can be quickly retrieved. If the
SRAM is fast enough to match the bus speed, then the data can be accessed using a
zero-wait state transaction, the fastest type of bus transfer.
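To put rough numbers on the benefit of an L2 cache, the sketch below computes average memory access time with and without a second level; all timings and hit ratios are assumed example values, not measurements from the text.

```python
# Minimal sketch: average memory access time (AMAT) with one and two cache levels.
# All timings (ns) and hit ratios below are hypothetical example values.

def amat_l1_only(t_l1, h_l1, t_mem):
    return t_l1 + (1 - h_l1) * t_mem

def amat_two_level(t_l1, h_l1, t_l2, h_l2, t_mem):
    # an L1 miss is serviced by L2; an L2 miss goes all the way to main memory
    return t_l1 + (1 - h_l1) * (t_l2 + (1 - h_l2) * t_mem)

print(amat_l1_only(1.0, 0.95, 80.0))              # -> 5.0 ns
print(amat_two_level(1.0, 0.95, 6.0, 0.90, 80.0)) # -> 1.7 ns
```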
REFERENCES:
https://www.i-programmer.info/babbages-bag/375-cache-memory.html
https://www.informationq.com/computer-memory-overview/
https://quickcse.wordpress.com/2018/08/12/elements-of-cache-design/