Computer Memory System Overview, Cache Memory, Design Elements and Principles of Cache Design
Objectives:
At the end of the lesson the learner will be able to:
Define computer memory system
Identify the different types of memory
Identify the types of cache memory
Understand the design elements and principles of cache design
Computer memory is one of the most important elements in a computer system. It stores the data and instructions required during the processing of data, together with the output results. Storage may be needed only momentarily, for a limited period of time, or for an extended period. Computer memory refers to the electronic holding place for instructions and data that the processor can read quickly.
Memory Hierarchy
Memory is characterised on the basis of two key factors: capacity and access time. The smaller the access time, the faster the memory.
Parameters of Memory:
The following terms are most commonly used for identifying comparative behaviour
of various memory devices and technologies.
Storage Capacity It represents the size of the memory. The capacity of internal memory and main memory can be expressed in terms of the number of words or bytes.
Access Modes A memory comprises various memory locations. The information in these locations can be accessed randomly, sequentially or directly.
Access Time The access time is the time between initiating a read or write operation and the moment the data is made available or written at the desired location.
Physical Characteristics In this respect, devices can be categorised into four main categories: electronic, magnetic, mechanical and optical.
Permanence of Storage Permanence indicates how long stored data is retained; it is high for magnetic materials, which retain data for future use even when power is removed.
The memory unit that communicates directly with the CPU is called main memory. The primary memory allows the computer to store data for immediate manipulation and to keep track of what is currently being processed. It is volatile in nature: when the power is turned off, the contents of primary memory are lost. It is also known as read/write memory, since it allows the CPU to read data from it as well as write data and instructions into it.
RAM is used for the temporary storage of input data, output data and intermediate
results. RAM is a microchip implemented using semiconductors.
(i) Dynamic RAM (DRAM) It stores each bit of data in a tiny capacitor and must be refreshed periodically to retain its content.
(ii) Static RAM (SRAM) It retains the data as long as power is provided to the memory chip. It does not need to be refreshed periodically. SRAM uses multiple transistors for each memory cell and does not use a capacitor. SRAM is often used as cache memory due to its high speed, but it is more expensive than DRAM.
Extended Data Output Dynamic RAM (EDO DRAM) It is a type of RAM chip. It is
used to improve the time to read content from memory and enhance the method of
access.
The data and instructions that are required during the processing of data are brought
from the secondary storage devices and stored in the RAM. For processing it is
required that the data and instructions are accessed from the RAM and stored in the
registers.
Cache memory is a very high speed memory placed in between RAM and CPU. Cache
memory increases the speed of processing.
Read Only Memory (ROM) is also known as non-volatile memory or permanent storage. It does not lose its content when the power is switched off. ROM has only read capability, no write capability. Data and instructions can be written to a ROM chip only once; a ROM chip is programmed at the time of manufacturing and cannot afterwards be reprogrammed or rewritten.
(i) Programmable ROM (PROM) It is a ROM that can be programmed once by the user after manufacture; once written, its contents cannot be changed.
(ii) Erasable Programmable ROM (EPROM) It is similar to PROM, but it can be erased by exposure to strong ultraviolet light and then rewritten. It is therefore also known as Ultraviolet Erasable Programmable ROM (UV EPROM). EPROM was invented by Dov Frohman of Intel in 1971.
Tit-Bits
The secondary memory stores much larger amounts of data and information for extended periods of time. Data in secondary memory cannot be processed directly by the CPU; it must first be copied into primary storage, i.e., RAM.
Secondary storage is used to store data and programs when they are not being processed. It is also non-volatile in nature. Due to this, the data remain in secondary storage even when the power is switched off.
Optical Disk:
CD
DVD
Blu-ray Disk
Pen/Flash Drive
Hard Disk:
It is a non-volatile, random access digital data storage device, used for storing and retrieving digital information using rotating disks (platters) coated with magnetic material. All the programs of a computer are installed on the hard disk.
It consists of a spindle that holds non-magnetic flat circular disks, called platters, which hold the recorded data. Each platter requires two read/write heads, which are used to write and read the information on a platter. All the read/write heads are attached to a single access arm, so they cannot move independently.
The information is recorded in bands; each band of information is called a track. Each
platter has the same number of tracks and a track location that cuts across all platters is
called a cylinder. The tracks are divided into pie-shaped sections known as sectors.
Floppy Disk:
It is used to store data, but it can store only a small amount of data and it is slower to access than a hard disk. It is round in shape: a thin plastic disk coated with iron oxide. Data is retrieved or recorded on the surface of the disk through a slot in the envelope. Floppy disks are removable from the drive. Floppy disks are available in three sizes: 8 inch, 5 1/4 inch and 3 1/2 inch.
Compact Disc (CD):
It is the most popular and least expensive type of optical disk. A CD is capable of being used as a data storage device as well as for storing digital audio. Files are stored in contiguous sectors on the disc.
DVD:
DVDs offer higher storage capacity than compact discs while having the same dimensions. Depending upon the disk type, a DVD can store several gigabytes of data (4.7 GB to 17.08 GB). DVDs are primarily used to store movies or music and can be played back on a television or a computer. They are not rewritable media.
Blu-ray Disk:
Blu-ray disk (official abbreviation BD) is an optical disk storage medium designed to supersede the standard DVD format. Blu-ray discs hold 25 GB (23.31 GiB) of space per layer.
The name Blu-ray disk refers to the blue laser used to read the disk, which allows information to be stored at a greater density than with the longer-wavelength red laser used in DVDs. Blu-ray can hold almost 5 times more data than a single-layer DVD.
Pen/Thumb Drive:
A pen drive is also known as a flash drive. A flash drive is a data storage device that consists of flash memory with a portable USB (Universal Serial Bus) interface; in short, it is a USB-based flash memory drive. USB flash drives are typically removable, rewritable and much smaller than a floppy disk. A USB flash drive is about the size of a thumb and plugs into a USB port on the computer.
Today, flash drives are available in various storage capacities such as 256 MB, 512 MB, 1 GB, 4 GB, 16 GB and up to 64 GB. They are widely used as an easy and small medium to transfer and store information.
Memory Stick:
A family of flash memory cards from Sony designed for digital storage in cameras, camcorders and other handheld devices. The capacity of a Memory Stick varies from 4 MB to 256 GB.
Magnetic Tape
Magnetic tapes are made of a plastic film-type material coated with magnetic material to store data permanently. Data can be read as well as recorded. A tape is usually 12.5 mm to 25 mm wide and 500 m to 1200 m long. Tapes store data in a sequential manner.
The data stored on magnetic tape take the form of tiny magnetised and demagnetised segments on the surface of the material. Magnetic tapes are durable and can be written, erased and rewritten. Magnetic tapes can hold very large amounts of data, which can only be accessed sequentially.
There are mainly two types of magnetic tape: tape reel and tape cassette. Each type has its own requirements. Older systems use reel-to-reel tapes; newer systems use cassettes that hold more data than the huge reels.
Tit-Bits:
The rate at which data is written to or read from a disk is called the data transfer rate.
Track: records data bits as tiny magnetic spots.
Sector: holds a block of data that is read or written at one time.
Root directory: the main folder of a disk; it contains information about all the folders on the disk.
Hard disk is a fixed disk, i.e., it cannot be removed from the drive.
Secondary Memory Devices and their Storage Method and Capacity

S.No  Secondary Memory Device  Storage Method  Capacity
5     CD-ROM                   Optical         640 MB to 680 MB
Memory Measurement:
When you use RAM, ROM, a floppy disk or a hard disk, the data is measured using some unit. In computer terminology these units are called bit, nibble, byte, kilobyte, megabyte, gigabyte, and so on.
Byte (B) A byte is approximately one character (the letter 'a', the number '1', the symbol '?', etc.). A group of 8 bits is called a byte.
Nibble 4 bits make one nibble.
Terabyte (TB) A terabyte, exactly 2^40 bytes (1,024 GB), is approximately a trillion (10^12) bytes.
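As a small illustration of these units, the sketch below (Python; the function name and example sizes are ours, not from the text) converts a raw byte count into the larger binary units just defined.

```python
# Minimal sketch: convert a byte count into binary units (KB, MB, GB, TB).
# The example sizes passed in at the bottom are hypothetical.

UNITS = ["B", "KB", "MB", "GB", "TB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count using binary prefixes (1 KB = 2**10 bytes)."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {UNITS[-1]}"

print(human_readable(3 * 2**30))   # -> 3.00 GB
print(human_readable(2**40))       # -> 1.00 TB (a terabyte is 2**40 bytes)
```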
CACHE MEMORY
Before getting on with the main topic let's have a quick refresher of how memory
systems work - skip to "Waiting for RAM" if you already know about addresses, data
and control buses.
Back in the early days of computing things were simple. There was the processor and
the memory. The processor stored data in the memory when it wanted to and read the
data back when it wanted to.
Let’s just consider RAM because the only difference between it and ROM is that the
processor cannot write data to ROM. The processor has a connection to the memory
that allows it to communicate the data being stored or retrieved which consists of a
wire for each bit of the address – making an address “bus”.
You will also know that the number of memory locations that the processor can
address depends on the number of address lines in the bus - each additional address
line doubles the amount of memory that can be used.
The processor also has a data bus, which it uses to send and retrieve data to and from
the RAM. Again the number of wires in the data bus is the number of bits that a single
memory location stores and the number of bits transferred in a single memory
operation.
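To make the bus arithmetic concrete, here is a minimal sketch assuming an example 16-line address bus and an 8-bit data bus (figures chosen for illustration, not taken from the text): each extra address line doubles the number of addressable locations, and the data bus width sets the bits moved per operation.

```python
# Minimal sketch: relate bus widths to memory capacity.
# The 16 address lines and 8-bit data bus are example figures only.

address_lines = 16          # each extra line doubles the address space
data_bus_bits = 8           # bits stored per location / transferred per operation

locations = 2 ** address_lines            # 65,536 addressable locations
total_bits = locations * data_bus_bits    # total storage reachable over the bus

print(f"{locations} locations x {data_bus_bits} bits = {total_bits // 8} bytes")
# -> 65536 locations x 8 bits = 65536 bytes (64 KB)
```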
So far, so good, and in fact nothing much more than was explained about general
memory systems in How Memory Works. However, as well as the address and data
bus there also has to be a control or system bus.
This passes signals to the memory that control exactly what is happening. For example,
there is usually a Read/Write (R/W) line, which indicates the direction in which the
data bus is operating and whether the memory should read the data on the data bus or
use the stored data to set the state of the data bus.
There is also usually a control signal that tells the processor that the data on the data
bus is valid and so on. The exact arrangement of the control bus varies from processor
to processor but you can see what sort of things it has to deal with.
Now we come to the interesting part – computers are made of components that work
at different speeds. One of the control lines usually carries the processor clock because
all operations in the machine are linked to this clock.
The fastest anything happens within the machine is within one clock pulse. So when
the processor wants to write to the memory or read from the memory it takes one clock
pulse.
The problem is that processor chips are made to high-cost, high-speed designs. Memory components are usually made to lower-cost, slower designs.
Why?
Simply because there is only one processor to be bought and paid for but lots and lots
of memory chips. What this means in practice is that for quite a long time processors
have been able to work much faster than memory.
There was a brief period back in the early days when processor chips ran at a clock rate
of 1MHz and the memory chips could keep up. As soon as the PC and the second
generation processor chips appeared things became more complicated.
The memory used, DRAM, needed more than one processor clock pulse time to store
and retrieve data and the “wait state” was born. The processor would put the address
onto the address bus and the data on the data bus, signal to the memory that it was
ready to write data and then it would sit there for one, two, possibly more, clock pulses
doing nothing at all until the memory had enough time to store the data. It could then
move on to the next instruction.
As you can imagine there was a time when wait states were a big selling point, or
rather a non-selling point. The fewer the wait states a machine needed the faster it
would run your program but the more money it would cost.
Dell, it is rumoured, was so worried about this that it built a 386SX 16MHz machine using expensive static RAM that was fast enough to keep up. Very expensive and, as will be explained, quite unnecessary - but an interesting experiment.
Processor clock speeds rocketed from 1MHz, through 4MHz, hit 16MHz and carried
on up to today’s maximum of around 4GHz. There is absolutely no way that memory
chips could keep up with this sort of amazing speed and be cheap enough to supply
the ever increasing amounts of storage needed.
There are a number of ingenious intermediate solutions that boost memory throughput
but there is only one really great new idea that solves most of the problems.
Cache Addresses
Almost all non-embedded processors, and many embedded processors, support virtual
memory. In essence, virtual memory is a facility that allows programs to address
memory from a logical point of view, without regard to the amount of main memory
physically available. When virtual memory is used, the address fields of machine
instructions contain virtual addresses. For reads to and writes from main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory.
When virtual addresses are used, the system designer may choose to place the cache
between the processor and the MMU or between the MMU and main memory. A
logical cache, also known as a virtual cache, stores data using virtual addresses. The
processor accesses the cache directly, without going through the MMU. A physical
cache stores data using main memory physical addresses.
One obvious advantage of the logical cache is that cache access speed is faster than for
a physical cache, because the cache can respond before the MMU performs an address
translation. The disadvantage has to do with the fact that most virtual memory
systems supply each application with the same virtual memory address space. That is,
each application sees a virtual memory that starts at address 0. Thus, the same virtual
address in two different applications refers to two different physical addresses. The
cache memory must therefore be completely flushed with each application context
switch, or extra bits must be added to each line of the cache to identify which virtual
address space this address refers to.
Cache Size
The size of the cache should be small enough so that the overall average cost per bit is
close to that of main memory alone and large enough so that the overall average
access time is close to that of the cache alone. There are several other motivations for
minimizing cache size. The larger the cache, the larger the number of gates involved in
addressing the cache. The result is that large caches tend to be slightly slower than small ones - even when built with the same integrated circuit technology and put in
the same place on chip and circuit board. The available chip and board area also
limits cache size. Because the performance of the cache is very sensitive to the nature
of the workload, it is impossible to arrive at a single “optimum” cache size.
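One common way to see this trade-off between cache cost and access time is the standard single-level average-access-time estimate; the sketch below is a minimal model with assumed timings and hit ratios, not figures from the text.

```python
# Minimal sketch: average access time for a single-level cache,
# T_avg = H * T_cache + (1 - H) * (T_cache + T_main)  (assumed model and timings).

def avg_access_time(hit_ratio: float, t_cache_ns: float, t_main_ns: float) -> float:
    """On a hit we pay the cache time; on a miss we pay cache time plus main memory time."""
    return hit_ratio * t_cache_ns + (1 - hit_ratio) * (t_cache_ns + t_main_ns)

for h in (0.80, 0.95, 0.99):                 # hypothetical hit ratios
    print(f"hit ratio {h:.2f}: {avg_access_time(h, 1.0, 60.0):.2f} ns")
# -> 13.00 ns, 4.00 ns, 1.60 ns
```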
Mapping Function
Because there are fewer cache lines than main memory blocks, an algorithm is needed
for mapping main memory blocks into cache lines. Further, a means is needed for
determining which main memory block currently occupies a cache line. The choice of
the mapping function dictates how the cache is organized. Three techniques can be
used: direct, associative, and set associative.
Direct Mapping. The simplest technique, known as direct mapping, maps each block
of main memory into only one possible cache line. The mapping is expressed as,
i = j modulo m
where i = cache line number, j = main memory block number, and m = number of lines in the cache.
Figure below shows the mapping for the first m blocks of main memory.
The mapping function is easily implemented using the main memory address. Figure
below illustrates the general mechanism.
For purposes of cache access, each main memory address can be viewed as consisting
of three fields. The least significant w bits identify a unique word or byte within a
block of main memory; in most contemporary machines, the address is at the byte
level. The remaining s bits specify one of the 2^s blocks of main memory. The cache
logic interprets these s bits as a tag of s – r bits (most significant portion) and a line
field of r bits. This latter field identifies one of the m = 2^r lines of the cache. To summarize, the least significant w bits select a word within a block, the next r bits select a cache line, and the most significant s − r bits form the tag stored with that line.
The effect of this mapping is that blocks of main memory are assigned to lines of the cache as follows: line 0 holds blocks 0, m, 2m, and so on; line 1 holds blocks 1, m + 1, 2m + 1, and so on; in general, line i holds every block j for which j modulo m = i.
Thus, the use of a portion of the address as a line number provides a unique mapping
of each block of main memory into the cache. When a block is actually read into its
assigned line, it is necessary to tag the data to distinguish it from other blocks that can
fit into that line. The most significant s – r bits serve this purpose.
The direct mapping technique is simple and inexpensive to implement. Its main
disadvantage is that there is a fixed cache location for any given block. Thus, if a
program happens to reference words repeatedly from two different blocks that map
into the same line, then the blocks will be continually swapped in the cache, and the hit
ratio will be low (a phenomenon known as thrashing).
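As a rough illustration of the field splitting and of thrashing, the sketch below (Python; the address width and field sizes are assumed example values, not from the text) extracts the tag, line and word fields of a main memory address for a direct-mapped cache.

```python
# Minimal sketch: split a main memory address into Tag | Line | Word fields
# for a direct-mapped cache. Field widths are example values:
#   w = 2 word bits (4 bytes per block), r = 14 line bits (2**14 lines), 24-bit address.

W_BITS = 2                      # word (byte-within-block) field
R_BITS = 14                     # line field: cache has 2**14 = 16384 lines
ADDR_BITS = 24                  # total address width, so the tag is 24 - 14 - 2 = 8 bits

def split_address(addr: int):
    word = addr & ((1 << W_BITS) - 1)
    line = (addr >> W_BITS) & ((1 << R_BITS) - 1)
    tag = addr >> (W_BITS + R_BITS)
    return tag, line, word

# Two addresses that share the same line field but have different tags would
# evict each other repeatedly in a direct-mapped cache (thrashing).
for addr in (0x00_1234, 0x05_1234):
    tag, line, word = split_address(addr)
    print(f"address {addr:#08x}: tag={tag:#x} line={line} word={word}")
```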
One approach to lower the miss penalty is to remember what was discarded in case it
is needed again. Since the discarded data has already been fetched, it can be used again
at a small cost. Such recycling is possible using a victim cache. A victim cache is a small, fully associative cache, typically of 4 to 16 lines, placed between a direct-mapped L1 cache and the next level of memory to hold recently discarded blocks.
Associative Mapping. Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. In this case, the cache control logic interprets a memory address simply as a Tag and a
Word field. The Tag field uniquely identifies a block of main memory. To determine
whether a block is in the cache, the cache control logic must simultaneously examine
every line’s tag for a match. Figure below illustrates the logic.
Note that no field in the address corresponds to the line number, so the number of lines in the cache is not determined by the address format; the address is interpreted simply as a tag of s bits plus a word field of w bits.
With associative mapping, there is flexibility as to which block to replace when a new
block is read into the cache. The principal disadvantage of associative mapping is the
complex circuitry required to examine the tags of all cache lines in parallel.
Set-Associative Mapping. Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages. Here the cache consists of v sets, each containing k lines, and the relationships are
m = v × k
i = j modulo v
where i = cache set number, j = main memory block number, m = number of lines in the cache, v = number of sets, and k = number of lines in each set. This is referred to as k-way set-associative mapping. For example, with m = 16 lines organised as k = 2 ways, there are v = 8 sets and block j = 37 maps to set 37 modulo 8 = 5.
A set-associative cache can be implemented as k direct-mapped caches. Each direct-mapped cache is referred to as a way, consisting of v lines. The first v lines of main memory are direct mapped into the v lines of each way; the next group of v lines of main memory are similarly mapped, and so on. The direct-mapped implementation is typically used for small degrees of associativity (small values of k), while the associative-mapped implementation is typically used for higher degrees of associativity.
For set-associative mapping, the cache control logic interprets a memory address as
three fields: Tag, Set, and Word. The d set bits specify one of v = 2^d sets. The s bits of the Tag and Set fields specify one of the 2^s blocks of main memory. Figure below
illustrates the cache control logic.
With fully associative mapping, the tag in a memory address is quite large and must be
compared to the tag of every line in the cache. With k-way set-associative mapping, the
tag in a memory address is much smaller and is only compared to the k tags within a single set. To summarize, a hit occurs when the Tag field matches one of the k tags stored in the set selected by the Set field.
Replacement Algorithms
Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced. For direct mapping, there is only one possible line for any particular block, and no choice is possible. For the associative and set-associative
techniques, a replacement algorithm is needed. To achieve high speed, such an
algorithm must be implemented in hardware.
Probably the most effective is least recently used (LRU): Replace that block in the set
that has been in the cache longest with no reference to it. For two-way set associative,
this is easily implemented. Each line includes a USE bit. When a line is referenced, its
USE bit is set to 1 and the USE bit of the other line in that set is set to 0. When a block is
to be read into the set, the line whose USE bit is 0 is used. Because we are assuming
that more recently used memory locations are more likely to be referenced, LRU
should give
the best hit ratio. LRU is also relatively easy to implement for a fully associative cache.
The cache mechanism maintains a separate list of indexes to all the lines in the cache.
When a line is referenced, it moves to the front of the list. For replacement, the line at
the back of the list is used. Because of its simplicity of implementation, LRU is the most
popular replacement algorithm.
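A minimal sketch of the USE-bit scheme described above for a two-way set-associative cache; the number of sets, the Line structure and the access sequence are illustrative assumptions, not taken from the text.

```python
# Minimal sketch: USE-bit LRU for a two-way set-associative cache.
# NUM_SETS and the Line structure are illustrative; "tags" are plain block numbers.

from dataclasses import dataclass

NUM_SETS = 4          # v = 4 sets, k = 2 lines per set (example values)

@dataclass
class Line:
    tag: int = -1     # -1 means the line is empty
    use: int = 0      # USE bit: 1 = most recently referenced line in the set

cache = [[Line(), Line()] for _ in range(NUM_SETS)]

def access(block: int) -> bool:
    """Return True on a hit; on a miss, replace the line whose USE bit is 0."""
    s = block % NUM_SETS
    lines = cache[s]
    for i, line in enumerate(lines):
        if line.tag == block:                 # hit: mark this line most recently used
            line.use, lines[1 - i].use = 1, 0
            return True
    victim = lines[0] if lines[0].use == 0 else lines[1]   # miss: USE bit 0 is replaced
    other = lines[1] if victim is lines[0] else lines[0]
    victim.tag, victim.use, other.use = block, 1, 0
    return False

for b in (0, 4, 0, 8, 4):                    # blocks 0, 4, 8 all map to set 0
    print(f"block {b}: {'hit' if access(b) else 'miss'}")
# -> miss, miss, hit, miss (block 4, the least recently used, is replaced), miss
```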
Another possibility is first-in-first-out (FIFO): Replace that block in the set that has
been in the cache longest. FIFO is easily implemented as a round-robin or
circular buffer technique. Still another possibility is least frequently used (LFU):
Replace that block in the set that has experienced the fewest references. LFU could be
implemented by associating a counter with each line. A technique not based on usage
(i.e., not LRU, LFU, FIFO, or some variant) is to pick a line at random from among
the candidate lines. Simulation studies have shown that random replacement
provides only slightly inferior performance to an algorithm based on usage.
Write Policy
When a block that is resident in the cache is to be replaced, there are two cases to consider.
1) If the old block in the cache has not been altered, then it may be overwritten with a
new block without first writing out the old block.
2) If at least one write operation has been performed on a word in that line of the
cache, then main memory must be updated by writing the line of cache out to the block
of memory before bringing in the new block.
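These two cases are usually tracked with a per-line "dirty" (modified) bit; the toy sketch below, with invented names and block contents, shows how that bit decides whether a write-back is needed at eviction time.

```python
# Minimal sketch: deciding whether an evicted block must be written back.
# A per-block "dirty" bit records case 2: the block was altered while in the cache.

main_memory = {7: "old"}                    # hypothetical block contents
cache = {7: {"data": "old", "dirty": False}}

def write(block: int, value: str) -> None:
    cache[block]["data"] = value
    cache[block]["dirty"] = True            # at least one write has occurred

def evict(block: int) -> None:
    entry = cache.pop(block)
    if entry["dirty"]:                      # case 2: write the block back first
        main_memory[block] = entry["data"]
    # case 1 (clean): nothing to do, the block may simply be overwritten

write(7, "new")
evict(7)
print(main_memory[7])                       # -> "new"
```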
In deciding when modified cache contents must be written to main memory, there are two problems to contend with:
1) More than one device may have access to main memory. For example, an I/O
module may be able to read-write directly to memory. If a word has been altered only
in the cache, then the corresponding memory word is invalid. Further, if the I/O device
has altered main memory, then the cache word is invalid.
2) A more complex problem occurs when multiple processors are attached to the same
bus and each processor has its own local cache. Then, if a word is altered in one cache,
it could conceivably invalidate a word in other caches.
The simplest technique is called write through. Using this technique, all
write operations are made to main memory as well as to the cache, ensuring that
main memory is always valid. Any other processor–cache module can monitor traffic
to main memory to maintain consistency within its own cache. The main
disadvantage of this technique is that it generates substantial memory traffic and may
create a bottleneck.
In a bus organization in which more than one device (typically a processor) has a cache
and main memory is shared, a new problem is introduced. If data in one cache are
altered, this invalidates not only the corresponding word in main memory, but also
that same word in other caches (if any other cache happens to have that same word).
Even if a write-through policy is used, the other caches may contain invalid data. A
system that prevents this problem is said to maintain cache coherency. Possible
approaches to cache coherency include the following:
1) Bus watching with write through: Each cache controller monitors the address lines
to detect write operations to memory by other bus masters. If another master writes to
a location in shared memory that also resides in the cache memory, the cache controller
invalidates that cache entry. This strategy depends on the use of a write-through policy
by all cache controllers.
2) Hardware transparency: Additional hardware is used to ensure that all updates to
main memory via cache are reflected in all caches. Thus, if one processor modifies a
word in its cache, this update is written to main memory. In addition, any matching
words in other caches are similarly updated.
3) Non-cacheable memory: Only a portion of main memory is shared by more than
one processor, and this is designated as noncacheable. In such a system, all accesses to
shared memory are cache misses, because the shared memory is never copied into the
cache. The noncacheable memory can be identified using chip-select logic or high-
address bits.
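A toy sketch of approach 1, bus watching with write through; the SnoopingCache and Bus classes and the addresses used are invented purely for illustration.

```python
# Minimal sketch: bus watching (snooping) with a write-through policy.
# Each controller invalidates its copy when another bus master writes that address.

class SnoopingCache:
    def __init__(self, name: str):
        self.name = name
        self.lines = {}                       # address -> cached value

    def snoop_write(self, addr: int) -> None:
        """Called for every write seen on the shared bus from another master."""
        if addr in self.lines:
            del self.lines[addr]              # invalidate the stale entry

class Bus:
    def __init__(self, memory: dict):
        self.memory = memory
        self.caches = []

    def write_through(self, writer: SnoopingCache, addr: int, value: int) -> None:
        writer.lines[addr] = value            # update the writer's own cache
        self.memory[addr] = value             # write through to main memory
        for cache in self.caches:             # all other controllers snoop the write
            if cache is not writer:
                cache.snoop_write(addr)

memory = {0x10: 1}
bus = Bus(memory)
a, b = SnoopingCache("A"), SnoopingCache("B")
bus.caches += [a, b]
b.lines[0x10] = 1                             # B holds a copy of address 0x10
bus.write_through(a, 0x10, 99)                # A writes; B's copy is invalidated
print(memory[0x10], 0x10 in b.lines)          # -> 99 False
```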
Line Size
Another design element is the line size. When a block of data is retrieved and placed in
the cache, not only the desired word but also some number of adjacent words
are retrieved. As the block size increases from very small to larger sizes, the hit ratio
will
at first increase because of the principle of locality, which states that data in the vicinity
of a referenced word are likely to be referenced in the near future. As the block size
increases, more useful data are brought into the cache. The hit ratio will begin to
decrease, however, as the block becomes even bigger and the probability of using the
newly fetched information becomes less than the probability of reusing the
information that has to be replaced. Two specific effects come into play:
1) Larger blocks reduce the number of blocks that fit into a cache. Because each block
fetch overwrites older cache contents, a small number of blocks results in data being
overwritten shortly after they are fetched.
2) As a block becomes larger, each additional word is farther from the requested word
and therefore less likely to be needed in the near future.
The relationship between block size and hit ratio is complex, depending on the locality
characteristics of a particular program, and no definitive optimum value has been
found.
Number of Caches
When caches were originally introduced, the typical system had a single cache.
More recently, the use of multiple caches has become the norm. Two aspects of this
design issue concern the number of levels of caches and the use of unified versus split
caches.
Multilevel Caches. As logic density has increased, it has become possible to have a
cache on the same chip as the processor: the on-chip cache. Compared with a cache
reachable via an external bus, the on-chip cache reduces the processor’s external bus
activity and therefore speeds up execution times and increases overall system
performance. When the requested instruction or data is found in the on-chip cache, the
bus access is eliminated. Because of the short data paths internal to the processor,
compared with bus lengths, on-chip cache accesses will complete appreciably faster
than would even zero-wait state bus cycles. Furthermore, during this period the bus is
free to support other transfers.
The inclusion of an on-chip cache leaves open the question of whether an off-chip, or
external, cache is still desirable. Typically, the answer is yes, and most contemporary
designs include both on-chip and external caches. The simplest such organization is
known as a two-level cache, with the internal cache designated as level 1 (L1) and the
external cache designated as level 2 (L2). The reason for including an L2 cache is the
following: If there is no L2 cache and the processor makes an access request for a
memory location not in the L1 cache, then the processor must access DRAM or ROM
memory across the bus. Due to the typically slow bus speed and slow memory access
time, this results in poor performance. On the other hand, if an L2 SRAM (static RAM)
cache is used, then frequently the missing information can be quickly retrieved. If the
SRAM is fast enough to match the bus speed, then the data can be accessed using a
zero-wait state transaction, the fastest type of bus transfer.
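To put rough numbers on the benefit of an L2 cache, the sketch below computes average memory access time with and without a second level; all timings and hit ratios are assumed example values, not measurements from the text.

```python
# Minimal sketch: average memory access time (AMAT) with one and two cache levels.
# All timings (ns) and hit ratios below are hypothetical example values.

def amat_l1_only(t_l1, h_l1, t_mem):
    return t_l1 + (1 - h_l1) * t_mem

def amat_two_level(t_l1, h_l1, t_l2, h_l2, t_mem):
    # an L1 miss is serviced by L2; an L2 miss goes all the way to main memory
    return t_l1 + (1 - h_l1) * (t_l2 + (1 - h_l2) * t_mem)

print(amat_l1_only(1.0, 0.95, 80.0))              # -> 5.0 ns
print(amat_two_level(1.0, 0.95, 6.0, 0.90, 80.0)) # -> 1.7 ns
```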
REFERENCES:
https://www.i-programmer.info/babbages-bag/375-cache-memory.html
https://www.informationq.com/computer-memory-overview/
https://quickcse.wordpress.com/2018/08/12/elements-of-cache-design/