
Cache Memory

A safe place for hiding or storing things
Introduction to Cache Memory
● Principle of locality – programs tend to reuse the data and instructions they have used recently.
● An implication of locality is that we can predict with reasonable accuracy which instructions and data a program will use in the near future, based on its accesses in the recent past.
● The principle of locality also applies to data accesses, though not as strongly as to code accesses.
Types of Locality

● Temporal locality
  ○ Recently accessed items are likely to be accessed again in the near future.
● Spatial locality
  ○ Items whose addresses are near one another tend to be referenced close together in time.
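
As an illustration that is not in the slides, the short Python loop below shows both kinds of locality: the accumulator is reused on every iteration (temporal locality) and the array is traversed in address order (spatial locality).

# Illustrative sketch (not from the slides): a loop that exhibits both
# kinds of locality a cache exploits.
data = list(range(1024))

total = 0                    # 'total' is reused every iteration -> temporal locality
for i in range(len(data)):   # consecutive elements sit at nearby addresses -> spatial locality
    total += data[i]
print(total)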
Parameters of Cache Memory

● Cache hit
  ○ A referenced item is found in the cache by the processor.
● Cache miss
  ○ A referenced item is not present in the cache.
● Hit ratio
  ○ Ratio of the number of hits to the total number of references = number of hits / (number of hits + number of misses).
● Miss penalty
  ○ Additional cycles required to service the miss.
Parameters of Cache Memory

● The time required to service a cache miss depends on both the latency and the bandwidth of the memory.
● Latency determines the time to retrieve the first word of the block.
● Bandwidth determines the time to retrieve the rest of the block.
[Flowchart: cache read operation]
● Start: receive the address from the CPU.
● Is the block containing the item in the cache?
  ○ Yes: deliver the block to the CPU. Done.
  ○ No: access main memory for the block containing the item, select the cache line to receive the block, load the main memory block into the cache, and deliver the block to the CPU. Done.
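
The flow above can be sketched in a few lines of Python for a direct-mapped cache. This is a minimal illustrative sketch; the names (CACHE_LINES, read_block, and so on) are mine, not from the slides.

# Minimal sketch of the read flow above for a direct-mapped cache.
CACHE_LINES = 8
cache = [None] * CACHE_LINES                   # each entry holds (tag, block_data) or None
main_memory = {b: f"block-{b}" for b in range(16)}

def read_block(block_address):
    line = block_address % CACHE_LINES         # select the cache line
    tag = block_address // CACHE_LINES
    entry = cache[line]
    if entry is not None and entry[0] == tag:  # hit: block is already in the cache
        return entry[1]
    data = main_memory[block_address]          # miss: fetch the block from main memory
    cache[line] = (tag, data)                  # load the block into the selected line
    return data                                # deliver the block to the CPU

print(read_block(12))   # miss: block 12 maps to line 12 mod 8 = 4
print(read_block(12))   # hit on the second access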
Cache Memory Management Techniques

● Block placement
  ○ Direct mapping
  ○ Set-associative mapping
  ○ Fully associative mapping
● Block identification
  ○ Tag
  ○ Index
  ○ Offset
● Block replacement
  ○ FCFS
  ○ LRU
  ○ Random
● Update policies
  ○ Write through
  ○ Write back
  ○ Write around
  ○ Write allocate
Example

Direct Mapping

[Figure: main memory with 16 blocks (0–15) mapped onto a cache with 8 lines (0–7).]
(MM block address) mod (number of lines in the cache)
(12) mod (8) = 4, so main memory block 12 maps to cache line 4.
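
A quick, purely illustrative check of the placement rule in Python:

# Direct mapping: each main-memory block can go to exactly one cache line.
num_lines = 8
for block in (4, 7, 12, 15):
    print(f"block {block} -> line {block % num_lines}")
# block 12 -> line 4, matching the figure above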
Set-Associative Mapping

[Figure: main memory with 16 blocks (0–15) mapped onto a cache with 8 lines organised as 4 sets (0–3) of 2 lines each.]
(MM block address) mod (number of sets in the cache)
(12) mod (4) = 0, so main memory block 12 maps to set 0 and may occupy either line of that set.
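
The same check for the 4-set cache above (illustrative sketch):

# Set-associative mapping: a block maps to one set, then to any line within that set.
num_sets = 4
ways = 2                       # 2 lines per set in the figure above
for block in (4, 7, 12, 15):
    print(f"block {block} -> set {block % num_sets} (any of the {ways} ways)")
# block 12 -> set 0, matching the figure above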
Fully Associative Mapping

[Figure: main memory with 16 blocks (0–15) mapped onto a cache with 8 lines; any block may be placed in any line, e.g. chosen at random.]
Associative Mapping
● Fastest and most flexible, but very expensive.
● Any block location in the cache can store any block of memory.
● The cache stores both the address and the content of the memory word.
● The 15-bit CPU address is placed in the argument register and the associative memory is searched for a matching address.
● If a match is found, the data is read and sent to the CPU; otherwise main memory is accessed.
● Such a memory is called a CAM – content-addressable memory.



Direct Mapping
● An n-bit CPU memory address is split into a k-bit index field and an (n−k)-bit tag field.
● The index bits are used to access the cache.
● Each word in the cache consists of the data word and its associated tag.
● On a memory request, the index field is used to access the cache, and the tag field of the CPU address is compared with the tag of the word read from the cache.
● If they match, there is a hit; otherwise there is a miss and the word is read from main memory. It is then stored in the cache, together with the new tag, replacing the previous value.
● Disadvantage: the hit ratio drops if two or more words with the same index but different tags are accessed repeatedly.
● When memory is divided into blocks of words, the index field is split into a block field and a word field.
  ○ Example: a 512-word cache organised as 64 blocks of 8 words each has a 6-bit block field and a 3-bit word field.
● All words within a block share the same tag.
● When a miss occurs, the entire block is transferred from main memory to the cache.
● This is time consuming but improves the hit ratio because of the sequential nature of programs.
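
A small illustrative sketch of this field split for the 512-word example; the helper name and the sample address are my own assumptions:

# Split a word address for a 512-word direct-mapped cache organised as
# 64 blocks of 8 words (6-bit block field, 3-bit word field).
# The function name and the sample address are illustrative assumptions.
def split_address(addr, block_bits=6, word_bits=3):
    word  = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag   = addr >> (word_bits + block_bits)
    return tag, block, word

print(split_address(0b000011_001100_101))   # -> (3, 12, 5): tag 3, block 12, word 5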
Set-Associative Mapping
● Each word of the cache can store two or more words of memory under the same index address.
● The comparison logic is an associative search of the tags in the set, similar to an associative memory search – hence the name "set-associative".
● The hit ratio increases as the set size increases, but more complex comparison logic is required as the number of bits per cache word grows.
● When a miss occurs and the set is full, one of the tag–data items is replaced using a block replacement policy.
Problems
● Problem 1: A set-associative cache consists of 64 lines (slots), divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.
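
A worked check of Problem 1 (my own calculation, assuming word-addressable memory): 4K × 128 = 2^19 words gives a 19-bit address, split into an 8-bit tag, a 4-bit set field (64/4 = 16 sets) and a 7-bit word field.

# Worked check for Problem 1 (my own calculation, assuming word-addressable memory).
import math
words_per_block, blocks, lines, ways = 128, 4096, 64, 4
address_bits = int(math.log2(blocks * words_per_block))    # 19-bit address
word_bits    = int(math.log2(words_per_block))             # 7-bit word field
set_bits     = int(math.log2(lines // ways))               # 4-bit set field
tag_bits     = address_bits - set_bits - word_bits         # 8-bit tag
print(tag_bits, set_bits, word_bits)                       # -> 8 4 7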

Problem 2
(Note: M (Mega) – normal usage 10^6; usage as a power of 2: 2^20 = 1,048,576.)

● A two-way set-associative cache has lines of 16 bytes and a total size of 8K bytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.
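
A worked check of Problem 2 (my own calculation): a 26-bit byte address split into a 14-bit tag, an 8-bit set field (8K/16/2 = 256 sets) and a 4-bit byte-offset field.

# Worked check for Problem 2 (my own calculation, byte-addressable memory).
import math
line_bytes, cache_bytes, ways, memory_bytes = 16, 8 * 1024, 2, 64 * 1024 * 1024
address_bits = int(math.log2(memory_bytes))                # 26-bit byte address
offset_bits  = int(math.log2(line_bytes))                  # 4-bit byte-offset field
sets         = cache_bytes // line_bytes // ways           # 256 sets
set_bits     = int(math.log2(sets))                        # 8-bit set field
tag_bits     = address_bits - set_bits - offset_bits       # 14-bit tag
print(tag_bits, set_bits, offset_bits)                     # -> 14 8 4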


Block Replacement

● Least Recently Used (LRU): replace the block in the set that has been in the cache longest with no reference to it.
● First In First Out (FIFO): replace the block in the set that has been in the cache longest.
● Least Frequently Used (LFU): replace the block in the set that has experienced the fewest references.
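
A minimal LRU sketch for a single 4-way set (illustrative; the class and method names are mine):

# Minimal LRU bookkeeping for a single 4-way set (not from the slides).
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.blocks = OrderedDict()          # least recently used block first

    def access(self, block):
        if block in self.blocks:             # hit: mark block as most recently used
            self.blocks.move_to_end(block)
            return "hit"
        if len(self.blocks) == self.ways:    # miss with a full set: evict the LRU block
            self.blocks.popitem(last=False)
        self.blocks[block] = True
        return "miss"

s = LRUSet()
print([s.access(b) for b in (1, 2, 3, 4, 1, 5, 1)])
# -> ['miss', 'miss', 'miss', 'miss', 'hit', 'miss', 'hit']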
Update Policies – Write Through

● Main memory is updated with every memory write operation.
● Cache memory is updated in parallel if it contains the word at the specified address.
● Advantage: main memory always contains the same data as the cache.
● This is important during DMA transfers, to ensure the data in main memory is valid.
● Disadvantage: slow, because every write incurs the main memory access time.
Write Back

● Only the cache is updated during a write operation, and the line is marked by a flag. When the word is removed from the cache, it is copied into main memory.
● Memory is not kept up to date, i.e., the same item in cache and memory may have different values.
Update Policies

● Write-around
  ○ For items not currently in the cache (i.e. write misses), the item is updated in main memory only, without affecting the cache.
● Write-allocate
  ○ The item is updated in main memory and the block containing the updated item is brought into the cache.
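
The sketch below contrasts write-through and write-back on write hits; it is illustrative only, and the class and counter names are mine.

# Contrast of write-through vs write-back on write hits (illustrative sketch).
class Cache:
    def __init__(self, write_back=False):
        self.write_back = write_back
        self.lines = {}            # address -> (value, dirty)
        self.memory_writes = 0     # counts main-memory write operations

    def write(self, addr, value):
        if self.write_back:
            self.lines[addr] = (value, True)    # update the cache only, mark dirty
        else:
            self.lines[addr] = (value, False)   # write-through: update the cache ...
            self.memory_writes += 1             # ... and main memory in parallel

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                               # write-back: copy to memory on eviction
            self.memory_writes += 1

wt, wb = Cache(), Cache(write_back=True)
for c in (wt, wb):
    for v in range(3):
        c.write(0x10, v)
    c.evict(0x10)
print(wt.memory_writes, wb.memory_writes)       # -> 3 1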
Performance Analysis

● Look-through: the cache is checked first for a hit, and only if a miss occurs is the access to main memory started.
● Look-aside: the access to main memory proceeds in parallel with the cache lookup.

● Look-through (mean memory access time):
  TA = TC + (1 - h) * TM
● Look-aside:
  TA = h * TC + (1 - h) * TM
  where TC is the average cache access time and TM is the average main memory access time.

● Hit ratio h = (number of references found in the cache) / (total number of memory references)
● Miss ratio m = 1 - h

Example: assume that a computer system employs a cache with an access time of 20 ns and a main memory with a cycle time of 200 ns. Suppose that the hit ratio for reads is 90%.
a) What would be the average access time for reads if the cache is a "look-through" cache?
The average read access time TA = TC + (1 - h) * TM = 20 ns + 0.10 * 200 ns = 40 ns.
b) What would be the average access time for reads if the cache is a "look-aside" cache?
The average read access time in this case TA = h * TC + (1 - h) * TM = 0.9 * 20 ns + 0.10 * 200 ns = 38 ns.
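
The same arithmetic as a short script (illustrative only):

# Average read access time for look-through vs look-aside (values from the example above).
tc, tm = 20, 200                   # cache and main-memory access times in ns
h, m = 0.9, 0.1                    # read hit and miss ratios
look_through = tc + m * tm         # 20 + 0.1*200 = 40 ns
look_aside   = h * tc + m * tm     # 18 + 20      = 38 ns
print(look_through, look_aside)    # -> 40.0 38.0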
Problem

● Consider a memory system with TC = 100 ns and TM = 1200 ns. If the effective access time is 10% greater than the cache access time, what is the hit ratio h in a look-through cache?
⇒ TA = TC + (1 - h) * TM
⇒ 1.1 * TC = TC + (1 - h) * TM
⇒ 0.1 * TC = (1 - h) * TM
⇒ 0.1 * 100 = (1 - h) * 1200
⇒ 1 - h = 10/1200
⇒ h = 1190/1200 ≈ 0.99
Sources of Cache Misses

● Compulsory misses
● Capacity misses
● Cold misses
● Conflict misses

Sources of Cache Misses

○ Compulsory misses: misses caused by the cache being empty initially.
○ Cold misses: the very first access to a block results in a miss, because the block is not brought into the cache until it is referenced.
○ Capacity misses: if the cache cannot contain all the blocks needed during the execution of a program, capacity misses occur because blocks are discarded and later retrieved.
○ Conflict misses: occur when the cache mapping is such that multiple blocks are mapped to the same cache entry.
Cache Organization

● Split cache
  ○ Separate caches for instructions and data.
  ○ I-cache (instructions) – mostly accessed sequentially.
  ○ D-cache (data) – mostly random access.
● Unified cache
  ○ The same cache holds both instructions and data.
● A unified cache gives a higher hit rate, because it balances the space used for instructions and data.
● Split caches eliminate contention for the cache between the instruction fetch unit and the execution unit – useful for pipelined processors.
Multilevel Caches
● The penalty for a cache miss is the extra time that it takes to obtain the requested item from central memory.
● One way in which this penalty can be reduced is to provide another cache, the secondary cache, which is accessed in response to a miss in the primary cache.
● The primary cache is referred to as the L1 (level 1) cache and the secondary cache is called the L2 (level 2) cache.
● Most high-performance microprocessors include an L2 cache, which is often located off-chip, whereas the L1 cache is located on the same chip as the CPU.
● With a two-level cache, central memory has to be accessed only if a miss occurs in both caches.
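
One common way to extend the earlier look-through formula to two levels is TA = TL1 + (1 - h1) * (TL2 + (1 - h2) * TM), where h1 and h2 are the L1 and L2 hit ratios; this extension and the numbers below are my own, not from the slides.

# Two-level look-through model (my own extension of the earlier formula; values are invented).
t_l1, t_l2, t_mem = 2, 10, 100     # access times in ns (illustrative values)
m1, m2 = 0.1, 0.2                  # L1 and L2 miss ratios (illustrative values)
ta = t_l1 + m1 * (t_l2 + m2 * t_mem)
print(ta)                          # -> 5.0 ns average access time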
Example:
● A computer system employs a write-back cache with a 70% hit ratio for writes. The cache operates in look-aside mode and has a 90% read hit ratio. Reads account for 80% of all memory references and writes account for 20%. If the main memory cycle time is 200 ns and the cache access time is 20 ns, what would be the average access time for all references (reads as well as writes)?
The average access time for reads = 0.9 * 20 ns + 0.1 * 200 ns = 38 ns.
The average write time = 0.7 * 20 ns + 0.3 * 200 ns = 74 ns.
Hence the overall average access time for combined reads and writes = 0.8 * 38 ns + 0.2 * 74 ns = 45.2 ns.
Breakdown of references: 100% total = 80% reads (90% hit, 10% miss) + 20% writes (70% hit, 30% miss).
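
The same arithmetic as a short script (illustrative only):

# Average access time for combined reads and writes (values from the example above).
tc, tm = 20, 200                              # cache and main-memory times in ns
read_time  = 0.9 * tc + 0.1 * tm              # 38 ns
write_time = 0.7 * tc + 0.3 * tm              # 74 ns
overall    = 0.8 * read_time + 0.2 * write_time
print(read_time, write_time, round(overall, 2))   # -> 38.0 74.0 45.2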
