Multicore Computer Architecture - Storage and Interconnects
Lecture 5
Block Replacement Techniques & Write Strategy
Dr. John Jose
Assistant Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.
Processor Memory Performance Gap
Memory Hierarchy
Four cache memory design choices
Where can a block be placed in the cache?
– Block Placement
How is a block found if it is in cache memory?
– Block Identification
Which block should be replaced on a miss?
– Block Replacement
What happens on a write?
– Write Strategy
Block Replacement
Cache has finite size. What do we do when it is full?
Direct mapped is easy: there is only one candidate line
In a set-associative cache, which block in the selected set should be evicted?
Block Replacement Algorithms
Random
First In First Out (FIFO)
Least Recently Used, pseudo-LRU
Last In First Out (LIFO)
Not Recently Used (NRU)
Least Frequently Used (LFU)
Re-Reference Interval Prediction (RRIP)
Optimal
Random Replacement Policy
Random policy needs a pseudo-random number generator
Overhead is an O(1) amount of work per block replacement
Makes no attempt to exploit temporal or spatial locality
FIFO Replacement Policy
First-In, First-Out (FIFO) evicts the block that has been in
the cache the longest
It requires a queue Q per set to record insertion order
Blocks are enqueued in Q on insertion; a dequeue operation on Q
determines which block to evict.
Overhead is an O(1) amount of work per block replacement
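A sketch of one FIFO-managed set (Python; the class and method names are illustrative, not from any real simulator). Note that a hit does not reorder the queue, which is what distinguishes FIFO from LRU:

```python
from collections import deque

class FIFOSet:
    """One set of a set-associative cache with FIFO replacement.
    A deque records insertion order; hits do NOT reorder it."""
    def __init__(self, ways):
        self.ways = ways
        self.queue = deque()          # oldest block at the left

    def access(self, tag):
        if tag in self.queue:         # hit: FIFO order unchanged
            return True
        if len(self.queue) == self.ways:
            self.queue.popleft()      # evict the longest-resident block
        self.queue.append(tag)        # newest block enters at the right
        return False
```

With a 2-way set and the accesses A, B, A, C, the final miss on C evicts A, even though A was just referenced: FIFO only tracks insertion time, not recency.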
Optimal Replacement Policy
Evict block with longest reuse distance
i.e. next reference to block is farthest in future
Requires knowledge of the future!
Can’t build it, but can model it with trace
Useful, since it reveals opportunity
Optimal beats LRU
Example: for the reference pattern (X,A,B,C,D,X) on a 4-way set, LRU evicts X when D arrives, so the second X misses; optimal evicts a never-reused block and the second X hits
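The effect can be checked with a tiny single-set simulator (a sketch; the function name and structure are made up for illustration):

```python
def misses(trace, ways, policy):
    """Count misses in one fully associative set under 'lru' or 'opt'."""
    cache, n = [], 0
    for i, blk in enumerate(trace):
        if blk in cache:
            if policy == "lru":            # move to MRU position on hit
                cache.remove(blk)
                cache.append(blk)
            continue
        n += 1
        if len(cache) == ways:
            if policy == "lru":
                cache.pop(0)               # head of list is the LRU block
            else:                          # opt: evict farthest next use
                def next_use(b):
                    rest = trace[i + 1:]
                    return rest.index(b) if b in rest else float("inf")
                cache.remove(max(cache, key=next_use))
        cache.append(blk)
    return n

trace = ["X", "A", "B", "C", "D", "X"]
# LRU: 6 misses (the second X misses); OPT: 5 misses (the second X hits)
```

On the miss for D, LRU evicts X (the least recently used), while optimal evicts one of A, B, C because they are never referenced again.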
Least-Recently Used Policy
For associativity a = 2, LRU is equivalent to NMRU (Not Most Recently Used)
Single bit per line indicates LRU/MRU
Set/clear on each access
For a > 2, true LRU is difficult/expensive
Timestamps? How many bits? Must find the minimum timestamp on each eviction
Sorted list? Re-sort on every access?
List overhead: log2(a) bits per block
Shift register implementation
Random vs FIFO vs LRU
Random policy: the old block to evict is chosen at random
FIFO policy: the old block to evict is the one present in the cache longest
(insert times: 8:00am 7:48am 9:05am 7:10am 7:30am 10:10am 8:45am; the 7:10am block is evicted)
LRU policy: the old block to evict is the least recently used
(last used: 7:25am 8:12am 9:22am 6:50am 8:20am 10:02am 9:50am; the 6:50am block is evicted)
LRU Implementation
Recency stack of an 8-line cache (top = LRU, bottom = MRU):

         Initial | Cycle 1     | Cycle 2     | Cycle 3     | Cycle 4
                 | Hit in CL 0 | Hit in CL 4 | Hit in CL 7 | Miss: replace CL 6
LRU ->   4       | 4           | 6           | 6           | 3
         6       | 6           | 3           | 3           | 1
         3       | 3           | 1           | 1           | 5
         1       | 1           | 7           | 5           | 2
         0       | 7           | 5           | 2           | 0
         7       | 5           | 2           | 0           | 4
         5       | 2           | 0           | 4           | 7
MRU ->   2       | 0           | 4           | 7           | 6

On a hit the referenced line moves to the MRU position; on a miss the LRU line (CL 6 in Cycle 4) is refilled and moves to the MRU position.
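The stack updates above can be reproduced directly (a sketch; a miss is modeled as refilling the LRU line, which then becomes MRU):

```python
def lru_update(stack, line, hit):
    """Move 'line' to the MRU (end) position of the recency stack.
    On a miss, the LRU line at the front is the one being replaced."""
    if hit:
        stack.remove(line)
    else:
        assert line == stack.pop(0)   # victim must be the LRU line
    stack.append(line)
    return stack

stack = [4, 6, 3, 1, 0, 7, 5, 2]      # LRU ... MRU, as in the table
for line, hit in [(0, True), (4, True), (7, True), (6, False)]:
    lru_update(stack, line, hit)
# stack is now [3, 1, 5, 2, 0, 4, 7, 6], matching the Cycle 4 column
```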
Practical Pseudo-LRU
[Binary tree diagram: leaves, top (older) to bottom (newer): J, F, C, B, X, Y, A, Z; each internal node stores one bit (0/1) recording which half of its subtree is older]
Rather than true LRU, use binary tree
Each node records which half is older/newer
Update nodes on each reference
Follow older pointers to find LRU victim
Practical Pseudo-LRU
Refs (oldest first): J, Y, X, Z, B, C, F, A
[Tree diagram: following the 'older' bits (path 011) reaches the PLRU block, B; following the 'newer' bits (path 110) reaches the MRU block, A]
Partial order encoded in tree: Z<A, Y<X, B<C, J<F; A>X, C<F; A>F
Practical Pseudo-LRU
Refs (oldest first): J, Y, X, Z, B, C, F, A
[Tree diagram: node bits after the references; path 011 through the 'older' bits reaches the PLRU block B; path 110 through the 'newer' bits reaches the MRU block A]
Binary tree encodes PLRU partial order
At each level point to LRU half of subtree
Each access: flip nodes along path to block
Eviction: follow LRU path
Overhead: (a-1)/a bits per block
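A minimal Python model of the tree (way indices 0-7 stand in for the slide's letters; the class and method names are illustrative):

```python
class TreePLRU:
    """Tree pseudo-LRU for one set of an a-way cache (a = power of 2).
    bits[i] = 0 means the LEFT subtree of node i holds the older half;
    bits[i] = 1 means the RIGHT subtree is older. Nodes are stored as a
    heap-style array: children of node i are 2i+1 and 2i+2."""
    def __init__(self, ways=8):
        self.ways = ways
        self.bits = [0] * (ways - 1)       # (a-1) bits for the whole set

    def access(self, way):
        """Flip the bits along the path so they point AWAY from 'way'."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:                  # accessed block in left half,
                self.bits[node] = 1        # so the right half is now older
                node, hi = 2 * node + 1, mid
            else:                          # accessed block in right half
                self.bits[node] = 0
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the 'older' bits down to the pseudo-LRU way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:       # left half is older
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

After accessing ways 0 through 7 in order, the victim is way 0, matching true LRU; but one further access to way 0 makes the victim way 4 rather than the true LRU way 1, which is why this is only pseudo-LRU.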
Not Recently Used (NRU)
Keep NRU state in 1 bit/block
The bit is cleared to 0 when the block is installed or re-referenced
The bit is set to 1 when the block is not referenced while another block in the same set is referenced
Evictions favor NRU=1 blocks
If all blocks are NRU=0 (or all NRU=1), pick a victim at random
Provides some scan and thrash resistance by randomizing evictions rather than following strict LRU order
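A sketch of one common NRU formulation (when no NRU=1 victim exists, all bits are aged to 1 and the victim is chosen at random among them; the aging step is one of several variants, and all names are illustrative):

```python
import random

class NRUSet:
    """One cache set with NRU replacement: one hint bit per block,
    where nru = 0 means recently used and nru = 1 means not."""
    def __init__(self, ways, rng=random):
        self.tags = [None] * ways
        self.nru = [1] * ways
        self.rng = rng

    def access(self, tag):
        if tag in self.tags:
            self.nru[self.tags.index(tag)] = 0   # re-referenced
            return True
        if 1 not in self.nru:                    # no candidate: age all
            self.nru = [1] * len(self.tags)
        candidates = [i for i, b in enumerate(self.nru) if b == 1]
        way = self.rng.choice(candidates)        # random among NRU=1
        self.tags[way] = tag
        self.nru[way] = 0                        # installed: recently used
        return False
```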
Re-reference Interval Prediction
RRIP
Extends NRU to multiple bits
Start in the middle
promote on hit
demote over time
Can predict near-immediate, intermediate, and distant re-reference
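A sketch of a 2-bit static RRIP set (the insertion value and counter width vary across designs; the names here are illustrative):

```python
class RRIPSet:
    """One cache set with 2-bit RRIP. The re-reference prediction value
    (RRPV) ranges from 0 (re-reference expected soon) to 3 (expected far
    in the future). New blocks start in the middle, are promoted to 0 on
    a hit, and all blocks are demoted (aged) when no victim exists."""
    MAX_RRPV = 3

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [self.MAX_RRPV] * ways

    def access(self, tag):
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0          # promote on hit
            return True
        while self.MAX_RRPV not in self.rrpv:            # age everyone
            self.rrpv = [v + 1 for v in self.rrpv]       # until a victim
        way = self.rrpv.index(self.MAX_RRPV)             # evict "distant"
        self.tags[way] = tag
        self.rrpv[way] = 2                               # insert mid-range
        return False
```

Because new blocks enter with a high RRPV while hit blocks drop to 0, a block that is re-referenced (like a hot line) survives a scan of blocks that are touched only once.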
Least Frequently Used
Counter per block, incremented on reference
Evictions choose lowest count
Logic is not trivial (on the order of a² comparisons to sort the counters)
Storage overhead: 1 bit per block is the same as NRU; how many bits are helpful?
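A sketch with saturating per-block counters (the counter width is a design choice; the names are illustrative):

```python
class LFUSet:
    """One cache set with LFU replacement: a saturating reference
    counter per block; evict the block with the lowest count."""
    def __init__(self, ways, counter_bits=3):
        self.tags = [None] * ways
        self.count = [0] * ways
        self.max_count = (1 << counter_bits) - 1

    def access(self, tag):
        if tag in self.tags:
            w = self.tags.index(tag)
            self.count[w] = min(self.count[w] + 1, self.max_count)
            return True
        w = self.count.index(min(self.count))   # lowest-count victim
        self.tags[w] = tag
        self.count[w] = 1                       # fresh block: one reference
        return False
```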
Write strategy
Write through: the information is written to both the block in the cache and the block in the next-level memory
Write through advantage: read misses never write back evicted line contents
Write back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced
Needs a dirty bit per block: is the block clean or dirty?
Write back advantage: repeated writes to a block need no repeated memory writes
What About Write Miss?
Write allocate: the block is loaded into the cache on a write miss
No-write allocate: the block is modified in the next-level memory but not brought into the cache
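The write-back + write-allocate combination can be illustrated with a tiny sketch (the victim here is just the oldest-inserted line; the replacement policy is orthogonal to the write strategy, and all names are hypothetical):

```python
class WriteBackCache:
    """Tiny fully associative write-back, write-allocate cache in front
    of a dict-backed 'memory'. Each line carries a dirty bit."""
    def __init__(self, ways, memory):
        self.ways, self.memory = ways, memory
        self.lines = {}                    # addr -> (value, dirty)

    def _evict_if_full(self):
        if len(self.lines) == self.ways:
            addr, (value, dirty) = next(iter(self.lines.items()))
            if dirty:                      # write back only dirty victims
                self.memory[addr] = value
            del self.lines[addr]

    def write(self, addr, value):
        if addr not in self.lines:         # write miss: allocate the line
            self._evict_if_full()
        self.lines[addr] = (value, True)   # mark dirty; memory untouched

    def read(self, addr):
        if addr not in self.lines:         # read miss: fetch from memory
            self._evict_if_full()
            self.lines[addr] = (self.memory.get(addr, 0), False)
        return self.lines[addr][0]
```

Writing to the cache leaves memory stale until the dirty line is evicted, which is exactly the behavior the dirty bit exists to track; a write-through cache would update memory on every write instead.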
Types of Cache Misses
Compulsory
Very first access to a block
Will occur even in an infinite cache
Capacity
If cache cannot contain all the blocks needed
Would occur even in a fully associative cache of the same size
Conflict
If too many blocks map to the same set
Occurs only in set-associative or direct-mapped caches, not fully associative ones
johnjose@[Link]
[Link]