
Module 5: Emerging NVM: Advanced Topics in Modern VLSI and Architecture

The document summarizes a presentation on emerging non-volatile memory (NVM) technologies. It discusses the evolution of NVM, including magnetic hard disk drives (HDDs) and flash memory. It then focuses on magnetic random-access memory (MRAM) as a promising NVM technology. The presentation describes the basic structure and operation of MRAM cells. It compares the performance and power characteristics of SRAM and MRAM, finding that MRAM has lower leakage power but higher write latency and energy. Two techniques are proposed to improve the performance and power of caches using MRAM: a read-preemptive write buffer and a hybrid SRAM-MRAM cache architecture. Simulation results show these techniques eliminate performance degradation and

Uploaded by

Wang David
Copyright
© Attribution Non-Commercial (BY-NC)

Dragon Star Summer School

Advanced Topics in Modern VLSI and Architecture

Module 5: Emerging NVM


Yuan Xie
Associate Professor, Department of Computer Science & Engineering, The Pennsylvania State University
www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu
(Some slides are adapted from Dr. Yiran Chen and Dr. Helen Li from Seagate)

What is Nonvolatile Memory?

Non-volatile memory, nonvolatile memory, NVM or non-volatile storage, is computer memory that can retain the stored information even when not powered. -www.wikipedia.org

Evolution of Nonvolatile Memory


[Figure: timeline of nonvolatile memory. Technologies shown: punched card, paper tape, HDD, ROM, floppy disk, flash memory, CD-R. Year markers: 1864, 1900s, 1956, 1970s, 1971, 1980, 1988.]

Success Story of Magnetism: HDD


1956: First HDD: RAMAC 305 (IBM). 5MB of data at $50,000. As big as two refrigerators. Uses fifty 24-inch platters.
1973: First modern "Winchester" HDD (IBM): Model 3340.
1979: First 5.25-inch HDD for PC (Shugart Tech., now Seagate Tech.).
1982: First drive with more than 1GB of storage: 1.2GB H-8598, weighing 50kg (Hitachi).
1983: First 3.5-inch HDD: RO352 (10MB, Rodime).
1988: First 2.5-inch HDD: 220 (20MB, Prairie Tek).
1997: Giant magnetoresistive (GMR) heads (IBM).
2000: First 15,000-rpm HDD: Cheetah X15 (Seagate).
2002: 100Gbits per square inch (Seagate).
2006: First 2.5-inch model to use perpendicular magnetic recording, boosting capacity up to 160GB (Seagate).
2006: 1-inch 12GB HDD (Seagate).
2007: First real 1TB hard disk drive (Seagate).

Price Trend of HDD and SSD (NAND)


[Figure: price trend of HDD and NAND SSD, with the enterprise market and the laptop market marked.]

$1/GB (2009) challenges the enterprise market.
$0.2/GB (2011) challenges the laptop market.
NAND cost reduction becomes a challenge after 2011 because of NAND technology barriers.
A $99 64GB SSD deal showed up 2 months ago!

Next Generation Nonvolatile Memory

*Source: IBM

Nonvolatile Memory Candidates


                                    SRAM             DRAM           NOR          NAND         PRAM     MRAM
Data retention                      N                N              Y            Y            Y        Y
Memory cell factor                  50-120           6-10           10           2-5          6-12     4-20
Read time (ns)                      1                30             10           50           20-50    2-20
Write/erase time (ns)               1                50             10^5-10^7    10^6-10^5    50-120   2-20
Number of rewrites                  10^16            10^16          10^5         10^5         10^10    10^15
Power consumption (read/write)      Low              Low            High         High         Low      Low
Power consumption other than R/W    Leakage current  Refresh power  None         None         None     None

R-RAM outperforms NAND in cost (< x1/4), density (> x2), and performance.
STT-RAM is ideal for embedded solutions.
Competition: IBM = PCM, SEC = PCM/NiO, Toshiba = 3D NAND, Seagate = MRAM, Everspin = MRAM.
*Source: ITRS

MRAM Cells

The structure of one transistor and one Magnetic Tunnel Junction (MTJ).
High resistance stores 1; low resistance stores 0.

[Figure: MTJ stack showing the free layer and the reference layer.]

[Figure: 3D view of the cell. Recoverable labels: bit line, sensor stack (X-bit), source line, drain, gate = word line, metal layers M1 to M3.]

Read Operation of STT-RAM

[Figure: probability distribution of cell resistance, with Rmin, Rmax, and Rref marked. x-axis: Resistance (Ohm); y-axis: Probability.]
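The read decision suggested by the figure can be sketched as a comparison of the sensed cell resistance against the reference Rref. This is only an illustrative sketch: the resistance values below are hypothetical, and placing Rref at the midpoint of Rmin and Rmax is an assumption, not something the slides state. The mapping of high resistance to 1 and low resistance to 0 follows the MRAM cell slide.

```python
def reference_midpoint(r_min_ohm: float, r_max_ohm: float) -> float:
    """Pick a reference resistance halfway between the two states (an assumption)."""
    return (r_min_ohm + r_max_ohm) / 2.0


def read_bit(cell_resistance_ohm: float, r_ref_ohm: float) -> int:
    """Return 1 for the high-resistance state and 0 for the low-resistance state."""
    return 1 if cell_resistance_ohm > r_ref_ohm else 0


# Hypothetical nominal resistances, chosen only to make the example concrete.
R_MIN_OHM = 1000.0
R_MAX_OHM = 2000.0
R_REF_OHM = reference_midpoint(R_MIN_OHM, R_MAX_OHM)

if __name__ == "__main__":
    for resistance in (R_MIN_OHM, R_MAX_OHM):
        print(resistance, "->", read_bit(resistance, R_REF_OHM))
```

Because real cells vary around Rmin and Rmax, a read is only reliable when the two distributions stay clear of Rref; the sketch ignores that variation.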

MRAM technologies are being improved and attracting more attention.


SRAM vs. MRAM


                 SRAM       MRAM
Area (65nm)      3.66mm2    3.30mm2
Capacity         128KB      512KB
Read latency     2.25ns     2.32ns
Write latency    2.26ns     11.02ns
Read energy      0.90nJ     0.86nJ
Write energy     0.80nJ     5.00nJ
Leakage power    2.09W      0.26W

MRAM callouts: high density, fast read, slow write, low read energy, high write energy, low leakage.

Cache configurations:
2MB (16x128KB) SRAM cache
8MB (16x512KB) MRAM cache

Pros: Low leakage power, high density.
Cons: Long write latency and large write energy.
Replace SRAM caches with MRAM? (HPCA 2009)
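To see how the slow MRAM write affects a mixed access stream, the sketch below averages the read and write latencies from the table, weighted by a write fraction. The function name and the write fractions are illustrative assumptions; the model ignores miss rate, queuing, and network delay, so it only shows the raw bank latency trend.

```python
# Bank latencies in nanoseconds, taken from the SRAM vs. MRAM table.
SRAM = {"read_ns": 2.25, "write_ns": 2.26}
MRAM = {"read_ns": 2.32, "write_ns": 11.02}


def mean_access_latency_ns(bank: dict, write_fraction: float) -> float:
    """Average bank latency when `write_fraction` of the accesses are writes."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    read_part = (1.0 - write_fraction) * bank["read_ns"]
    write_part = write_fraction * bank["write_ns"]
    return read_part + write_part


if __name__ == "__main__":
    # Hypothetical write fractions, for illustration only.
    for fraction in (0.0, 0.1, 0.5):
        sram = mean_access_latency_ns(SRAM, fraction)
        mram = mean_access_latency_ns(MRAM, fraction)
        print(f"writes={fraction:.0%}  SRAM={sram:.2f}ns  MRAM={mram:.2f}ns")
```

With no writes the two technologies are nearly equal; the gap grows linearly with the write fraction.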

The Baseline 3D Architecture


Core layer + cache layers. NUCA caches with NOC connections.


[Figure: two-layer 3D stack. Layer 1 holds the cores and the cache controller; Layer 2 holds the cache banks, each with a router (R). TSVs connect the layers. Data migration takes horizontal hops between routers and vertical hops through TSVs.] (Li et al., ISCA 06)


Direct Replacement

Replace SRAM with MRAM of the same area.
The number of banks is kept the same.
The capacity of the L2 cache grows to four times its original size (2MB to 8MB).

[Figure: L2 cache miss rate.]

The L2 cache miss rate is reduced. How is the performance?

IPC Comparison (Direct Replacement)

[Figure: IPC (SRAM vs. MRAM).]

The last four benchmarks have high write intensities. (see Observation 1)

Observation 1 (Direct Replacement)

Replacing SRAM L2 caches directly with MRAM can reduce the miss rate of L2 caches.
However, the long write latency of the MRAM cache has a negative impact on performance.
When the write intensity is high, it even results in performance degradation.

Direct MRAM replacement may harm performance. How about power consumption?

Power Analysis (Direct Replacement)

[Figure: total power (SRAM vs. MRAM), normalized to 2M-SRAM-SNUCA. Legend: MRAM dynamic power, MRAM leakage power.]

For some workloads, MRAM dynamic power dominates! (see Observation 2)

Observation 2

Replacing SRAM L2 caches directly with MRAM can greatly reduce the leakage power.
When the write intensity is high, the dynamic power increases significantly because of the high write energy of the MRAM cache.
Question: How can we improve the performance and further reduce the power of MRAM caches?
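The trade-off in Observation 2 can be illustrated with a simple power model: total power is leakage plus access rate times energy per access. The sketch below uses the energy and leakage figures from the SRAM vs. MRAM table and assumes, for illustration only, that those figures describe the same cache unit and that accesses arrive at a steady rate. The function names and the write fractions are hypothetical.

```python
# Energy per access (nJ) and leakage (W) from the SRAM vs. MRAM table.
SRAM = {"read_nj": 0.90, "write_nj": 0.80, "leakage_w": 2.09}
MRAM = {"read_nj": 0.86, "write_nj": 5.00, "leakage_w": 0.26}


def energy_per_access_j(tech: dict, write_fraction: float) -> float:
    """Mean dynamic energy per access in joules for a given write fraction."""
    mean_nj = (1.0 - write_fraction) * tech["read_nj"] + write_fraction * tech["write_nj"]
    return mean_nj * 1e-9


def total_power_w(tech: dict, accesses_per_s: float, write_fraction: float) -> float:
    """Leakage power plus dynamic power at a steady access rate."""
    return tech["leakage_w"] + accesses_per_s * energy_per_access_j(tech, write_fraction)


def break_even_rate(write_fraction: float):
    """Access rate above which MRAM total power exceeds SRAM total power.

    Returns None when MRAM uses no more energy per access than SRAM,
    in which case MRAM is lower at every access rate under this model.
    """
    extra_j = energy_per_access_j(MRAM, write_fraction) - energy_per_access_j(SRAM, write_fraction)
    if extra_j <= 0.0:
        return None
    return (SRAM["leakage_w"] - MRAM["leakage_w"]) / extra_j


if __name__ == "__main__":
    for fraction in (0.0, 0.2, 0.5):  # hypothetical write fractions
        print(fraction, break_even_rate(fraction))
```

Under this model a read-only stream never erases the leakage advantage of MRAM, while a write-heavy stream does so once the access rate is high enough, which matches the direction of Observation 2.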

T1: Read-Preemptive Write Buffer


[Figure (demo): cores send read requests and receive read data; writes wait in a FIFO write buffer in front of the MRAM caches. Two cases are animated: the write has just begun, and the write is almost done.]

How can a read request preempt a write request (preemptive condition)?
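One way to picture a preemptive condition is a progress threshold: a read may cancel the in-flight write only if that write has not progressed far. The sketch below is a toy cycle-level model under that assumption. The class name, the write duration, and the 50% threshold are hypothetical and are not taken from the slides, and the model does not simulate the read itself or reads that hit in the buffer.

```python
from collections import deque


class ReadPreemptiveWriteBuffer:
    """Toy FIFO write buffer in which a read can cancel an early-stage write."""

    def __init__(self, write_cycles: int = 24, preempt_threshold: float = 0.5):
        self.write_cycles = write_cycles            # hypothetical write duration
        self.preempt_threshold = preempt_threshold  # hypothetical progress limit
        self.pending = deque()                      # queued write addresses (FIFO)
        self.in_flight = None                       # [address, cycles_done] or None

    def enqueue_write(self, address: int) -> None:
        self.pending.append(address)

    def tick(self) -> None:
        """Advance one cycle: start the next write if idle, then make progress."""
        if self.in_flight is None and self.pending:
            self.in_flight = [self.pending.popleft(), 0]
        if self.in_flight is not None:
            self.in_flight[1] += 1
            if self.in_flight[1] >= self.write_cycles:
                self.in_flight = None  # write finished

    def read_arrives(self) -> bool:
        """Return True if the read can proceed now, False if it must wait."""
        if self.in_flight is None:
            return True
        address, cycles_done = self.in_flight
        if cycles_done / self.write_cycles < self.preempt_threshold:
            # The write has just begun: cancel it and retry it later,
            # putting it back at the head so FIFO order is preserved.
            self.pending.appendleft(address)
            self.in_flight = None
            return True
        # The write is almost done: let it finish.
        return False
```

A short usage example: with a 10-cycle write and a 0.5 threshold, a read arriving after 2 cycles preempts the write, while a read arriving after 8 cycles waits.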

T2: SRAM-MRAM Hybrid L2 Cache

The read-preemptive write buffer hides the long MRAM write latency.
We propose an SRAM-MRAM hybrid cache to reduce the write intensity to MRAM.

T2: SRAM-MRAM Hybrid L2 Cache (Hybrid Structure)

31-way MRAM cache + one-way SRAM cache (32 ways in total).

[Figure: hybrid structure. Labels: core, SRAM bank, MRAM bank, TSV.]

T2: SRAM-MRAM Hybrid L2 Cache (Reduce Write Operations)

Reduce data migrations among MRAM cache banks.
Migrate frequently written data to the SRAM cache banks.

[Figure: migration policy. Labels: home region, no migrations; migration from MRAM to SRAM.]
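Moving frequently written data into SRAM can be sketched with a per-line write counter: once a line in an MRAM way has been written often enough, it swaps into the SRAM way. This is a minimal sketch under assumptions. The class name, the threshold of two writes, and the FIFO eviction on fill are hypothetical choices for illustration; the slides do not specify how "frequently written" is detected.

```python
class HybridSet:
    """One cache set with a single SRAM way and several MRAM ways (tags only)."""

    def __init__(self, mram_ways: int = 31, migrate_threshold: int = 2):
        self.mram_ways = mram_ways
        self.migrate_threshold = migrate_threshold  # hypothetical write count
        self.sram_tag = None
        self.mram_tags = []      # oldest first
        self.write_counts = {}   # tag -> writes seen while the line is in MRAM

    def fill(self, tag) -> None:
        """Place a new line in MRAM, evicting the oldest MRAM line if full."""
        if len(self.mram_tags) >= self.mram_ways:
            evicted = self.mram_tags.pop(0)
            self.write_counts.pop(evicted, None)
        self.mram_tags.append(tag)
        self.write_counts[tag] = 0

    def write(self, tag) -> str:
        """Perform a write hit and return the technology that absorbed it."""
        if tag == self.sram_tag:
            return "sram"
        if tag not in self.mram_tags:
            raise KeyError("miss handling is outside this sketch")
        self.write_counts[tag] += 1
        if self.write_counts[tag] < self.migrate_threshold:
            return "mram"
        # Frequently written line: swap it with the current SRAM occupant.
        self.mram_tags.remove(tag)
        del self.write_counts[tag]
        if self.sram_tag is not None:
            self.mram_tags.append(self.sram_tag)
            self.write_counts[self.sram_tag] = 0
        self.sram_tag = tag
        return "sram"
```

After migration, further writes to the same line land in SRAM, so MRAM sees fewer writes for write-hot lines.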

T2: SRAM-MRAM Hybrid L2 Cache (Write Intensity: Pure vs. Hybrid)

[Figure: write intensity (pure vs. hybrid).]

Using the hybrid L2 cache, MRAM write intensities are reduced.

Combine T1 and T2 (IPC Result)


[Figure: IPC comparison. Recoverable legend text: direct replacement; with read-preemptive.]

After adopting T1 and T2, the performance degradation is eliminated. The average IPC is increased by 15%.

Combine T1 and T2 (Power Result)


[Figure: total power comparison. Recoverable legend text: 8M-MRAM-DNUCA; direct replacement; with read-preemptive.]

After adopting T1 and T2, the dynamic power is reduced. The average total power is further reduced by 17%.

Compare DRAM & MRAM caches


                 DRAM        MRAM
Cache size       512KB       512KB
Area             2.38 mm2    3.30 mm2
Read latency     4.966ns     2.318ns
Write latency    4.966ns     11.024ns
Read energy      0.689nJ     0.858nJ
Write energy     0.689nJ     4.997nJ
Leakage power    1.6W        0.255W

Cache configurations:
8MB (16 x 512KB) DRAM cache
8MB (16 x 512KB) MRAM cache

We can get better performance with MRAM caches.

Compare power of DRAM and MRAM caches

MRAM has lower power.



Hybrid Cache Architecture with Disparate Memory Technologies (ISCA 2009)


Different memory technologies


DRAM: 1T1C structure
SRAM: 6T structure
Magnetic RAM: 1T1J structure (MTJ)
Phase Change RAM: 1T1J structure

Comparisons
                   SRAM       MRAM                             PRAM
Density (ratio)    Low (1)    High (4)                         High (16)
Dynamic power      Low        Low for read; High for write     Medium for read; High for write
Leakage power      High       Low                              Low
Speed              Fast       Fast for read; Slow for write    Slow for read; Very slow for write
Non-volatility     No         Yes                              Yes
Scalability        Yes        Yes                              Yes
Endurance          10^16      >10^15                           10^12

PRAM assumes four bits per cell.

Hybrid cache could outperform its counterpart of single technology.
Slide callouts: reduce miss rate, increase hit latency, high leakage power, low dynamic power.

Read/Write

Reads and writes
Reads and writes have different performance/power implications
Varied read/write behaviors for different benchmarks
Emerging memories have different read/write features

Read-write aware Hybrid Cache Architecture (RWHCA) using NVM

RWHCA

Read-write aware Hybrid Cache Architecture (RWHCA) using emerging NVM:
Made of different memory technologies; distinguishes reads and writes
Increases effective cache size under similar area
Reduces leakage power consumption
Read/write exclusive regions in the same cache level
Write region has faster writes and low write power (SRAM)
Intra-cache data movement policies
Places frequently written data in the write region
Reduces power, may improve performance

Methodology
[Figure: three chiplet configurations.
Baseline: core w/ L1s, L2 (SRAM).
RWHCA: core w/ L1s, L2 write region (SRAM), L2 read region (MRAM/PRAM).
3DRWHCA: core w/ L1s, L2 write region (SRAM), L2 read region (MRAM), L3 (PRAM).]

Methodology

Cache parameters: CACTI or modified versions
SRAM: 1MB, 8 cycles, 0.388 nJ, 1.36 W (45nm)
MRAM: 4MB, 20/60 cycles, 0.4/2.3 nJ, 0.15 W
PRAM: 16MB, 40/200 cycles, 0.8/1.5 nJ, 0.3 W

System configuration
Simulator: IBM Mambo
Processor: 8-way issue, out-of-order, 4GHz
L1: 32KB DL1, 32KB IL1, 128B, 4-way, 1 r/w port, 2 cycles
L2/L3: different for design cases

Workloads
30 workloads from SPECINT2006, SPECJBB, NAS, BioPerf, PARSEC, SPLASH2
Various cache size requirements
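For comparison with the nanosecond figures used earlier, the cycle counts above can be converted to time. The sketch below assumes the cache cycles are counted at the 4GHz processor clock and reads each "a/b" pair as read/write, which matches the later configuration slide for MRAM (r: 20 cycles, w: 60 cycles). Treating the single SRAM figure as both read and write latency is also an assumption.

```python
CLOCK_GHZ = 4.0  # processor clock from the system configuration

# (read_cycles, write_cycles) per technology, from the cache parameters.
LATENCY_CYCLES = {
    "SRAM": (8, 8),      # single figure given; assumed equal for read and write
    "MRAM": (20, 60),
    "PRAM": (40, 200),
}


def cycles_to_ns(cycles: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz


def latency_table_ns() -> dict:
    """Return {technology: (read_ns, write_ns)} for every listed technology."""
    return {
        name: (cycles_to_ns(read), cycles_to_ns(write))
        for name, (read, write) in LATENCY_CYCLES.items()
    }


if __name__ == "__main__":
    for name, (read_ns, write_ns) in latency_table_ns().items():
        print(f"{name}: read {read_ns} ns, write {write_ns} ns")
```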

RWHCA-result: SRAM/MRAM RWHCA L2 performance

5% geometric mean performance improvement over baseline
3% improvement over previous DNUCA policy
DNUCA: move a line to a closer bank on each hit, no difference for reads and writes, other policies
Also achieves better performance than a 3-level SRAM cache (256KB L2 and 1MB L3, similar area)

[Figure: performance chart. Value labels recovered from the chart: 1.66, 1.94.]
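The improvements are reported as geometric means across workloads. As a reminder of how such a figure is aggregated, the sketch below computes the geometric mean of per-workload performance ratios. The ratios in the example are hypothetical placeholders, not values from the slides.

```python
import math


def geometric_mean(ratios) -> float:
    """Geometric mean of positive per-workload ratios (e.g. normalized IPC)."""
    values = list(ratios)
    if not values:
        raise ValueError("at least one ratio is required")
    if any(v <= 0.0 for v in values):
        raise ValueError("ratios must be positive")
    # Averaging logarithms avoids overflow when there are many workloads.
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical normalized performance of four workloads (baseline = 1.0).
    example_ratios = [1.02, 0.98, 1.10, 1.05]
    print(f"geometric mean: {geometric_mean(example_ratios):.3f}")
```

The geometric mean is the usual choice for normalized ratios because a single outlier workload moves it less than it would move an arithmetic mean.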

RWHCA-result: SRAM/MRAM RWHCA L2 power

55% power reduction over baseline
Dynamic power: normal + swap; less leakage power
Lower power than DNUCA and 3-level SRAM

[Figure: power chart with 4 bars: SRAM baseline, DNUCA, RWHCA, 3-level SRAM.]

RWHCA-result: SRAM/PRAM RWHCA L2 performance

20% performance degradation over baseline
PRAM is not suitable for L2 cache from the performance perspective due to its long write latency
Low endurance, not suitable for lower level cache

[Figure: performance chart. Value labels recovered from the chart: 1.42, 1.44.]

Outline

Introduction and Motivation
Methodology
Read-write aware Hybrid Cache Architecture
3D Hybrid Cache stacking
Conclusions

3DRWHCA-configuration

SRAM/MRAM/PRAM 3DRWHCA

SRAM + MRAM L2
Total size: 4MB, 256KB SRAM
Write region: SRAM, read region: MRAM
SRAM r/w: 6 cycles, MRAM r: 20 cycles, w: 60 cycles
Bank number: 16, Associativity: 16
Block size: 128B, 1 r/w port

L3 PRAM
32MB (core + L1 has similar area to L2)
L3 bank number: 64, Associativity: 64
Block size: 128B, 1 r/w port
Power: scaled from RWHCA

3DRWHCA-result: 3DRWHCA performance

16% geometric mean performance improvement over baseline
11% improvement over SRAM/MRAM RWHCA

[Figure: performance chart. Value labels recovered from the chart: 1.94, 2.2, 1.88, 1.71.]

3DRWHCA-result: 3DRWHCA power

10% power reduction over baseline even with a PRAM L3
Higher power than RWHCA
Lower power than 3-level SRAM

[Figure: power chart with 4 bars: SRAM baseline, RWHCA, 3DRWHCA, 3-level SRAM.]

Conclusion

Emerging NVM is maturing.
Will this bring a new impact on computer architecture and system design?