Module 5: Emerging NVM: Advanced Topics in Modern VLSI and Architecture

The document summarizes a presentation on emerging non-volatile memory (NVM) technologies. It discusses the evolution of NVM, including magnetic hard disk drives (HDDs) and flash memory. It then focuses on magnetic random-access memory (MRAM) as a promising NVM technology. The presentation describes the basic structure and operation of MRAM cells. It compares the performance and power characteristics of SRAM and MRAM, finding that MRAM has lower leakage power but higher write latency and energy. Two techniques are proposed to improve the performance and power of caches using MRAM: a read-preemptive write buffer and a hybrid SRAM-MRAM cache architecture. Simulation results show these techniques eliminate performance degradation and

Uploaded by

Wang David


Dragon Star Summer School

Advanced Topics in Modern VLSI and Architecture

Module 5: Emerging NVM

Yuan Xie
Associate Professor The Pennsylvania State University Department of Computer Science & Engineering
www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu
(Some slides are adapted from Dr. Yiran Chen and Dr. Helen Li from Seagate)

What is Nonvolatile Memory?

Non-volatile memory, nonvolatile memory, NVM or non-volatile storage, is computer memory that can retain the stored information even when not powered. -www.wikipedia.org

Evolution of Nonvolatile Memory

[Figure: timeline of nonvolatile storage media (punched card, paper tape, HDD, ROM, floppy disk, flash memory, CD-R), with year markers 1864, 1900s, 1956, 1970s, 1971, 1980, and 1988.]

Success Story of Magnetism HDD

1956: First HDD: RAMAC 305 (IBM). 5MB of data at $50,000. As big as two refrigerators. Uses 50 24-inch platters.
1973: First modern "Winchester" HDD (IBM): Model 3340.
1979: First 5.25-inch HDD for PC (Shugart Tech., now Seagate Tech.).
1982: First drive with more than 1GB of storage: 1.2GB H-8598, weighing 50kg (Hitachi).
1983: First 3.5-inch HDD: RO352 (10MB, Rodime).
1988: First 2.5-inch HDD: 220 (20MB, Prairie Tek).
1997: Giant magnetoresistive (GMR) heads (IBM).
2000: First 15,000-rpm HDD: Cheetah X15 (Seagate).
2002: 100Gbits per square inch (Seagate).
2006: First 2.5-inch model to use perpendicular magnetic recording, boosting capacity up to 160GB (Seagate).
2006: 1-inch 12GB HDD (Seagate).
2007: First real 1TB hard disk drive (Seagate).

Price Trend of HDD and SSD (NAND)

$1/GB challenges the enterprise market (2009).
$0.2/GB challenges the laptop market (2011).

NAND cost reduction becomes a challenge after 2011 because of NAND technology barriers.

A $99 64GB SSD deal showed up 2 months ago!

Next Generation Nonvolatile Memory

*Source: IBM

Nonvolatile Memory Candidates

                                    SRAM             DRAM           NOR         NAND        PRAM     MRAM
Data retention                      N                N              Y           Y           Y        Y
Memory cell factor                  50-120           6-10           10          2-5         6-12     4-20
Read time (ns)                      1                30             10          50          20-50    2-20
Write/erase time (ns)               1                50             10^5-10^7   10^6-10^5   50-120   2-20
Number of rewrites                  10^16            10^16          10^5        10^5        10^10    10^15
Power consumption, read/write       Low              Low            High        High        Low      Low
Power consumption other than R/W    Leakage current  Refresh power  None        None        None     None

R-RAM outperforms NAND in cost (< x1/4), density (> x2) and performance.
STT-RAM is ideal for embedded solutions.
Competition: IBM = PCM, SEC = PCM/NiO, Toshiba = 3D NAND, Seagate = MRAM, Everspin = MRAM.
*Source: ITRS

MRAM Cells

The structure of one transistor and one Magnetic Tunnel Junction (MTJ).
[Figure: MTJ with a free layer and a reference layer. The high-resistance state stores 1; the low-resistance state stores 0.]

[Figure: 3D view of the cell, showing the bit line, the sensor stack (X-bit), the source line, the drain, and the word line (gate).]

Read Operation of STT-RAM

[Figure: probability distribution of cell resistance. X axis: Resistance (Ohm), 500 to 2500; Y axis: Probability, 0 to 0.005. Rmin, Rmax, and Rref are marked.]
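The read operation can be illustrated with a minimal sketch, assuming the sense amplifier simply compares the cell resistance against Rref. The numeric resistances and the choice of the midpoint as Rref are hypothetical and are not taken from the slides; `SenseConfig` and `read_bit` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class SenseConfig:
    r_min: float  # nominal low-resistance state in ohms (hypothetical value)
    r_max: float  # nominal high-resistance state in ohms (hypothetical value)

    @property
    def r_ref(self) -> float:
        # Assumption: the reference sits midway between the two nominal states.
        return (self.r_min + self.r_max) / 2.0


def read_bit(cell_resistance: float, cfg: SenseConfig) -> int:
    """Return 1 for the high-resistance state and 0 for the low-resistance state."""
    return 1 if cell_resistance > cfg.r_ref else 0


cfg = SenseConfig(r_min=1000.0, r_max=2000.0)
print(read_bit(1100.0, cfg), read_bit(1900.0, cfg))
```

Because the two resistance distributions have finite width, a real design has to keep Rref clear of both tails; the sketch ignores that margin.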

MRAM technologies are being improved and attracting more attention.

SRAM vs. MRAM

                 SRAM       MRAM       MRAM note
Area (65nm)      3.66mm2    3.30mm2
Capacity         128KB      512KB      High density
Read latency     2.25ns     2.32ns     Fast read
Write latency    2.26ns     11.02ns    Slow write
Read energy      0.90nJ     0.86nJ     Low read energy
Write energy     0.80nJ     5.00nJ     High write energy
Leakage power    2.09W      0.26W      Low leakage

Cache configurations:
2MB (16x128KB) SRAM cache
8MB (16x512KB) MRAM cache

Pros: Low leakage power, high density.
Cons: Long write latency and large write energy.
Replace SRAM caches with MRAM? (HPCA 2009)
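The pros and cons can be quantified directly from the comparison figures. The following is a small worked calculation, not part of the original slides; the dictionary keys and the helper name `density_kb_per_mm2` are illustrative.

```python
# Figures copied from the SRAM vs. MRAM comparison (65nm).
sram = {"area_mm2": 3.66, "capacity_kb": 128, "write_ns": 2.26,
        "write_nj": 0.80, "leak_w": 2.09}
mram = {"area_mm2": 3.30, "capacity_kb": 512, "write_ns": 11.02,
        "write_nj": 5.00, "leak_w": 0.26}


def density_kb_per_mm2(bank: dict) -> float:
    """Capacity per unit area of one bank."""
    return bank["capacity_kb"] / bank["area_mm2"]


density_gain = density_kb_per_mm2(mram) / density_kb_per_mm2(sram)
write_latency_ratio = mram["write_ns"] / sram["write_ns"]
write_energy_ratio = mram["write_nj"] / sram["write_nj"]
leakage_ratio = mram["leak_w"] / sram["leak_w"]

print(f"density gain:        {density_gain:.2f}x")
print(f"write latency ratio: {write_latency_ratio:.2f}x")
print(f"write energy ratio:  {write_energy_ratio:.2f}x")
print(f"leakage ratio:       {leakage_ratio:.2f}x")
```

With these numbers MRAM offers roughly 4.4 times the density and about one eighth of the leakage, at the cost of writes that are roughly 4.9 times slower and 6.25 times more energy-hungry.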

The Baseline 3D Architecture

Core layer + cache layers. NUCA caches with NOC connections.

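[Figure: two-layer 3D stack. Layer 1 holds the cores and the cache controller; Layer 2 holds the cache banks, each with a router (R). Banks are connected by horizontal hops within a layer and by vertical hops through TSVs, and data migrates between banks. (Li et al., ISCA 06)]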

The Baseline 3D Architecture

Direct Replacement

Replace SRAM with MRAM of the same area.
The number of banks is kept the same.
The capacity of the L2 cache quadruples (from 2MB to 8MB).

L2 Cache Miss Rate

L2 cache miss rate reduced. How is the performance?

IPC Comparison (Direct Replacement)

IPC (SRAM vs. MRAM) The last four benchmarks have high write intensities. (see Observation 1)

Observation 1 (Direct Replacement)

Replacing SRAM L2 caches directly with MRAM can reduce the access miss rate of L2 caches. However, the long access latency to MRAM cache has a negative impact on the performance. When the write intensity is high, it even results in performance degradation.

Direct MRAM replacement may harm performance. How about power consumption?

Power Analysis

(Direct Replacement)

[Figure: Total Power (SRAM vs. MRAM), normalized to 2M-SRAM-SNUCA, with MRAM dynamic power and MRAM leakage power shown separately.]

For some workloads, MRAM dynamic power dominates! (see Observation 2)

Observation 2

Replacing SRAM L2 caches directly with MRAM can greatly reduce the leakage power. When the write intensity is high, the dynamic power increases significantly because of the high write energy of MRAM cache.

Question: How to improve the performance and further reduce power of MRAM?
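Observation 2 can be illustrated with a back-of-the-envelope model, assuming total power is leakage power plus access rate times energy per access. This is a simplification introduced here, not the simulation methodology behind the slides: it mixes the leakage and per-access energy figures from the SRAM vs. MRAM comparison as if they described the same unit, and the function names are hypothetical.

```python
# Figures from the SRAM vs. MRAM comparison (65nm).
SRAM = {"leak_w": 2.09, "read_nj": 0.90, "write_nj": 0.80}
MRAM = {"leak_w": 0.26, "read_nj": 0.86, "write_nj": 5.00}


def total_power_w(mem: dict, reads_per_s: float, writes_per_s: float) -> float:
    """Leakage plus dynamic power under the simplified model."""
    dynamic = (reads_per_s * mem["read_nj"] + writes_per_s * mem["write_nj"]) * 1e-9
    return mem["leak_w"] + dynamic


def break_even_writes_per_s(reads_per_s: float = 0.0) -> float:
    """Write rate at which MRAM's extra write energy cancels its leakage saving."""
    leak_saving = SRAM["leak_w"] - MRAM["leak_w"]
    read_saving = reads_per_s * (SRAM["read_nj"] - MRAM["read_nj"]) * 1e-9
    extra_write_j = (MRAM["write_nj"] - SRAM["write_nj"]) * 1e-9
    return (leak_saving + read_saving) / extra_write_j


w = break_even_writes_per_s()
print(f"break-even write rate: {w:.3e} writes/s")
```

Under these assumptions, MRAM stops saving power once the write rate exceeds roughly 4.4e8 writes per second, which is why write-intensive workloads can become dominated by dynamic power.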

T1: Read-Preemptive Write Buffer

[Figure (demo): the cores send read requests and write operations to the MRAM caches through a FIFO write buffer, and read data is returned to the cores. Two cases are shown: the write has just begun, or the write is almost done.]

How can a read request evict a write request (preemptive condition)?
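The preemptive condition can be sketched as follows, assuming a read may cancel an in-flight write only while the write has completed less than some fraction of its latency, and that the cancelled write returns to the head of the FIFO to be retried. The class name, the 0.5 threshold, and the retry policy are assumptions made for illustration; the slides only distinguish a write that "just begins" from one that "is almost done".

```python
from collections import deque
from typing import Deque, Optional


class ReadPreemptiveWriteBuffer:
    """Toy cycle-level model of a FIFO write buffer in front of an MRAM bank."""

    def __init__(self, write_latency: int, preempt_threshold: float = 0.5):
        self.write_latency = write_latency          # cycles per MRAM write
        self.preempt_threshold = preempt_threshold  # hypothetical tuning knob
        self.pending: Deque[int] = deque()          # queued write addresses
        self.in_flight: Optional[int] = None        # address being written
        self.progress = 0                           # cycles spent on that write

    def enqueue_write(self, addr: int) -> None:
        self.pending.append(addr)

    def tick(self) -> None:
        """Advance one cycle: start the next write if idle, then make progress."""
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.progress = 0
        if self.in_flight is not None:
            self.progress += 1
            if self.progress >= self.write_latency:
                self.in_flight = None
                self.progress = 0

    def read_request(self) -> str:
        """Decide what happens to a read that arrives in the current cycle."""
        if self.in_flight is None:
            return "read proceeds"
        if self.progress / self.write_latency < self.preempt_threshold:
            # The write has just begun: cancel it and retry it later.
            self.pending.appendleft(self.in_flight)
            self.in_flight = None
            self.progress = 0
            return "read preempts write"
        # The write is almost done: let it finish first.
        return "read waits for write"


buf = ReadPreemptiveWriteBuffer(write_latency=10)
buf.enqueue_write(0x40)
buf.tick()
print(buf.read_request())
```

The sketch omits read forwarding from the buffer and does not charge the energy wasted by a cancelled write, both of which a real design would need to account for.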

T2: SRAM-MRAM Hybrid L2 Cache

The read-preemptive write buffer hides the long MRAM write latency.
We propose an SRAM-MRAM hybrid cache to reduce write intensities to MRAM.

T2: SRAM-MRAM Hybrid L2 Cache

(Hybrid Structure)

31 ways of MRAM cache + one way of SRAM cache (32 ways in total).

[Figure: hybrid structure with MRAM banks, SRAM banks, TSVs, and cores.]

T2: SRAM-MRAM Hybrid L2 Cache

(Reduce Write Operations)

Reduce data migrations among MRAM cache banks.
Migrate frequently written data to the SRAM cache banks.

Home region No migrations

Migration from MRAM to SRAM
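A minimal sketch of the migration idea, assuming one cache set with several MRAM ways and a single SRAM way, and a per-block write counter that triggers migration once it reaches a threshold. The threshold of 2, the swap-back of the displaced SRAM block, and all names are hypothetical; the slides do not specify the policy at this level of detail.

```python
from collections import Counter
from typing import Optional, Set


class HybridSet:
    """One cache set: MRAM ways plus a single SRAM way (illustrative only)."""

    def __init__(self, migrate_after: int = 2):
        self.migrate_after = migrate_after  # hypothetical migration threshold
        self.mram_tags: Set[str] = set()
        self.sram_tag: Optional[str] = None
        self.write_counts: Counter = Counter()
        self.mram_writes = 0
        self.sram_writes = 0

    def fill(self, tag: str) -> None:
        """Place a new block in an MRAM way (the fill cost is not counted here)."""
        self.mram_tags.add(tag)

    def write(self, tag: str) -> str:
        if tag == self.sram_tag:
            self.sram_writes += 1
            return "SRAM"
        if tag not in self.mram_tags:
            raise KeyError(f"block {tag!r} is not cached in this set")
        self.write_counts[tag] += 1
        if self.write_counts[tag] >= self.migrate_after:
            # Frequently written block: move it to the SRAM way.
            self.mram_tags.discard(tag)
            if self.sram_tag is not None:
                # The displaced SRAM block goes back to MRAM (one MRAM write).
                self.mram_tags.add(self.sram_tag)
                self.mram_writes += 1
            self.sram_tag = tag
            del self.write_counts[tag]
            self.sram_writes += 1
            return "migrated to SRAM"
        self.mram_writes += 1
        return "MRAM"


s = HybridSet()
s.fill("A")
s.fill("B")
for tag in ["A", "A", "A", "A", "A", "B"]:
    s.write(tag)
print(s.mram_writes, s.sram_writes)
```

In this toy trace, five of the six writes target one hot block, and after migration only two writes ever reach MRAM.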

T2: SRAM-MRAM Hybrid L2 Cache

(Write Intensity: Pure vs. Hybrid)

[Figure: write intensity, pure MRAM cache vs. hybrid cache.]

Using hybrid L2 cache, MRAM write intensities are reduced

Combine T1 and T2 (IPC Result)

[Figure: IPC comparison; the legend includes "direct replacement with read-preemptive".]

After adopting T1 and T2, the performance degradation is eliminated.
The average IPC is increased by 15%.

Combine T1 and T2 (Power Result)

[Figure: total power comparison; the legend includes "8M-MRAM-DNUCA" and "direct replacement with read-preemptive".]

After adopting T1 and T2, the dynamic power is reduced.
The average total power is further reduced by 17%.

Compare DRAM & MRAM caches

                 DRAM        MRAM
Cache size       512KB       512KB
Area             2.38mm2     3.30mm2
Read latency     4.966ns     2.318ns
Write latency    4.966ns     11.024ns
Read energy      0.689nJ     0.858nJ
Write energy     0.689nJ     4.997nJ
Leakage power    1.6W        0.255W

Cache configurations:
8MB (16 x 512KB) DRAM cache
8MB (16 x 512KB) MRAM cache

We can get better performance with MRAM caches.

Compare power of DRAM and MRAM caches

MRAM has lower power.

Hybrid Cache Architecture with Disparate Memory Technologies (ISCA 2009)

Different memory technologies

DRAM: 1T1C structure
SRAM: 6T structure
Magnetic RAM: 1T1J structure (MTJ)
Phase Change RAM: 1T1J structure

Different memory technologies


Comparisons
                  SRAM       MRAM                             PRAM
Density (ratio)   Low (1)    High (4)                         High (16)
Dynamic power     Low        Low for read; high for write     Medium for read; high for write
Leakage power     High       Low                              Low
Speed             Fast       Fast for read; slow for write    Slow for read; very slow for write
Non-volatility    No         Yes                              Yes
Scalability       Yes        Yes                              Yes
Endurance         10^16      >10^15                           10^12

PRAM assumes four bits per cell.

A hybrid cache could outperform its single-technology counterpart.

[Slide labels: reduce miss rate; increase hit latency; high leakage power; low dynamic cache power.]

Read/Write

Reads and writes
Reads and writes have different performance/power implications.
Varied read/write behaviors for different benchmarks.
Emerging memories have different read/write features.

Read-write aware Hybrid Cache Architecture (RWHCA) using NVM

RWHCA

Read-write aware Hybrid Cache Architecture (RWHCA) using emerging NVM:
Made of different memory technologies, and distinguishes reads from writes.
Increases effective cache size under similar area.
Reduces leakage power consumption.
Read/write exclusive regions in the same cache level.
The write region has faster writes and low write power (SRAM).
Intra-cache data movement policies.
Places frequently written data in the write region.
Reduces power, and may improve performance.

Methodology
[Figure: three designs compared.
Baseline: core w/ L1s, L2 (SRAM).
RWHCA: core w/ L1s, L2 write region (SRAM), L2 read region (MRAM/PRAM).
3DRWHCA: core w/ L1s, L2 write region (SRAM), L2 read region (MRAM), L3 (PRAM).]

Methodology

Cache parameters: CACTI or modified versions
  SRAM: 1MB, 8 cycles, 0.388 nJ, 1.36 W (45nm)
  MRAM: 4MB, 20/60 cycles, 0.4/2.3 nJ, 0.15W
  PRAM: 16MB, 40/200 cycles, 0.8/1.5 nJ, 0.3W

System configuration
  Simulator: IBM Mambo
  Processor: 8-way issue, out-of-order, 4GHz
  L1: 32KB DL1, 32KB IL1, 128B, 4-way, 1 r/w port, 2 cycles
  L2/L3: different for design cases

Workloads
  30 workloads from SPECINT2006, SPECJBB, NAS, BioPerf, PARSEC, SPLASH2
  Various cache size requirements
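To relate these cycle counts to the nanosecond latencies quoted earlier, the following worked conversion assumes the cache latencies are counted in processor cycles at the stated 4GHz clock, and that a pair such as 20/60 means read/write. Both readings are assumptions; the slide does not say so explicitly.

```python
CLOCK_GHZ = 4.0  # processor frequency from the system configuration


def cycles_to_ns(cycles: int) -> float:
    """Convert a cycle count to nanoseconds at CLOCK_GHZ."""
    return cycles / CLOCK_GHZ


# (read cycles, write cycles) per technology, as listed in the cache parameters.
latency_cycles = {"SRAM": (8, 8), "MRAM": (20, 60), "PRAM": (40, 200)}

latency_ns = {
    name: (cycles_to_ns(read), cycles_to_ns(write))
    for name, (read, write) in latency_cycles.items()
}

for name, (read_ns, write_ns) in latency_ns.items():
    print(f"{name}: read {read_ns:.1f} ns, write {write_ns:.1f} ns")
```

Under this reading, an MRAM write takes 15 ns and a PRAM write 50 ns, against 2 ns for SRAM. The listed capacities (1MB, 4MB, 16MB) also follow the 1:4:16 density ratios from the comparison of technologies.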

RWHCA-result
SRAM/MRAM RWHCA L2 performance

5% geometric mean performance improvement over baseline.
3% improvement over previous DNUCA policy.
  DNUCA: move a line to a closer bank on each hit, no difference for reads and writes, other policies.
Also achieves better performance than a 3-level SRAM cache.
  256KB L2 and 1MB L3, similar area.

[Figure: per-workload performance; off-scale bar labels: 1.66, 1.94.]

RWHCA-result
SRAM/MRAM RWHCA L2 power

55% power reduction over baseline.
  Dynamic power: normal + swap; less leakage power.
Lower power than DNUCA and 3-level SRAM.

4 bars: SRAM baseline, DNUCA, RWHCA, 3-level SRAM

RWHCA-result
SRAM/PRAM RWHCA L2 performance

20% performance degradation over baseline.
PRAM is not suitable for L2 cache from the performance perspective due to its long write latency.
Low endurance, not suitable for lower level cache.

[Figure: per-workload performance; off-scale bar labels: 1.42, 1.44.]

Outline
Introduction and Motivation
Methodology
Read-write aware Hybrid Cache Architecture
3D Hybrid Cache stacking
Conclusions

3DRWHCA-configuration

SRAM/MRAM/PRAM 3DRWHCA

SRAM + MRAM L2
  Total size: 4MB, 256KB SRAM
  Write region: SRAM; read region: MRAM
  SRAM r/w: 6 cycles; MRAM r: 20 cycles, w: 60 cycles
  Bank number: 16; associativity: 16
  Block size: 128B, 1 r/w port

L3 PRAM
  32MB (core + L1 has similar area with L2)
  L3 bank number: 64; associativity: 64
  Block size: 128B, 1 r/w port
  Power: scaled from RWHCA

3DRWHCA-result
3DRWHCA performance

16% geometric mean performance improvement over baseline.
11% improvement over SRAM/MRAM RWHCA.

[Figure: per-workload performance; off-scale bar labels: 1.94, 2.2, 1.88, 1.71.]

3DRWHCA-result
3DRWHCA power

10% power reduction over baseline, even with a PRAM L3.
Higher power than RWHCA.
Lower power than 3-level SRAM.

4 bars: SRAM baseline, RWHCA, 3DRWHCA, 3-level SRAM

Conclusion

Emerging NVM is maturing.
Will this bring a new impact on computer architecture and system design?
