RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Ghiasi, Nika Mansouri; Sadrosadati, Mohammad; Oliveira, Geraldo F.; Kanellopoulos, Konstantinos; Ausavarungnirun, Rachata; Luna, Juan Gómez; Manglik, Aditya; Ferreira, João; Kim, Jeremie S.; Giannoula, Christina; Vijaykumar, Nandita; Park, Jisung; Mutlu, Onur

Computer Science > Hardware Architecture

arXiv:2210.08508 (cs)

[Submitted on 16 Oct 2022]

Title:RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Authors:Nika Mansouri Ghiasi, Mohammad Sadrosadati, Geraldo F. Oliveira, Konstantinos Kanellopoulos, Rachata Ausavarungnirun, Juan Gómez Luna, Aditya Manglik, João Ferreira, Jeremie S. Kim, Christina Giannoula, Nandita Vijaykumar, Jisung Park, Onur Mutlu

View PDF

Abstract:Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache hierarchy. Hence, there is a need to revisit current core and cache designs that have been conventionally tailored to tackle the memory bottleneck.
Our goal is to redesign the core and cache hierarchy, given the fundamentally new trade-offs of M3D, to benefit a wide range of workloads. To this end, we take two steps. First, we perform a design space exploration of the cache and core's key components. We highlight that in M3D systems, (i) removing the shared last-level cache leads to similar or larger performance benefits than increasing its size or reducing its latency; (ii) improving L1 latency has a large impact on improving performance; (iii) wider pipelines are increasingly beneficial; (iv) the performance impact of branch speculation and pipeline frontend increases; (v) the current synchronization schemes limit parallel speedup. Second, we propose an optimized M3D system, RevaMp3D, where (i) using the tight connectivity between logic layers, we efficiently increase pipeline width, reduce L1 latency, and enable fine-grained synchronization; (ii) using the high-bandwidth and energy-efficient main memory, we alleviate the amplified energy and speculation bottlenecks by memoizing the repetitive fetched, decoded, and reordered instructions and turning off the relevant parts of the core pipeline when possible. RevaMp3D provides, on average, 81% speedup, 35% energy reduction, and 12.3% smaller area compared to the baseline M3D system.

Subjects:	Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2210.08508 [cs.AR]
	(or arXiv:2210.08508v1 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2210.08508

Submission history

From: Nika Mansouri Ghiasi [view email]
[v1] Sun, 16 Oct 2022 11:21:26 UTC (402 KB)

Computer Science > Hardware Architecture

Title:RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Hardware Architecture

Title:RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators