CXL Shared Memory Programming: Barely Distributed and Almost Persistent

Xu, Yi; Mahar, Suyash; Liu, Ziheng; Shen, Mingyao; Swanson, Steven

arXiv:2405.19626 (cs)

[Submitted on 30 May 2024 (v1), last revised 17 Jul 2024 (this version, v2)]

Abstract: Compute Express Link (CXL) enables cache-coherent shared memory among multiple nodes, but it also introduces new types of failures: processes can fail before data does, or data can fail before a process does. The lack of a failure model for CXL-based shared memory makes these failures difficult to understand and mitigate.
To address these challenges, this paper describes a model that categorizes the failures of CXL-based shared memory into data failures and process failures and discusses how to handle each. Data failures in CXL-based shared memory render data inaccessible or inconsistent for a currently running application. We argue that such failures differ from data failures in distributed storage systems and require CXL-specific handling. We therefore examine traditional data failure mitigation techniques, such as erasure coding and replication, and propose new solutions that better handle data failures in CXL-based shared memory systems. Next, we examine process failures and compare them, and their potential solutions, with the failure model and programming solutions of persistent memory (PMEM). We argue that although PMEM shares some of CXL's characteristics, it does not fully address CXL's volatile nature and low access latencies. Finally, taking inspiration from PMEM programming solutions, we propose techniques to handle these new failures.
This paper is the first work to define the CXL-based shared memory failure model and to propose tailored solutions that address challenges specific to CXL-based systems.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2405.19626 [cs.DC]
	(or arXiv:2405.19626v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2405.19626

Submission history

From: Suyash Mahar
[v1] Thu, 30 May 2024 02:23:50 UTC (166 KB)
[v2] Wed, 17 Jul 2024 03:02:49 UTC (166 KB)
