More Performant CachingHostAllocator for Pinned Memory Allocation #106606
Labels
module: cuda, triaged
🚀 The feature, motivation and pitch
I intend to employ a memory allocator for pinned memory allocation and have come across the CachingHostAllocator in PyTorch. Unfortunately, its actual memory consumption is higher than expected: the allocator follows a power-of-two allocation strategy without memory coalescing, which results in substantial memory wastage, increased memory consumption, and suboptimal utilization of resources. In short, the current implementation of the CachingHostAllocator for pinned memory allocation exhibits suboptimal performance under certain conditions. Since pinned memory is a crucial component of deep learning workloads, efficient allocation and management are essential for optimal training and inference performance.
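To make the wastage concrete, here is a minimal sketch with hypothetical request sizes. The rounding function illustrates the power-of-two strategy described above; it is not the actual PyTorch implementation, which lives in the C++ sources:

```python
import math

def pow2_bucket(nbytes: int) -> int:
    """Round a request up to the next power of two (illustrative only)."""
    if nbytes <= 1:
        return 1
    return 1 << math.ceil(math.log2(nbytes))

# Hypothetical request sizes (bytes) for host-side staging buffers.
for req in [5_000_000, 70_000_000, 130_000_000]:
    alloc = pow2_bucket(req)
    waste = alloc - req
    print(f"request {req:>12,} -> allocated {alloc:>12,} "
          f"({100 * waste / alloc:.1f}% wasted)")
```

A 5 MB request rounded up to 8 MB wastes over 40% of the block, and without coalescing such oversized blocks cannot be recombined to serve later requests of other sizes.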
Alternatives
I suggest developing a more performant alternative to the existing CachingHostAllocator that addresses these concerns. The new allocator should focus on improving memory allocation speed, reducing memory fragmentation, and better exploiting modern hardware characteristics.
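One possible direction, sketched below with a hypothetical `BestFitPinnedAllocator` class: cache freed blocks and hand out the smallest cached block that fits, so requests are not rounded up to a power of two. This is a Python sketch under stated assumptions, not a proposed implementation; a real version would live in C++ inside the allocator and would also need CUDA-event tracking to know when a block is safe to reuse.

```python
import torch

class BestFitPinnedAllocator:
    """Hypothetical sketch of best-fit reuse for pinned host memory.
    A production version would also have to track CUDA events so a
    block is only reused once in-flight async copies have completed."""

    def __init__(self) -> None:
        self._free: list[torch.Tensor] = []  # cached blocks, sorted by size

    def allocate(self, nbytes: int) -> torch.Tensor:
        # Best fit: return the smallest cached block that is large enough.
        for i, blk in enumerate(self._free):
            if blk.numel() >= nbytes:
                return self._free.pop(i)
        # Cache miss: allocate exactly the requested size (no
        # power-of-two rounding), pinned for fast host-to-device copies.
        return torch.empty(nbytes, dtype=torch.uint8, pin_memory=True)

    def free(self, block: torch.Tensor) -> None:
        # Return the block to the cache; keep the list sorted so the
        # scan in allocate() finds the tightest fit first.
        self._free.append(block)
        self._free.sort(key=torch.Tensor.numel)

# Usage: a freed 5 MB block is reused for a 4 MB request instead of
# allocating a fresh 8 MB power-of-two bucket.
alloc = BestFitPinnedAllocator()
a = alloc.allocate(5_000_000)
alloc.free(a)
b = alloc.allocate(4_000_000)  # reuses the cached 5 MB block
```

Best fit trades a slightly more expensive free-list lookup for much tighter memory reuse; adding coalescing of adjacent free blocks would further reduce fragmentation.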
Additional context
No response
cc @ptrblck