PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk #20630

copybara-service · 2024-12-17T14:39:54Z

PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk

Imported from GitHub PR #20086

Move pointer initialization from runtime to the thunk init stage, which removes the blocking wait at runtime.
Add a device sync point, using an NCCL all-reduce, before the memcpy so that all GPUs reach the same stage. Without it, data corruption is possible when the receiving rank has not yet arrived at the memcpy.
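The ordering described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the XLA thunk code: ranks are modeled as Python threads, `threading.Barrier` stands in for the NCCL all-reduce sync point, a list assignment stands in for the peer memcpy, and `run_ring_permute` is a hypothetical name. The second barrier exists only so the simulated receiver reads after the write has landed; it is an artifact of the simulation, not something the PR describes.

```python
import threading


def run_ring_permute(num_ranks=4, iterations=20):
    """Simulate a ring collective permute repeated in a loop (hypothetical helper)."""
    # "Init stage": each rank's receive buffer is created and published once,
    # before any step runs, so no rank has to wait for a peer pointer later.
    recv_buffers = [[None] for _ in range(num_ranks)]
    # Stand-in for the all-reduce used as a device sync point (assumption).
    barrier = threading.Barrier(num_ranks)
    observed = [[] for _ in range(num_ranks)]

    def rank_main(rank):
        peer = (rank + 1) % num_ranks
        for step in range(iterations):
            # Sync point: no rank writes until every rank has reached this step,
            # so a fast sender cannot overwrite data a slow receiver still needs.
            barrier.wait()
            # Stand-in for the peer-to-peer memcpy into the receiver's buffer.
            recv_buffers[peer][0] = (rank, step)
            # Simulation-only barrier: read after all writes have completed.
            barrier.wait()
            observed[rank].append(recv_buffers[rank][0])

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    result = run_ring_permute()
    # Each rank should see, at every step, the value sent by its predecessor.
    print(all(
        result[r] == [((r - 1) % 4, s) for s in range(20)] for r in range(4)
    ))
```

Removing the first `barrier.wait()` in this sketch lets a sender run ahead of a receiver that is still on an earlier step, which is the kind of race the sync point is meant to rule out.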
Copybara import of the project:

--
ba4ad04 by TJ Xu [email protected]:

Moved pointer init to the thunk init stage and added a sync point before the
memcpy to ensure data consistency across ranks

--
050bc59 by TJ Xu [email protected]:

Added e2e test for memcpy p2p in a loop

--
1f75328 by TJ Xu [email protected]:

Added return status for cleanup functions

Merging this change closes #20086

FUTURE_COPYBARA_INTEGRATE_REVIEW=#20086 from Tixxx:tixxx/memcpy_p2p_fix 1f75328

PiperOrigin-RevId: 707145351

copybara-service bot force-pushed the test_707074350 branch from 92ade0f to 9366198 Compare December 17, 2024 17:29

copybara-service bot force-pushed the test_707074350 branch from 9366198 to 85af7f1 Compare December 17, 2024 18:08

copybara-service bot merged commit 85af7f1 into main Dec 17, 2024

copybara-service bot deleted the test_707074350 branch December 17, 2024 18:08
