PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk #20630

Merged: 1 commit, Dec 17, 2024

copybara-service[bot]

PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk

Imported from GitHub PR #20086

Move pointer initialization from runtime to the thunk init stage, which removes the blocking wait at runtime.
Add a device sync point, using an NCCL all-reduce, before the memcpy so that all GPUs reach the same stage. Without it, data corruption is possible when the receiving rank has not yet arrived at the memcpy.
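
The ordering problem described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the XLA implementation: Python threads stand in for GPU ranks, a `threading.Barrier` stands in for the NCCL all-reduce sync point, and a list write stands in for the peer-to-peer memcpy. The names `Rank`, `initialize_peers`, `run_rank`, and `simulate` are hypothetical and do not appear in the PR.

```python
import threading


class Rank:
    """Hypothetical stand-in for one GPU rank in a collective permute."""

    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.recv_buffer = [None]      # stand-in for this rank's receive buffer
        self.peer_recv_buffer = None   # resolved once, during initialization


def initialize_peers(ranks):
    # Init stage: every rank resolves its target's receive buffer up front,
    # so the per-step loop never has to block waiting for a pointer exchange.
    for rank in ranks:
        target = ranks[(rank.rank_id + 1) % len(ranks)]
        rank.peer_recv_buffer = target.recv_buffer


def run_rank(rank, barrier, rounds, received):
    for step in range(rounds):
        # Sync point (a barrier here; the PR describes an NCCL all-reduce):
        # no rank writes until every rank has reached this step, so a
        # receiver that is still consuming the previous value is never
        # overwritten.
        barrier.wait()
        rank.peer_recv_buffer[0] = (rank.rank_id, step)  # simulated p2p memcpy
        # Assumption of this simulation only: a second barrier marks the copy
        # as complete so the read below is deterministic.
        barrier.wait()
        received[rank.rank_id].append(rank.recv_buffer[0])


def simulate(num_ranks=4, rounds=50):
    ranks = [Rank(i) for i in range(num_ranks)]
    initialize_peers(ranks)
    barrier = threading.Barrier(num_ranks)
    received = [[] for _ in range(num_ranks)]
    threads = [
        threading.Thread(target=run_rank, args=(rank, barrier, rounds, received))
        for rank in ranks
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received
```

Calling `simulate()` returns, for each rank, the sequence of values it received from its neighbor. With the first barrier in place, each step's value is read before the sender overwrites it. Removing that barrier lets a fast sender overwrite a value that a slower receiver has not read yet, which is the kind of corruption the description mentions.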
Copybara import of the project:

--
ba4ad04 by TJ Xu [email protected]:

Moved pointer init to thunk init stage and added a sync point before doing
memcpy to ensure data consistency across ranks

--
050bc59 by TJ Xu [email protected]:

Added e2e test for mem cpy p2p in a loop

--
1f75328 by TJ Xu [email protected]:

Added return status for cleanup functions

Merging this change closes #20086

FUTURE_COPYBARA_INTEGRATE_REVIEW=#20086 from Tixxx:tixxx/memcpy_p2p_fix 1f75328

COPYBARA_INTEGRATE_REVIEW=#20086 from Tixxx:tixxx/memcpy_p2p_fix 1f75328
PiperOrigin-RevId: 707145351
copybara-service bot merged commit 85af7f1 into main on Dec 17, 2024
copybara-service bot deleted the test_707074350 branch on December 17, 2024, 18:08