In IsaacGymEnvs, rl-games + multiGPU seems to have some issues. As shown in the screenshot, rl-games + multiGPU uses twice the amount of data and performs worse than the single-GPU setting in Ant.
This issue tracks the investigation.
Proposed debugging route
I suggest first making sure there is no loss in sample efficiency before scaling to more envs, by matching the implementation details in our CleanRL prototype: https://cleanrl-git-new-multi-gpu-vwxyzjn.vercel.app/rl-algorithms/ppo/#implementation-details_6.
Identified issues:
1. Seeding logic and configuration issue
We need to seed the multiGPU processes with different seeds to decorrelate experience; otherwise the processes will produce the exact same observations.
Configuration-wise, we can set the overall seed with `params.seed` and the env seed with `params.config.env_config.seed`. If `params.config.env_config.seed` is set but `params.seed` is not, we get identical observations from the environments, as shown below. This is probably okay since the agent still samples different actions, but it's nonetheless a problem. The correct implementation is to use `seed = seed + local_rank`.
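A minimal sketch of the intended per-rank seeding, assuming a horovod-style `local_rank()`; the helper name `seed_everything` is illustrative, not rl-games' actual API:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int, local_rank: int) -> int:
    # Offset the base seed by the process rank so each worker
    # generates decorrelated experience (seed = seed + local_rank).
    seed = seed + local_rank
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    return seed


# Usage with horovod (hypothetical wiring):
#   import horovod.torch as hvd
#   hvd.init()
#   seed = seed_everything(base_seed, hvd.local_rank())
# The offset seed should also flow into params.config.env_config.seed
# so the vectorized envs are decorrelated per process as well.
```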
2. Stepping logic issue
After fixing #163, I was able to match the sample efficiency of the single-GPU setting:
However, the wall time is worse than I had expected. In a separate benchmark I made with CleanRL, the experiments show that horovod should make Ant step 20% faster.
Maybe it's the overhead of averaging stats across workers? In the CleanRL benchmark experiments I did not touch the stats at all.
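If that suspicion is right, the cost would come from extra blocking collectives on every update, on top of the gradient allreduce. A minimal sketch of what such stats averaging could look like with horovod; the function and buffer names are illustrative, not rl-games' actual implementation:

```python
import horovod.torch as hvd
import torch


def average_running_stats(mean: torch.Tensor, var: torch.Tensor,
                          count: torch.Tensor):
    # Each allreduce is a blocking collective across all ranks, so
    # running these every update adds synchronization overhead beyond
    # the gradient allreduce, a plausible source of the wall-time gap.
    avg_mean = hvd.allreduce(mean, name="obs_mean")  # defaults to averaging
    avg_var = hvd.allreduce(var, name="obs_var")
    total_count = hvd.allreduce(count, name="obs_count", op=hvd.Sum)
    return avg_mean, avg_var, total_count
```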