Training fails if using multiple gpus #697
Comments
That's strange. Can you try removing …
Thanks for responding! It didn't work. I am using PyTorch 1.0.0, CUDA 10.0, and a Titan RTX GPU on Ubuntu 18.04.3. Also, I share this server with other colleagues from my university (in case that matters).
If it helps, here is the entire error message:
I am assuming you also changed the DCN version to the one for PyTorch >= 1.0 (if not, please do so). I'm not sure the Titan RTX works well with CUDA 10.0, so I would suggest upgrading CUDA and PyTorch for RTX cards. I am using torch 1.4, CUDA 10.2, and an RTX 2080, and it works fine.
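Before upgrading, it may help to confirm which PyTorch/CUDA build and which GPUs the training process actually sees. This is a quick, hypothetical check (not part of the CenterNet repo), using only standard torch calls:

```python
import torch

# Versions the training process will actually use.
print("PyTorch:", torch.__version__)            # e.g. "1.4.0"
print("CUDA (as built):", torch.version.cuda)   # e.g. "10.2"
print("cuDNN:", torch.backends.cudnn.version())

# GPUs visible to PyTorch and their total memory.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```

If the reported CUDA version or GPU list is not what you expect (for example, because CUDA_VISIBLE_DEVICES is set by another user on the shared server), that would explain inconsistent behavior across GPUs.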
You can try reducing --num_workers.
I want to train coco_dla_2x on 8 GPUs with the following command:
python main.py ctdet --exp_id coco_dla_2x --batch_size 8 --master_batch 1 --lr 5e-4 --gpus 0,1,2,3,4,5,6,7 --num_workers 8 --num_epochs 230
and I get a CUDA out-of-memory error. However, if I use a single GPU (with the same command), it works just fine. Note that all the GPUs are identical.
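For context on why --master_batch matters here, below is a minimal sketch of how the per-GPU split is commonly derived from --batch_size and --master_batch: the master GPU gets --master_batch images and the remainder is divided evenly across the other GPUs. This is an illustrative reconstruction, not the repo's actual code, and the function name is hypothetical. With batch_size 8, master_batch 1, and 8 GPUs, every GPU ends up with a single image, which is why an out-of-memory error in this setup is surprising.

```python
def chunk_sizes(batch_size, master_batch, num_gpus):
    # Hypothetical per-GPU batch split: master GPU gets `master_batch`
    # images, the rest is spread as evenly as possible over the others.
    if master_batch < 0:               # e.g. --master_batch -1 means "split evenly"
        master_batch = batch_size // num_gpus
    rest = batch_size - master_batch
    sizes = [master_batch]
    for i in range(num_gpus - 1):
        chunk = rest // (num_gpus - 1)
        if i < rest % (num_gpus - 1):  # hand out any remainder one image at a time
            chunk += 1
        sizes.append(chunk)
    return sizes

# The command from this issue: batch_size 8, master_batch 1, 8 GPUs.
print(chunk_sizes(8, 1, 8))  # -> [1, 1, 1, 1, 1, 1, 1, 1]
```

If a single image per GPU still runs out of memory while single-GPU training with the same command succeeds, the problem is more likely the environment (driver/CUDA/PyTorch mismatch, or other jobs occupying the shared GPUs) than the batch configuration itself.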