
Training fails if using multiple gpus #697

Open
cocacola0 opened this issue Apr 30, 2020 · 5 comments

Comments

@cocacola0

I want to train coco_dla_2x on 8 GPUs with the following command:
python main.py ctdet --exp_id coco_dla_2x --batch_size 8 --master_batch 1 --lr 5e-4 --gpus 0,1,2,3,4,5,6,7 --num_workers 8 --num_epochs 230
and I get a CUDA out-of-memory error. However, if I use a single GPU, training works just fine (with the same command). I should also mention that all the GPUs are identical.
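
For context, here is roughly how I understand the per-GPU chunk sizes to be derived from --batch_size and --master_batch (a sketch of the splitting logic as I read it, not the repo's actual code; the function name is mine):

# Sketch of the batch-splitting logic as I understand it: the master GPU
# gets --master_batch samples and the rest are spread over the other GPUs.
def split_batch(batch_size, master_batch, num_gpus):
    if master_batch == -1:                      # default: split evenly
        master_batch = batch_size // num_gpus
    chunk_sizes = [master_batch]
    rest = batch_size - master_batch
    for i in range(num_gpus - 1):
        chunk = rest // (num_gpus - 1)
        if i < rest % (num_gpus - 1):           # distribute the remainder
            chunk += 1
        chunk_sizes.append(chunk)
    return chunk_sizes

print(split_batch(8, 1, 8))  # -> [1, 1, 1, 1, 1, 1, 1, 1]

So each GPU should only see a single sample per step, which is why the OOM surprises me.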

@xingyizhou
Owner

That's strange. Can you try removing --master_batch? If that doesn't work, can you specify your CUDA/PyTorch version and GPU type?
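
Something like this will print the relevant details (standard PyTorch calls, nothing CenterNet-specific):

import torch

print(torch.__version__)              # PyTorch version
print(torch.version.cuda)             # CUDA version PyTorch was built with
print(torch.cuda.device_count())      # number of visible GPUs
print(torch.cuda.get_device_name(0))  # GPU type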

@cocacola0
Author

Thanks for responding! It didn't work. I use PyTorch 1.0.0 and CUDA 10.0 with Titan RTX GPUs on Ubuntu 18.04.3. Also, I share this server with my other uni colleagues, if that matters.
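
Since the server is shared, a quick way to see how much memory other jobs already hold on each GPU is a plain nvidia-smi query (nothing specific to this repo):

import subprocess

# Report per-GPU memory usage; other users' processes show up here too.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True)
print(out.stdout)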

@cocacola0
Author

If it helps, here is the entire error message:
(base) adriantura@tmas395x:~/CenterNetDoneUntilEaster/CenterNet/src$ python main.py ctdet --exp_id coco_dla_2x --batch_size 8 --lr 5e-4 --gpus 0,1,2,3,4,5,6,7 --num_workers 16 --num_epochs 230
Fix size testing.
training chunk_sizes: [1, 1, 1, 1, 1, 1, 1, 1]
The output will be saved to /home/adriantura/CenterNetDoneUntilEaster/CenterNet/src/lib/../../exp/ctdet/coco_dla_2x
heads {'hm': 80, 'wh': 2, 'reg': 2}
Creating model...
Setting up data...
==> initializing coco 2017 val data.
loading annotations into memory...
Done (t=0.60s)
creating index...
index created!
Loaded val 5000 samples
==> initializing coco 2017 train data.
loading annotations into memory...
Done (t=15.95s)
creating index...
index created!
Loaded train 118287 samples
Starting training...
ctdet/coco_dla_2x
Traceback (most recent call last):
File "main.py", line 106, in
main(opt)
File "main.py", line 72, in main
log_dict_train, _ = trainer.train(epoch, train_loader)
File "/home/adriantura/CenterNetDoneUntilEaster/CenterNet/src/lib/trains/base_trainer.py", line 119, in train
return self.run_epoch('train', epoch, data_loader)
File "/home/adriantura/CenterNetDoneUntilEaster/CenterNet/src/lib/trains/base_trainer.py", line 69, in run_epoch
output, loss, loss_stats = model_with_loss(batch)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 139, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 35, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim) if inputs else []
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
return scatter_map(inputs)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 19, in scatter_map
return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
return Scatter.apply(target_gpus, None, dim, obj)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
File "/home/adriantura/anaconda3/lib/python3.7/site-packages/torch/cuda/comm.py", line 148, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: out of memory (malloc at /opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THC/THCCachingAllocator.cpp:205)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f6e9dc03cc5 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x135af8f (0x7f6ea16f4f8f in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: + 0x135b79a (0x7f6ea16f579a in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: at::native::empty_cuda(c10::ArrayRef<long>, at::TensorOptions const&) + 0x2d6 (0x7f6ea2d5c1c6 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::CUDAFloatType::empty(c10::ArrayRef<long>, at::TensorOptions const&) const + 0x161 (0x7f6ea1606931 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: torch::autograd::VariableType::empty(c10::ArrayRef<long>, at::TensorOptions const&) const + 0x179 (0x7f6e9b0a6bc9 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional<c10::Device>) const + 0x122 (0x7f6e9e5dede2 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #7: + 0x5fa057 (0x7f6e9e40f057 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #8: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0x295 (0x7f6e9e410cd5 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #9: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17 (0x7f6e9e5a4d27 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #10: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17a (0x7f6e9b04cb2a in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #11: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef<long>, c10::optional<std::vector<long, std::allocator<long> > > const&, long, c10::optional<std::vector<c10::optional<at::cuda::CUDAStream>, std::allocator<c10::optional<at::cuda::CUDAStream> > > > const&) + 0x491 (0x7f6ede76d161 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: + 0x4fae71 (0x7f6ede772e71 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #13: + 0x112176 (0x7f6ede38a176 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #21: THPFunction_apply(_object*, _object*) + 0x5a1 (0x7f6ede585bf1 in /home/adriantura/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

@xingyizhou
Owner

I am assuming you also changed the DCN version to the PyTorch >= 1.0 branch (if not, please do so). I'm not sure whether RTX cards work well with CUDA 10.0, so I would suggest upgrading CUDA and PyTorch for RTX ... I am using torch 1.4, CUDA 10.2, and an RTX 2080, and it works fine.
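
A quick way to confirm that the compiled DCN extension matches your PyTorch/CUDA install is to run a tiny forward pass, e.g. (the import name and constructor arguments here are from the pytorch_1.0 DCNv2 branch I use; adjust if yours differs):

import torch
from dcn_v2 import DCN  # the extension built under src/lib/models/networks/DCNv2

# A 3x3 deformable conv layer; a mismatch between the build and the installed
# PyTorch/CUDA usually fails right at import or at this forward call.
layer = DCN(64, 64, kernel_size=(3, 3), stride=1, padding=1).cuda()
x = torch.randn(2, 64, 32, 32).cuda()
print(layer(x).shape)  # expect torch.Size([2, 64, 32, 32])

If the import or the forward pass fails, rebuild DCNv2 against the new PyTorch/CUDA before retrying multi-GPU training.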

@QihuaCheng

You can try reducing --num_workers.
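
For example, the same command with fewer loader workers (the exact number is just a guess, tune it for your machine):

python main.py ctdet --exp_id coco_dla_2x --batch_size 8 --lr 5e-4 --gpus 0,1,2,3,4,5,6,7 --num_workers 4 --num_epochs 230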
